Title: Strains used in whole organism Plasmodium falciparum vaccine trials differ in genome structure, sequence, and immunogenic potential

Authors: Kara A. Moser1‡, Elliott F. Drábek1, Ankit Dwivedi1, Jonathan Crabtree1, Emily M. Stucke2, Antoine Dara2, Zalak Shah2, Matthew Adams2, Tao Li3, Priscila T. Rodrigues4, Sergey Koren5, Adam M. Phillippy5, Amed Ouattara2, Kirsten E. Lyke2, Lisa Sadzewicz1, Luke J. Tallon1, Michele D. Spring6, Krisada Jongsakul6, Chanthap Lon6, David L. Saunders6¶, Marcelo U. Ferreira4, Myaing M. Nyunt2§, Miriam K. Laufer2, Mark A. Travassos2, Robert W. Sauerwein7, Shannon Takala-Harrison2, Claire M. Fraser1, B. Kim Lee Sim3, Stephen L. Hoffman3, Christopher V. Plowe2§, Joana C. Silva1,8

Corresponding author: Joana C. Silva, [email protected]

Affiliations:
1 Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201
2 Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, 21201
3 Sanaria, Inc., Rockville, Maryland, 20850
4 Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
5 Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA 20892
6 Armed Forces Research Institute of Medical Sciences, Department of Bacterial and Parasitic Diseases, Bangkok, Thailand
7 Department of Medical Microbiology, Radboud University Medical Center, Nijmegen, Netherlands
8 Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD, 21201

‡ Current address: Institute for Global Health and Infectious Diseases, University of North Carolina Chapel Hill
¶ Current address: Warfighter Expeditionary Medicine and Treatment, US Army Medical Material Development Activity
§ Current address: Duke Global Health Institute, Duke University, Durham, North Carolina, 27708

bioRxiv preprint doi: https://doi.org/10.1101/684175; this version posted June 27, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Abstract

Background: Plasmodium falciparum (Pf) whole-organism sporozoite vaccines have provided excellent protection against controlled human malaria infection (CHMI) and naturally transmitted heterogeneous Pf in the field. Initial CHMI studies showed significantly higher durable protection against homologous than heterologous strains, suggesting the presence of strain-specific vaccine-induced protection. However, interpretation of these results and understanding of their relevance to vaccine efficacy (VE) have been hampered by the lack of knowledge on genetic differences between vaccine and CHMI strains, and how these strains are related to parasites in malaria-endemic regions.

Methods: Whole genome sequencing using long-read (Pacific Biosciences) and short-read (Illumina) sequencing platforms was conducted to generate de novo genome assemblies for the vaccine strain, NF54, and for strains used in heterologous CHMI (7G8 from Brazil, NF166.C8 from Guinea, and NF135.C10 from Cambodia). The assemblies were used to characterize sequence polymorphisms and structural variants in each strain relative to the reference Pf 3D7 (a clone of NF54) genome. Strains were compared to each other and to clinical isolates from South America, Sub-Saharan Africa, and Southeast Asia.


Results: While few variants were detected between 3D7 and NF54, we identified tens of thousands of variants between NF54 and the three heterologous strains, both genome-wide and within regulatory and immunologically important regions, including in pre-erythrocytic antigens that may be key for sporozoite vaccine-induced protection. Additionally, these variants directly contribute to diversity in immunologically important regions of the genomes as detected through in silico CD8+ T cell epitope predictions. Of all heterologous strains, NF135.C10 consistently had the highest number of unique predicted epitope sequences when compared to NF54, while NF166.C8 had the lowest. Comparison to global clinical isolates revealed that these four strains are representative of their geographic region of origin despite long-term culture adaptation; of note, NF135.C10 is from an admixed population, and is not part of the recently formed drug-resistant subpopulations present in the Greater Mekong Sub-region.

Conclusions: These results are assisting the interpretation of VE of whole-organism vaccines against homologous and heterologous CHMI, and may be useful in informing the choice of strains for inclusion in region-specific or multi-strain vaccines.

Keywords: P. falciparum, malaria, genome assembly, PfSPZ Vaccine, whole-sporozoite vaccine


Main Text

Introduction

The flattening levels of mortality and morbidity due to malaria in recent years (1), which follow a decade in which malaria mortality was cut in half (2), highlight the pressing need for new tools to control the disease. A highly efficacious vaccine against Plasmodium falciparum, the deadliest malaria parasite, would be a critical development for control and elimination efforts. Several variations of a highly promising pre-erythrocytic, whole organism malaria vaccine based on P. falciparum sporozoites (PfSPZ) are under development. All are based on the same P. falciparum strain, NF54 (3), which is thought to be of West African origin, but they use different mechanisms for attenuation of PfSPZ. Of these vaccine candidates, Sanaria® PfSPZ Vaccine, based on radiation-attenuated sporozoites, has progressed furthest in clinical trial testing (4–10). Other whole-organism vaccine candidates, including chemo-attenuated (Sanaria® PfSPZ-CVac), transgenic and genetically attenuated sporozoites, are in earlier stages of development (11–13).

PfSPZ Vaccine showed 100% short-term protection against homologous controlled human malaria infection (CHMI) in an initial phase 1 clinical trial (6), and subsequent trials have confirmed that high levels of protection can be achieved against both short-term (8) and long-term (7) homologous CHMI. However, depending on the immunization regimen, sterile protection can be significantly lower (8–83%) against heterologous CHMI using the 7G8 Brazilian clone (8,9), and against infection in malaria-endemic regions with intense seasonal malaria transmission (29% and 52% by proportional and time to event analysis, respectively) (10). Heterologous CHMIs in chemoprophylaxis with sporozoite (CPS) studies, in which immunization is by infected mosquito bite of individuals undergoing malaria chemoprophylaxis, have been conducted with NF135.C10 from Cambodia (14) and NF166.C8 from Guinea (15), and protection was lower than against homologous CHMI (16,17). One explanation for the lower efficacy seen against heterologous P. falciparum strains is the extensive genetic diversity in this parasite species, which is particularly high in genes encoding antigens (18) and which, combined with low vaccine efficacy against non-vaccine alleles (19–21), reduces overall protective efficacy and complicates the design of broadly efficacious vaccines (22,23). The lack of a detailed genomic characterization of the P. falciparum strains used in CHMI studies, and the unknown genetic basis of the parasite targets of PfSPZ Vaccine- and PfSPZ-CVac-induced protection, have precluded a conclusive statement regarding the cause(s) of variable vaccine efficacy outcomes.

The current PfSPZ Vaccine strain, NF54, was isolated from a patient in the Netherlands who had never left the country and is considered a case of “airport malaria”; the exact origin of NF54 is unknown (3), but is thought to be from Africa (24,25). NF54 is also the isolate from which the P. falciparum 3D7 reference strain was cloned (26), and hence, despite having been separated in culture for over 30 years, NF54 and 3D7 are assumed to be genetically identical, and 3D7 is often used in homologous CHMI (6,8). Several issues hinder the interpretation of both homologous and heterologous CHMI experiments conducted to date. It remains to be confirmed that 3D7 has remained genetically identical to NF54 genome-wide, or that the two are at least identical immunogenically. Indeed, NF54 and 3D7 have several reported phenotypic differences when grown in culture, including the variable ability to produce gametocytes (27). In addition, 7G8, NF166.C8, and NF135.C10 have not been rigorously compared to each other or to NF54 to confirm that they are adequate heterologous strains, even though they do appear to have distinct infectivity phenotypes when used as CHMI strains (15). While the entire sporozoite likely offers multiple immunological targets, no high-confidence correlates of protection currently exist. In part because of the difficulty of studying hepatic parasite forms and their gene expression profiles in humans, it remains unclear which parasite proteins are recognized by the human immune system during that stage, and elicit protection, upon immunization with PfSPZ vaccines. Both humoral and cell-mediated responses have been associated with protection against homologous CHMI (6,7), although studies in rodents and non-human primates point to a requirement for cell-mediated immunity (specifically through tissue-resident CD8+ T cells) in long-term protection (5,9,28,29). In silico identification of CD8+ epitopes in all strains could highlight critical differences of immunological significance between strains. Finally, heterologous CHMI results cannot be a reliable indicator of efficacy against infection in field settings unless the CHMI strains used are characteristic of the geographic region from which they originate. These issues could impact the use of homologous and heterologous CHMI, and the choice of strains for these studies, to predict the efficacy of PfSPZ-based vaccines in the field (30).

These knowledge gaps can be addressed through a rigorous description and comparison of the genome sequence of these strains. High-quality de novo assemblies allow characterization of genome composition and structure, as well as the identification of genetic differences between strains. However, the high AT content and repetitive nature of the P. falciparum genome greatly complicate genome assembly methods (31). Recently, long-read sequencing technologies have been used to overcome some of these assembly challenges, as was shown with assemblies for 3D7, 7G8, and several other culture-adapted P. falciparum strains generated using Pacific Biosciences (PacBio) technology (32–34). However, NF166.C8 and NF135.C10 still lack whole-genome assemblies; in addition, while an assembly for 7G8 is available (33), it is important to characterize the specific 7G8 clone used in heterologous CHMI, from Sanaria’s working bank, as strains can undergo genetic changes over time in culture (35). Here, reference assemblies for NF54, 7G8, NF166.C8, and NF135.C10 (hereafter referred to as PfSPZ strains) were generated using approaches that take advantage of the resolution power of long-read sequencing data and the low error rate of short-read sequencing platforms. These de novo assemblies allowed for the thorough genetic and genomic characterization of the PfSPZ strains and will aid in the interpretation of results from CHMI studies.

Materials and Methods

Study design and samples

This study characterized and compared the genomes of four P. falciparum strains used in whole organism malaria vaccines and controlled human malaria infections using a combination of long- and short-read whole genome sequencing platforms (see below). In addition, these strains were compared to P. falciparum clinical isolates collected from patients in malaria-endemic regions globally, using short-read whole genome sequencing data. Genetic material for the four PfSPZ strains was provided by Sanaria, Inc. Clinical P. falciparum isolates from Brazil, Mali, Malawi, Myanmar, Thailand, Laos, and Cambodia were collected between 2009 and 2016 from cross-sectional surveys of malaria burden, longitudinal studies of malaria incidence, and drug efficacy studies done in collaboration with the Division of Malaria Research at the University of Maryland Baltimore, or were otherwise provided by collaborators (Supplemental Dataset 1). All samples met the inclusion criteria of the initial study protocol with prior approval from the local ethical review board. Parasite genomic sequencing and analyses were undertaken after prior approval of the University of Maryland School of Medicine Institutional Review Board. These isolates were obtained by venous blood draws; almost all samples were processed using leukocyte depletion methods to improve the parasite-to-human ratio before sequencing. The exceptions were samples from Brazil and Malawi, which were not leukocyte depleted upon collection. These samples underwent a selective whole genome amplification step before sequencing, modified from (36) (the main modification being a DNA dilution and filtration step using vacuum filtration prior to selective whole genome amplification). Cambodia isolates had been sequenced for a previous study (37). In addition to these samples, samples for which whole genome short-read sequencing data had previously been generated were obtained from NCBI’s Sequence Read Archive (SRA) to supplement malaria-endemic regions not represented in our data set (Peru, Colombia, French Guiana, Guinea, Papua New Guinea) and regions where PfSPZ trials are ongoing (Burkina Faso, Kenya, and Tanzania) (Supplemental Dataset 1).

Whole genome sequencing

Genetic material for whole genome sequencing of the PfSPZ strains was generated from a cryovial of each strain’s cell bank with the following identifiers: NF54 Working Cell Bank (WCB): SAN02-073009; 7G8 WCB: SAN02-021214; NF135.C10 WCB: SAN07-010410; NF166.C8 Mother Cell Bank: SAN30-020613. Each cryovial was thawed and maintained in human O+ red blood cells (RBCs) at 2% hematocrit (Hct) in complete growth medium (RPMI 1640 with L-glutamine and 25 mM HEPES supplemented with 10% human O+ serum and hypoxanthine) in a 6-well plate in 5% O2, 5% CO2 and 90% N2 at 37°C. The cultures were then further expanded by adding fresh RBCs every 3-4 days and increasing the culture hematocrit to 5% Hct using a standard method (38). The complete growth medium was replaced daily. When the PfSPZ strain culture volume reached 300-400 mL and a parasitemia of more than 1.5%, the culture suspensions were collected and the parasitized RBCs were pelleted by centrifugation at 1800 rpm for 5 min. Aliquots of 0.5 mL per cryovial of the parasitized RBCs were stored at -80°C prior to extraction of genomic DNA. Genomic DNA was extracted using the Qiagen Blood DNA Midi Kit (Valencia, CA, USA). Pacific Biosciences (PacBio) sequencing was done for each PfSPZ strain. Total DNA was prepared for PacBio sequencing using the DNA Template Prep Kit 2.0 (Pacific Biosciences, Menlo Park, CA). DNA was fragmented with the Covaris E210, and the fragments were size selected to include those >15 Kbp in length. Libraries were prepared per the manufacturer’s protocol. Four SMRT cells were sequenced per library, using P6C4 chemistry and a 120-minute movie on the PacBio RS II (Pacific Biosciences, Menlo Park, CA).

Short-read sequencing was done for each PfSPZ strain and for our collection of clinical isolates using the Illumina HiSeq 2500 or 4000 platforms. Prepared genomic DNA, extracted from cultured parasites, leukocyte-depleted samples, or samples that underwent selective whole genome amplification (sWGA; see above), was used to construct DNA libraries for sequencing on the Illumina platform using the KAPA Library Preparation Kit (Kapa Biosystems, Woburn, MA). DNA was fragmented with the Covaris E210 or E220 to ~200 bp. Libraries were prepared using a modified version of the manufacturer’s protocol. The DNA was purified between enzymatic reactions and the size selection of the library was performed with AMPure XP beads (Beckman Coulter Genomics, Danvers, MA). When necessary, a PCR amplification step was performed with primers containing an index sequence six nucleotides in length. Libraries were assessed for concentration and fragment size using the DNA High Sensitivity Assay on the LabChip GX (Perkin Elmer, Waltham, MA). Library concentrations were also assessed by qPCR using the KAPA Library Quantification Kit (Complete, Universal) (Kapa Biosystems, Woburn, MA). The libraries were pooled and sequenced on a 100 bp paired-end Illumina HiSeq 2500 or 4000 run (Illumina, San Diego, CA).

Assembly generation and characterization of PfSPZ strains

Canu (v1.3) (39) was used to correct and assemble the PacBio reads (corMaxEvidenceErate=0.15 for AT-rich genomes, and using default parameters otherwise). To optimize downstream assembly correction processes and parameters, the percentage of total differences (both in bp and by proportion of the 3D7 genome not captured by the NF54 assembly) between the NF54 assembly and the 3D7 reference (PlasmoDBv24) was calculated after each round of correction. Quiver (smrtanalysis v2.3) (40) was run iteratively with default parameters to reach a (stable) maximum reduction in percent differences between the two genomes, and the assemblies were further corrected with Illumina data using Pilon (v1.13) (41) with the following parameters: --fix bases, --mindepth 5, --K 85, --minmq 0, and --minqual 35.
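The stopping rule for iterative correction described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: `polish` and `percent_diff` are hypothetical callables standing in for one round of Quiver correction and for the comparison of an assembly against the 3D7 reference, and the toy values are invented for illustration only.

```python
def polish_until_stable(assembly, polish, percent_diff, tol=0.0, max_rounds=10):
    """Repeat a correction step until the percent difference stops decreasing.

    `polish` and `percent_diff` are hypothetical stand-ins (assumptions):
    polish(assembly) returns a corrected assembly, and percent_diff(assembly)
    returns its percent difference from the reference.
    """
    history = [percent_diff(assembly)]
    for _ in range(max_rounds):
        candidate = polish(assembly)
        diff = percent_diff(candidate)
        # Stop when a round no longer reduces the difference by more than tol;
        # the assembly from the previous round is kept.
        if history[-1] - diff <= tol:
            break
        assembly = candidate
        history.append(diff)
    return assembly, history


# Toy illustration: the "assembly" is just a round counter, and the percent
# difference after each round is read from an invented table.
toy_diffs = [0.50, 0.20, 0.12, 0.12]
final_assembly, diff_history = polish_until_stable(
    0,
    polish=lambda a: a + 1,
    percent_diff=lambda a: toy_diffs[min(a, len(toy_diffs) - 1)],
)
```

In this toy run the third round gives no further reduction, so the result of the second round is retained.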

Assemblies were compared to the 3D7 reference (PlasmoDBv24) using MUMmer’s nucmer (42), and the show-snps function was used to generate a list of SNPs and small (< 50 bp) indels between assemblies. Coding and non-coding variants were classified by comparing the show-snps output with the 3D7 gff3 file using custom scripts. Structural variants, defined as insertions, deletions, and tandem or repeat expansions and contractions each greater than 50 bp in length, were identified using the nucmer-based Assemblytics tool (43) (unique anchor length: 1 Kbp). Translocations were identified by eye through inspection of mummerplots. Translocations and other large structural variants were confirmed using read-mapping data of the PacBio reads against either the 3D7 reference genome or the respective PfSPZ strain assembly using the NGMLR aligner (44) and visualized using Ribbon (45). Reconstructed exon 1 sequences for var genes, encoding P. falciparum erythrocyte membrane protein 1 (PfEMP1) antigens, for each PfSPZ strain were recovered using the ETHA package (46). As a check for var exon 1 sequences that were missed during the generation of the strain’s assembly, a targeted read capture and assembly approach was done using a strain’s Illumina data, wherein var-like reads for each PfSPZ strain were identified by mapping reads against a database of known var exon 1 sequences (47) using bowtie2 (48). Reads that mapped to a known exon 1 sequence were then assembled with SPAdes (v3.9.0) (49), and the assembled products were compared against the PacBio reads using BLAST to determine whether they were truly exon 1 sequences missed by the de novo assembly process, or were instead chimeras reconstructed by the targeted assembly process. To identify homologs of the 61 var exon 1 sequences found in the 3D7 genome, each PfSPZ strain’s set of exon 1 amino acid sequences was compared to the 3D7 set using blastp.
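The custom scripts used to classify coding and non-coding variants are not reproduced here. As a minimal sketch of one way such a classification could be done, the Python below assumes that variant positions have already been parsed from the show-snps output into (sequence, position) pairs and that coding regions are taken from the CDS features of a GFF3 file. The function names, sequence identifiers, and coordinates are hypothetical; under this simplification, intronic and intergenic positions are both labelled non-coding.

```python
import bisect
from collections import defaultdict


def load_cds_intervals(gff3_lines):
    """Collect sorted, merged CDS intervals (1-based, inclusive) per sequence."""
    raw = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # GFF3 columns: seqid, source, type, start, end, score, strand, phase, attributes
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        raw[fields[0]].append((int(fields[3]), int(fields[4])))
    merged = {}
    for seqid, intervals in raw.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1] + 1:  # overlapping or adjacent: extend
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[seqid] = [tuple(iv) for iv in out]
    return merged


def classify_variant(cds, seqid, pos):
    """Return 'coding' if pos lies inside a CDS interval, else 'non-coding'."""
    intervals = cds.get(seqid, [])
    # Index of the last interval whose start is <= pos.
    i = bisect.bisect_right(intervals, (pos, float("inf"))) - 1
    if i >= 0 and intervals[i][0] <= pos <= intervals[i][1]:
        return "coding"
    return "non-coding"


# Toy GFF3 content and variant positions (invented for illustration).
toy_gff3 = [
    "##gff-version 3",
    "chr1\ttoy\tgene\t100\t500\t.\t+\t.\tID=gene1",
    "chr1\ttoy\tCDS\t100\t200\t.\t+\t0\tID=cds1;Parent=gene1",
    "chr1\ttoy\tCDS\t300\t500\t.\t+\t0\tID=cds2;Parent=gene1",
]
toy_cds = load_cds_intervals(toy_gff3)
toy_variants = [("chr1", 150), ("chr1", 250), ("chr1", 500), ("chr2", 10)]
toy_labels = [classify_variant(toy_cds, s, p) for s, p in toy_variants]
```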

In silico MHC I epitope predictions

Given the reported importance of CD8+ T cell responses towards immunity to whole sporozoites, MHC class I epitopes of length 9 amino acids (9mers) were predicted with NetMHCpan (v3.0) (50) for each PfSPZ strain, using inferred amino acid sequences in each genome. HLA types common to African countries where PfSPZ or PfSPZ-CVac trials are ongoing were used for epitope predictions based on frequencies in the Allele Frequency Net Database (51) or from the literature (52,53) (Table S1). Shared epitopes between NF54 and the three heterologous PfSPZ strains were calculated by first identifying epitopes in each gene, and then removing duplicate epitope sequence entries (caused by recognition by multiple HLA types). Identical epitope sequences that were identified in two or more genes were treated as distinct epitope entries, and all unique epitope-gene combinations were included when calculating the number of shared epitopes between strains.
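The counting rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used in this study: predictions are assumed to be available as (gene, HLA type, 9mer) records, and the gene names, HLA labels, and peptide sequences below are invented.

```python
def unique_epitopes(predictions):
    """Collapse (gene, hla, peptide) records to unique (gene, peptide) pairs.

    A peptide predicted for several HLA types counts once per gene, while the
    same peptide found in two genes remains two distinct entries.
    """
    return {(gene, peptide) for gene, _hla, peptide in predictions}


def compare_epitopes(vaccine_preds, other_preds):
    """Return (shared, vaccine-only, other-only) epitope-gene counts."""
    ref = unique_epitopes(vaccine_preds)
    other = unique_epitopes(other_preds)
    return len(ref & other), len(ref - other), len(other - ref)


# Toy records (hypothetical genes, HLA labels, and 9mers).
toy_nf54 = [
    ("geneA", "HLA-1", "AAAAAAAAA"),
    ("geneA", "HLA-2", "AAAAAAAAA"),  # same gene and peptide, second HLA type
    ("geneB", "HLA-1", "AAAAAAAAA"),  # same peptide in a different gene
    ("geneB", "HLA-1", "CCCCCCCCC"),
]
toy_other = [
    ("geneA", "HLA-1", "AAAAAAAAA"),
    ("geneB", "HLA-1", "DDDDDDDDD"),
]
toy_counts = compare_epitopes(toy_nf54, toy_other)
```

Representing each epitope as a (gene, peptide) pair makes both parts of the rule explicit: duplicates across HLA types vanish in the set, and identical peptides in different genes stay separate.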

Epitopes were predicted in several gene sets. The first was all protein-coding genes in which no premature stop codons were detected across all four PfSPZ strains (n=4,814). The second was a subset of the above that are expressed as proteins during the sporozoite stage of the parasite in the mosquito salivary glands, as supported by proteomic data (n=1,974) (54,55). Lastly, a list of pre-erythrocytic genes of interest was generated, either from a review of the literature, or genes whose products were recognized by sera from protected vaccinees of whole organism malaria vaccine trials (both PfSPZ and PfSPZ-CVac) (n=42) (11,56). These gene sequences were inspected manually in each assembly, and corrected for any remaining PacBio sequencing error, particularly 1 bp indels in homopolymer runs. Even though the 42 genes listed above were detected through antibody responses, many have also been shown to have T cell epitopes, such as CSP and LSA-1.

Read mapping and SNP calling 294

For the full collection of clinical isolates that had whole genome short-read 295

sequencing data (generated either at IGS or downloaded from SRA), reads were aligned 296

to the 3D7 reference genome (PlasmoDBv24) using bowtie2 (v2.2.4) (48). Samples with 297

less than 10 million reads mapping to the reference were excluded, as samples with less 298

than this amount had reduced coverage across the genome. Bam files were processed 299

according to GATK’s Best Practices documentation (57–59). Joint SNP calling was done 300

using HaplotypeCaller (v4.0). Because clinical samples may be polyclonal (that is, more than one parasite strain may be present), diploid calls were initially allowed, followed by calling the major allele at positions with heterozygous calls. If the major allele was supported by >70% of reads at a heterozygous position, it was assigned as the allele at that position (otherwise, the genotype was coded as missing). Additional hard filtering was done to remove potential false positives based on the following filter: DP < 12 || QUAL < 50 || FS > 14.5 || MQ < 20. Variants were further filtered to remove those for which the non-reference allele was not present in at least three samples (frequency less than ~0.5%), and those with more than 10% missing genotype values across all samples.
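The calling and filtering rules above can be sketched as follows. This is an illustration of the stated thresholds only, not the authors' implementation; the function names, the "ref"/"alt"/None genotype coding, and the read-count inputs are assumptions made for the example.

```python
from typing import List, Optional


def call_major_allele(ref_reads: int, alt_reads: int,
                      min_fraction: float = 0.70) -> Optional[str]:
    """Resolve a heterozygous call: return the allele supported by more than
    min_fraction of reads, or None (missing) if neither allele qualifies."""
    total = ref_reads + alt_reads
    if total == 0:
        return None
    if ref_reads / total > min_fraction:
        return "ref"
    if alt_reads / total > min_fraction:
        return "alt"
    return None


def fails_hard_filter(dp: float, qual: float, fs: float, mq: float) -> bool:
    """True if the variant matches DP < 12 || QUAL < 50 || FS > 14.5 || MQ < 20."""
    return dp < 12 or qual < 50 or fs > 14.5 or mq < 20


def keep_site(genotypes: List[Optional[str]],
              min_alt_samples: int = 3,
              max_missing: float = 0.10) -> bool:
    """Keep a site if the non-reference allele is seen in at least
    min_alt_samples samples and at most max_missing of genotypes are missing."""
    n = len(genotypes)
    if n == 0:
        return False
    missing = sum(g is None for g in genotypes)
    alt = sum(g == "alt" for g in genotypes)
    return alt >= min_alt_samples and missing / n <= max_missing
```

In practice these steps would operate on VCF records; plain values are used here so that the thresholds are easy to read.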

Principal Coordinates Analyses and Admixture Analyses

A matrix of pairwise genetic distances was constructed from biallelic non-synonymous SNPs identified from the above pipeline (n=31,761) across all samples (n=654) using a custom Python script, and principal coordinates analyses (PCoAs) were done to explore population structure using cmdscale in R. Additional population structure analyses were done using Admixture (v1.3) (60) on two separate data sets: South America and Africa clinical isolates plus NF54, NF166.C8, and 7G8 (n=461), and Southeast Asia and Oceania clinical isolates plus NF135.C10 (n=193). The data sets were additionally pruned for sites in linkage disequilibrium (window size of 20 Kbp, window step of 2 Kbp, R² threshold of 0.1). The final South America/Africa and Southeast Asia/Oceania data sets used for the admixture analysis consisted of 16,802 and 5,856 SNPs, respectively. The number of populations, K, was tested for values from K=1 to K=15, with 10 replicates for each K. For each value of K, the cross-validation (CV) error from the replicate with the

highest log-likelihood value was plotted, and the K with the lowest CV value was chosen as the final K.
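The selection rule for K described above can be sketched as follows, assuming that each Admixture replicate has been summarized as a (K, log-likelihood, CV error) triple. The tuple layout, function name, and toy values are hypothetical and only illustrate the rule.

```python
from typing import Dict, Iterable, Tuple


def choose_k(runs: Iterable[Tuple[int, float, float]]) -> Tuple[int, Dict[int, float]]:
    """runs: one (k, log_likelihood, cv_error) triple per replicate.

    For each K, keep the CV error of the replicate with the highest
    log-likelihood; then return the K whose retained CV error is lowest."""
    best_per_k: Dict[int, Tuple[float, float]] = {}
    for k, loglik, cv in runs:
        if k not in best_per_k or loglik > best_per_k[k][0]:
            best_per_k[k] = (loglik, cv)
    cv_by_k = {k: cv for k, (_, cv) in best_per_k.items()}
    final_k = min(cv_by_k, key=cv_by_k.get)
    return final_k, cv_by_k


# Toy replicates (invented numbers, for illustration only).
toy_runs = [
    (1, -100.0, 0.50), (1, -90.0, 0.52),
    (2, -80.0, 0.40), (2, -85.0, 0.35),
    (3, -70.0, 0.45),
]
final_k, cv_by_k = choose_k(toy_runs)
```

Note that the replicate with the lowest CV error for a given K is not necessarily the one retained: in the toy data, K=2 contributes 0.40, not 0.35, because the replicate with the higher log-likelihood is used.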

To compare subpopulations identified in our Southeast Asia/Oceania admixture analysis with previously described ancestral, resistant, and admixed subpopulations from Cambodia (61), the above non-synonymous SNP set was used before pruning for LD (n=11,943), and was compared to a non-synonymous SNP dataset (n=21,257) from 167 samples used by Dwivedi et al. (62) to describe eight Cambodian subpopulations, in an analysis that included a subset of samples used by Miotto et al. (61) (who first characterized the population structure in Cambodia). There were 5,881 shared non-synonymous SNPs between the two datasets, 1,649 of which were observed in NF135.C10. A pairwise genetic distance matrix (estimated as the proportion of base-pair differences between pairs of samples, not including missing genotypes) was generated from the set of 5,881 shared SNPs, and a dendrogram was built using the Ward minimum variance method in R (ward.D2 option of the hclust function).
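The distance estimate defined above (proportion of differing sites among sites where both samples have a genotype) can be sketched in a few lines. This is not the authors' custom script; it assumes genotypes are stored as one list per sample with None marking a missing call, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence

Genotype = Optional[str]


def pairwise_distance(a: Sequence[Genotype], b: Sequence[Genotype]) -> float:
    """Proportion of differing sites, skipping sites missing in either sample."""
    compared = 0
    differences = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue
        compared += 1
        if x != y:
            differences += 1
    # If no site can be compared, the distance is undefined.
    return differences / compared if compared else float("nan")


def distance_matrix(samples: List[Sequence[Genotype]]) -> List[List[float]]:
    """Symmetric matrix of pairwise distances with a zero diagonal."""
    n = len(samples)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = pairwise_distance(samples[i], samples[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


# Toy genotypes at four SNPs for three samples (invented, for illustration).
toy_samples = [
    ["A", "C", "G", "T"],
    ["A", "T", "G", None],
    [None, "C", "G", "T"],
]
toy_matrix = distance_matrix(toy_samples)
```

A matrix of this form is the kind of input that a clustering routine such as R's hclust, named in the text, takes after conversion to a distance object.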

Results

Generation of assemblies and validation of the assembly protocol

To characterize genome-wide structural and genetic diversity of the PfSPZ strains, 340

genome assemblies were generated de novo using whole genome long-read (PacBio) 341

and short-read (Illumina) sequence data (Methods; Table S2; Table S3). Taking 342

advantage of the parent isolate-clone relationship between NF54 and 3D7, we used NF54 343

as a test case to derive the assembly protocol, by adopting, at each step, approaches 344

that minimized the difference to 3D7. The raw assembly of the NF54 genome, while 345

containing 99.9% of the 3D7 genome in 30 contigs, showed ~78K sequence variants 346

relative to 3D7 (Figure S1). Many of these differences were small (1-3 bp) indels (insertions or deletions), particularly in AT-rich regions and homopolymer runs, strongly indicative of remaining sequencing errors (63). To maximize error removal with

existing tools, Quiver was run iteratively on the NF54 assembly; two consecutive Quiver 350

runs consistently and substantially reduced the number of differences between NF54 and 351

3D7 (Figure S1). Illumina reads from NF54 were used to further polish the assembly 352

using Pilon. The resulting NF54 assembly is 23.4 Mb in length (Figure 1) and, when 353

compared to the 3D7 reference, contains 8,396 indels and 1,386 single nucleotide 354

polymorphisms (SNPs), a rate of ~59 SNPs per million base pairs, the vast majority of 355

which occurred in noncoding regions (Table 1). In fact, only 17 non-synonymous 356

mutations were detected in 15 intact protein-coding single-copy genes between the two 357

genomes (Supplemental Dataset 2). We applied the same assembly protocol to the 358

remaining PfSPZ strains. The resulting assemblies for NF166.C8, 7G8, and NF135.C10 359

are structurally very complete, with the 14 nuclear chromosomes represented by 30, 20, 360

and 19 nuclear contigs, respectively, with each chromosome in the 3D7 reference 361

represented by one to three contigs (Figure 1). Several shorter contigs in NF54 (67,501 362

bps total), NF166.C8 (224,502 bps total), and NF135.C10 (80,944 bps total) could not be 363

unambiguously assigned to a homologous segment in the 3D7 reference genome; gene 364

annotation showed that these contigs mostly contain members of multi-gene families, and 365

therefore are likely part of sub-telomeric regions. The cumulative lengths of the four 366

assemblies ranged from 22.8 Mbp to 23.5 Mbp (Table 1), indicating variation in genome 367

size among P. falciparum strains. In particular, the 7G8 assembly was several hundred 368

thousand base-pairs smaller than the other three assemblies. To confirm that this was 369

not an assembly error, we compared 7G8 to a previously published 7G8 PacBio-based 370

assembly (PlasmoDBv41) (33). The two assemblies were extremely close in overall 371

genome structure, differing only by 2 Kbp in cumulative length, and also shared a very 372

similar number of SNP and small indel variants relative to 3D7 (Table S4). 373
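As a check on the SNP rate quoted above for NF54, the arithmetic can be reproduced from the two figures given in the text (1,386 SNPs in a 23.4 Mb assembly). The function name is illustrative only.

```python
def snps_per_megabase(snp_count: int, assembly_length_bp: float) -> float:
    """SNP density expressed as SNPs per million base pairs."""
    return snp_count / assembly_length_bp * 1_000_000


# Values taken from the text: 1,386 SNPs in a 23.4 Mb assembly.
nf54_rate = snps_per_megabase(1_386, 23.4e6)
print(f"{nf54_rate:.1f} SNPs per Mb")  # about 59 SNPs per Mb
```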

Structural variations in the genomes of the PfSPZ strains

Many structural variants (defined as indels or tandem repeat contractions or 375

expansions, greater than 50 bp) were identified in each assembly by comparison to the 376

3D7 genome, impacting a cumulative length of 199.0 Kbp in NF166.C8 to 340.9 Kbp in 377

NF135.C10 (Table S5). Many smaller variants fell into coding regions (including known 378

pre-erythrocytic antigens), often representing variation in repeat units (Supplemental 379

Dataset 3). Several larger structural variants (>10 Kbp) exist in 7G8, NF166.C8, and 380

NF135.C10 relative to 3D7. Many of these regions contain members of multi-gene families, and, as expected, the number of these members varies among assemblies (Figure S2).

While all 61 3D7 var exon 1 sequences were present as almost identical sequences in 383

the NF54 assembly (with the exception of small indels in several exon 1 sequences), the 384

number of 3D7 exon 1 var sequences with identifiable orthologs in the three heterologous 385

PfSPZ strains was much lower. Only a few of the identified orthologs had amino acid 386

sequence identity higher than 75% (Figure S2), reflecting the exceptional sequence 387

diversity that occurs in this gene family in and among P. falciparum strains. A recently 388

characterized PfEMP1 protein expressed on the surface of NF54 sporozoites 389

(NF54varsporo) was shown to be involved in hepatocyte invasion (Pf3D7_0809100), and 390

antibodies to this PfEMP1 blocked hepatocyte invasion (64). No ortholog to NF54varsporo 391

was identified in the var repertoire of 7G8, NF166.C8, or NF135.C10 (Figure S2). It is 392

possible that a different, strain-specific, var gene fulfills a similar role in each of the 393

heterologous PfSPZ strains. 394

Several other large structural variants impact regions housing non-multi-gene 395

family members, although none known to be involved in pre-erythrocytic immunity. 396

Examples include a 31 Kbp-long tandem expansion of a region of chromosome 12 in the 397

7G8 assembly (also present in the previously published assembly for 7G8 (33)) and a 398

22.7 Kbp-long repeat expansion of a region of chromosome 5 in NF135.C10, both of 399

which are supported by ~200 PacBio reads. The former is a segmental duplication 400

containing a vacuolar iron transporter (PF3D7_1223700), a putative citrate/oxoglutarate 401

carrier protein (PF3D7_1223800), a putative 50S ribosomal protein L24 402

(PF3D7_1223900), GTP cyclohydrolase I (PF3D7_1224000), and three conserved 403

Plasmodium proteins of unknown function (PF3D7_1223500, PF3D7_1223600, 404

PF3D7_1224100). The expanded region in NF135.C10 represents a tandem expansion 405

of a segment housing the gene encoding the Pf multidrug resistance protein, PfMDR1 406

(PF3D7_0523000), resulting in a total of four copies of this gene in NF135.C10. Other 407

genes in this tandem expansion include those encoding an iron-sulfur assembly protein 408

(PF3D7_0522700), a putative pre-mRNA-splicing factor DUB31 (PF3D7_0522800), a 409

putative zinc finger protein (PF3D7_0522900) and a putative mitochondrial-processing 410

peptidase subunit alpha protein (PF3D7_0523100). In addition, the NF135.C10 assembly 411

contained a large translocation involving chromosomes 7 (3D7 coordinates ~520,000 to 412

~960,000) and 8 (start to coordinate ~440,000) (Figure S3); the boundaries of the 413

translocated regions contain members of multi-gene families. Independent assemblies 414

and mapping patterns observed by aligning the PacBio reads against the NF135.C10 and 415

3D7 assemblies support the translocation (Figure S4). Furthermore, the regions of 416

chromosomes 7 and 8 where the translocation occurs are documented recombination 417

hotspots that were identified specifically in isolates from Cambodia, the site of origin of 418

NF135.C10 (65). 419

Several structural differences in genic regions were also identified between the 420

NF54 assembly and the 3D7 genome (Supplemental Dataset 3); if real, these structural 421

variants would have important implications in the interpretation of trials using 3D7 as a 422

homologous CHMI strain. For example, a 1,887 bp tandem expansion was identified in 423

the NF54 assembly on chromosome 10, which overlapped the region containing liver 424

stage antigen 1 (PfLSA-1, PF3D7_1036400). The structure of this gene in the NF54 strain 425

was reported when PfLSA-1 was first characterized, with unique N- and C-terminal 426

regions flanking a repetitive region consisting of several dozen repeats of a 17 amino acid 427

motif (66,67). The CDS of PfLSA-1 in the NF54 assembly was 5,406 bp in length but only 428

3,489 bp-long in the 3D7 reference. To determine if this was an assembly error in the 429

NF54 assembly, the PfLSA-1 locus from a recently published PacBio-based assembly of 430

3D7 (32) was compared to that of NF54. The two sequences were identical, likely 431

indicative of incorrect collapsing of the repeat region of PfLSA-1 in the 3D7 reference; 432

NF54 and 3D7 PacBio-based assemblies had 79 units of the 17-mer amino acid repeat, 433

compared to only 43 in the 3D7 reference sequence, a result further validated by the 434

inconsistent depth of mapped Illumina reads from NF54 between the PfLSA repeat region 435

and its flanking unique regions in the 3D7 reference (Figure S5). Using the same 436

comparative rationale as above, where structural differences relative to the 3D7 437

reference, but common to both PacBio-based 3D7 and NF54 assemblies, are considered

true, all structural variations identified in NF54 likely represent existing structural errors in 439

the 3D7 reference genome, including 15 variants that affect genic regions (Figure S6). 440

Small sequence variants between PfSPZ strains and the reference 3D7 genome

Very few small sequence variants were identified in NF54 compared to the 3D7 reference; 17 non-synonymous mutations were present in 15 single-copy non-pseudogene-encoding loci (Supplemental Dataset 2). Short indels were detected in 185 genes; many of these indels have a length that is not a multiple of three and occur in homopolymer runs, possibly representing remaining PacBio sequencing error. However, some may be real, such as a small indel causing a frameshift in PF3D7_1417400, a putative

protein-coding pseudogene that has previously been shown to accumulate premature 448

stop codons in laboratory-adapted strains (68), and some may be of biologic importance, 449

such as those seen in two histone-related proteins (PF3D7_0823300 and 450

PF3D7_1020700). It has been reported that some clones of 3D7, unlike NF54, are unable 451

to consistently produce gametocytes in long-term culture (27,69); no polymorphisms were 452

observed within or directly upstream of PfAP2-G (PF3D7_1222600) (Table S6), which 453

has been identified as a transcriptional regulator of sexual commitment in P. falciparum 454

(70). However, a non-synonymous mutation from arginine to proline (R1286P) was observed in an AP2-coincident C-terminal domain of PfAP2-L (PF3D7_0730300), a gene

associated with liver stage development (71); a proline was also present at the 457

homologous position in 7G8, NF166.C8, and NF135.C10. Two additional putative AP2 458

transcription factor genes (PF3D7_0613800 and PF3D7_1317200) contained a three bp 459

deletion in NF54 relative to 3D7. Although knock-downs of PF3D7_1317200 have 460

resulted in a loss of gametocyte production (72), neither indel was located within predicted 461

AP2 domains, and so their biological relevance remains to be determined. 7G8, NF166.C8,

and NF135.C10 also have numerous non-synonymous mutations and indels within 463

putative AP2 genes, a small number of which fell in predicted AP2 domains. Finally, 464

NF135.C10 has an insertion almost 200 base-pairs in length relative to 3D7 in the 3’ end 465

of PfAP2-G; the insertion also carries a premature stop codon, leading to a considerably 466

different C-terminal end for the transcription factor (Figure S7). This allele is also present 467

in previously published assemblies for clones from Southeast Asia (33), including the 468

culture-adapted strain DD2, and variations of this indel (without the in-frame stop codon) are also found in several non-human malaria Plasmodium species (Figure S7), suggesting an interesting evolutionary trajectory of this sequence.

Given that no absolute correlates of protection are known for whole organism P. 472

falciparum vaccines, genetic differences were assessed both across the genome and in 473

pre-erythrocytic genes of interest in the three heterologous CHMI strains. As expected, 474

the number of mutations between 3D7 and these three PfSPZ strains was much higher 475

than observed for NF54, with ~40K-55K SNPs and as many indels in each pairwise 476

comparison, roughly randomly distributed among intergenic regions, silent and non-synonymous sites (Table 1, Figure 2), and corresponding to a pairwise SNP density

relative to 3D7 of 1.9, 2.1 and 2.2 SNPs/Kbp for 7G8, NF166.C8 and NF135.C10, 479

respectively. NF135.C10 had the highest number of unique SNPs genome-wide (SNPs 480

not shared with other PfSPZ strains), with 5% more unique SNPs than NF166.C8 and 481

33% more than 7G8 (Figure S8). Similar trends were seen when restricting the analyses 482

to non-synonymous SNPs (5% and 26% more than NF166.C8 and 7G8, respectively). 483

The lower number of unique SNPs in 7G8 may be due in part to the smaller genome size 484

of this strain. Small indels with lengths that are multiples of three (but not two) were also common in coding regions across the genome, as expected from purifying selection on mutations that disrupt the reading frame, whereas indels in non-coding regions were primarily of lengths that are multiples of two, from unequal crossover or replication slippage in the regions of dinucleotide repeats common in the P. falciparum genome (Figure S9), confirming what has been shown in previous work using short-read whole genome sequencing data (73).
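The length classes contrasted above can be tallied with a short sketch. It assumes each indel has already been reduced to a (region, length) pair; the class labels, function names, and toy values are hypothetical. Lengths divisible by both two and three (for example 6 bp) are counted as multiples of two, following the text's "multiple of three (but not two)" wording.

```python
from collections import Counter
from typing import Iterable, Tuple


def length_class(indel_length: int) -> str:
    """Classify an indel length as a multiple of two, a multiple of three
    that is not also a multiple of two, or neither."""
    if indel_length % 2 == 0:
        return "multiple_of_2"
    if indel_length % 3 == 0:
        return "multiple_of_3_not_2"
    return "other"


def tally_indels(indels: Iterable[Tuple[str, int]]) -> Counter:
    """Count indels per (region, length class); region is e.g. 'coding'."""
    counts: Counter = Counter()
    for region, length in indels:
        counts[(region, length_class(length))] += 1
    return counts


# Toy indels (invented, for illustration only).
toy_indels = [
    ("coding", 3), ("coding", 9), ("coding", 1),
    ("noncoding", 2), ("noncoding", 4), ("noncoding", 6),
]
toy_counts = tally_indels(toy_indels)
```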

SNPs were also common in a panel of 42 pre-erythrocytic genes known or 491

suspected to be implicated in immunity to liver-stage parasites (see Methods; Table S7). 492

While the sequence of all these loci was identical between NF54 and 3D7, there was a 493

wide range in the number of sequence variants per locus between 3D7 and the other 494

three PfSPZ strains, with some genes being more conserved than others. For example, 495

PfCSP showed 8, 7 and 6 non-synonymous mutations in 7G8, NF166.C8, and 496

NF135.C10, respectively, relative to 3D7. In contrast, PfLSA-1 had over 100 non-synonymous mutations in all three heterologous strains relative to 3D7 (many in the

repetitive region of this gene), in addition to significant length differences in the internal 499

repeat region (Figure S10). 500

Immunological relevance of genetic variation among PfSPZ strains

The sequence variants mentioned above may impact the ability of the immune 502

system primed with NF54 to recognize the other PfSPZ strains, impairing vaccine efficacy 503

against heterologous CHMI. Data from murine and non-human primate models 504

(5,28,29,74) demonstrate that CD8+ T cells are required for protective efficacy; therefore, 505

the identification of shared and unique CD8+ T cell epitopes across the genome in all four 506

PfSPZ strains may help interpret the differential efficacy seen in heterologous relative to 507

homologous CHMI. To create a comparable set of genes in which we were confident of 508

the amino acid sequence, we subset the 5,548 protein-coding genes in the 3D7 genome 509

annotation to 4,814 genes that (1) did not have premature stop codons, (2) were annotated

with an expected number of mRNA fragments, and (3) met the above requirements in all 511

four PfSPZ strains (in practice, this removed most, if not all, members of multi-gene 512

families, such as vars, rifins, stevors) (Figure 3A). Strong-binding MHC class I epitopes 513

in the protein sequences from these loci were identified using in silico epitope predictions 514

based on HLA types common in sub-Saharan Africa populations (Table S1). 515

Similar total numbers of epitopes (sum of unique epitopes, regardless of HLA type, across genes) were identified in the three heterologous CHMI strains, with

NF166.C8 having the fewest (492,291) and NF135.C10 having the most (495,285) 518

(Figure 3B). More epitopes were identified in NF54 (n=512,172) than in the three 519

heterologous CHMI strains. This slightly higher number might be due to a more accurate 520

mapping of the 3D7 genome annotation to the NF54 assembly compared to the other 521

three strains (Figure 3A). More than 80% of epitopes detected in each strain were 522

identical (by sequence and by locus) across all four strains, reflecting epitopes predicted 523

in conserved regions of the 4,814 genes used in this analysis. NF54 had the highest 524

proportion of unique epitopes (9,002, or 1.76%) (Figure 3A), i.e., epitopes not shared 525

with the other PfSPZ strains, followed by NF135.C10 (4,123 or 0.83%), 7G8 (3,500 or 526

0.71%), and NF166.C8 (2,878, or 0.58%). After direct comparisons of NF54 to each 527

heterologous PfSPZ strain, NF166.C8 had the lowest proportion of epitopes not shared with

NF54 (1.97%, compared to 2.24% and 2.27% for 7G8 and NF135.C10, respectively), as 529

would follow from a negative correlation between geographic distance and genetic similarity.

To focus the analysis on genes that might be important for the protective efficacy 532

of a whole sporozoite vaccine, the above epitope predictions were stratified by two 533

additional gene sets: (1) 1,974 genes that were found to be expressed in the sporozoite

stage of the parasite in mosquito salivary glands (54,55), and (2) the 42 pre-erythrocytic 535

genes of interest mentioned above. In the sporozoite-expressed gene set, similar patterns 536

as in the genome-wide gene set above were seen, with NF135.C10 being the 537

heterologous CHMI strain that had the highest number of epitopes not shared with any 538

other strain (1,243, or 0.6%), and NF166.C8 having the lowest (884, or 0.4%) (Figure 3C).

Within the 42 pre-erythrocytic gene set, there were 117, 121, and 153 such unique 540

epitopes present in 7G8, NF166.C8, and NF135.C10, respectively (Figure S11, Table 541

S8). Genes where unique epitopes were identified in this gene set were also, on average, 542

more polymorphic (Table S7). 543

Some of these variations in epitope sequences are relevant for the interpretation 544

of the outcome of PfSPZ Vaccine trials. For example, while all four strains are identical in 545

sequence composition in a B cell epitope potentially relevant for protection recently 546

identified in the circumsporozoite protein (PfCSP, PF3D7_0304600) (75), another B cell 547

epitope that partially overlaps it (76) contained an A98G amino acid difference in 7G8 and 548

NF135.C10 relative to NF54 and NF166.C8. There was also variability in CD8+ T cell 549

epitopes recognized in the Th2R region of the protein, some known for decades (77). 550

Specifically, the PfCSP encoded by the 3D7/NF54 allele was predicted to bind to both 551

HLA-A and HLA-C allele types, but the homologous protein segments in NF166.C8 and 552

NF135.C10 were only recognized by HLA-A allele types, and no epitope was detected 553

(within the HLA types studied) at that position in PfCSP encoded in 7G8 (Figure 4). 554

Expansion of the analyses to additional HLA types revealed an allele (HLA-08:01) that is 555

predicted to bind to the Th2R region of the 7G8-encoded PfCSP; however, HLA-08:01 is 556

much more frequent in European populations (10-15%) than in African populations (1-6%) (51). Therefore, if CD8+ T cell epitopes in the Th2R region of 7G8 are important for

protection, which is currently unknown, the level of protection against CHMI with 7G8 559

observed in volunteers of European descent may not be correlated with PfSPZ Vaccine 560

efficacy in Africa. Indels and repeat regions also sometimes affected the number of 561

predicted epitopes in each antigen; for example, an insertion in 7G8 near amino acid 562

residue 1,600 in PfLISP-2 (PF3D7_0405300) contained additional predicted epitopes 563

(Figure S12). Similar patterns in variation in epitope recognition and frequency were 564

found in other pre-erythrocytic genes of interest, including PfLSA-3 (PF3D7_0220000), 565

PfAMA-1 (PF3D7_1133400) and PfTRAP (PF3D7_1335900) (Figure S12). 566

PfSPZ strains and global parasite diversity

The four PfSPZ strains have been adapted and kept in culture for extended periods 568

of time. To determine if they are still representative of the malaria-endemic regions from 569

which they were collected, we compared these strains to over 600 recent (2007-2014) 570

clinical isolates from South America, Africa, Southeast Asia, and Oceania (Supplemental 571

Dataset 4), using principal coordinates analysis (PCoA) based on SNP calls generated 572

from Illumina whole genome sequencing data. The results confirmed the existence of 573

global geographic differences in genetic variation previously reported (78,79), including 574

clustering by continent, as well as a separation of East from West Africa and of the 575

Amazonian region from that west of the Andes (Figure 5). The PfSPZ strains clustered

with others from their respective geographic regions, both at the genome-wide level and 577

when restricting the data set to SNPs in the panel of 42 pre-erythrocytic antigens, despite 578

the long-term culturing of some of these strains (Figure 5). An admixture analysis of 579

South American and African clinical isolates confirmed that NF54 and NF166.C8 both 580

have the genomic background characteristic of West Africa, while 7G8 is clearly a South 581

American strain (Figure S13). 582

NF135.C10 was isolated in the early 1990s (14), at a time when resistance to chloroquine and sulfadoxine-pyrimethamine was entrenched and resistance to mefloquine was emerging (80,81), and carries signals from this period of drug pressure.

Four copies of MDR-1 were identified in NF135.C10 (Table S9); however, two of these 586

copies appeared to have premature stop codons introduced by SNPs and/or indels, 587

leaving potentially only two functional copies in the genome. While NF135.C10 also had 588

numerous point mutations relative to 3D7 in genes such as PfCRT (conferring chloroquine resistance), and PfDHPS and PfDHFR (conferring sulfadoxine-pyrimethamine resistance), NF135.C10 was isolated before the widespread deployment of artemisinin-based combination therapies (ACTs), and had the wild-type allele in the locus that

encodes the kelch protein in chromosome 13 (PfK13), with no mutations known to convey 593

artemisinin resistance detected in the propeller region (Table S10). 594

The emergence in Southeast Asia of resistance to antimalarial drugs, including 595

artemisinins and drugs used in artemisinin-based combination treatments, is thought to 596

underlie the complex and dynamic parasite population structure in the region (82). 597

Several relatively homogeneous subpopulations, whose origin is likely linked to the 598

emergence and rapid spread of drug resistance mutations, exist in parallel with a sensitive 599

subpopulation that reflects the ancestral population in the region (referred to as KH1), 600

and another subpopulation of admixed genomic background (referred to as KHA), 601

possibly the source of the drug-resistant subpopulations or the result of a secondary mix 602

of resistant subpopulations (61,62,83,84). This has been accompanied by reports of 603

individual K13 mutations conferring artemisinin resistance occurring independently on 604

multiple genomic backgrounds (85). To determine the subpopulation to which NF135.C10 605

belongs, an admixture analysis was conducted using isolates from Southeast Asia and 606

Oceania, including NF135.C10. Eleven total populations were detected, of which seven 607

contained Cambodian isolates (Figure 6). Both admixture and hierarchical clustering 608

analyses suggest that NF135.C10 is representative of the previously described admixed 609

KHA subpopulation (61,62) (Figure 6), implying that NF135.C10 is representative of a 610

long-standing admixed population of parasites in Cambodia rather than one of several 611

subpopulations thought to have arisen more recently following the emergence of drug 612

resistance. 613
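As an illustration of how an isolate might be labelled from admixture output, the sketch below assigns an isolate to its dominant ancestry component, or calls it admixed when no component dominates. The 0.8 threshold, the component labels, and the proportions are all hypothetical assumptions for this example; they are not the criteria or values used in the analysis reported here.

```python
# Illustrative sketch only; the threshold and labels are assumptions.
def assign_subpopulation(ancestry, threshold=0.8):
    """Return the dominant ancestry label if its proportion reaches the
    threshold; otherwise return 'admixed'."""
    label, share = max(ancestry.items(), key=lambda item: item[1])
    return label if share >= threshold else "admixed"


# Invented ancestry proportions for two toy isolates.
homogeneous = {"K1": 0.95, "K2": 0.03, "K3": 0.02}
mixed = {"K1": 0.40, "K2": 0.35, "K3": 0.25}

print(assign_subpopulation(homogeneous))  # prints K1
print(assign_subpopulation(mixed))        # prints admixed
```

Under this kind of rule, an isolate with no single dominant component would fall into an admixed group analogous to KHA.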

Discussion

Whole organism sporozoite vaccines have provided variable levels of protection in initial clinical trials. The radiation-attenuated PfSPZ Vaccine has been shown to protect >90% of subjects against homologous CHMI at 3 weeks after the last dose in 5 clinical trials in the U.S. (6,8) and Germany (11). However, efficacy has been lower against heterologous CHMI (8,9), and in field studies in a region of intense transmission, in Mali, at 24 weeks (10). Interestingly, for the exact same immunization regimen, protective efficacy by proportional analysis was greater in the field trial in Mali (29%) than it was against heterologous CHMI with Pf 7G8 in the U.S. at 24 weeks after the last dose of vaccine (8%) (8,10). While evidence shows that whole organism-based vaccine efficacy can be improved by adjusting the vaccine dose and schedule (11), further optimization of such vaccines will be facilitated by a thorough understanding of the genotypic and immunologic differences among the PfSPZ strains and between them and parasites in malaria-endemic regions.
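For readers unfamiliar with the term, protective efficacy by proportional analysis is commonly computed as one minus the ratio of the infection risk in vaccinees to that in controls. The sketch below assumes that definition; the counts are invented for illustration and are not data from any of the trials cited above.

```python
# Minimal sketch assuming efficacy = 1 - (risk in vaccinees / risk in controls).
def proportional_efficacy(infected_vaccinees, n_vaccinees,
                          infected_controls, n_controls):
    """Return vaccine efficacy as a fraction, based on the risk ratio."""
    risk_vaccinees = infected_vaccinees / n_vaccinees
    risk_controls = infected_controls / n_controls
    return 1.0 - risk_vaccinees / risk_controls


# Hypothetical counts: 30 of 50 vaccinees and 40 of 50 controls infected.
efficacy = proportional_efficacy(30, 50, 40, 50)
print(f"{efficacy:.0%}")  # prints 25%
```

Because the estimate depends on the attack rate among controls, values from a field trial and from CHMI are not directly comparable without considering differences in exposure.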

A recent study examined whole genome short-read sequencing data to characterize NF166.C8 and NF135.C10 through SNP calls, and identified a number of non-synonymous mutations at a few loci potentially important for the efficacy of CPS, the foundation for PfSPZ-CVac (17). The analyses described here, using high-quality de novo genome assemblies, extend the characterization to hard-to-call regions, such as those containing gene families, repeats and other low complexity sequences. The added sensitivity enabled the thorough genomic characterization of these and additional vaccine-related strains, and revealed a considerably higher number of sequence variants than can be called using short read data alone, as well as indels and structural variants between assemblies. For example, the insertion close to the 3’ end of PfAP2-G detected in NF135.C10 and shared by DD2 has not, to the best of our knowledge, been reported before, despite the multiple studies highlighting the importance of this gene in sexual commitment in P. falciparum strains, including DD2 (70). Long-read sequencing also confirmed that differences observed between NF54 and 3D7 in a major liver stage antigen, PfLSA-1, result from one of a small number of errors lingering in the reference 3D7 genome, which is being continually updated and improved (34). Confirmation that NF54 and 3D7 are identical at this locus is critical, given that 3D7 has been used as the homologous CHMI strain in whole sporozoite, NF54-based vaccine studies. Furthermore, the comprehensive sequence characterization of variant surface antigen-encoding loci, such as PfEMP1-encoding genes, will enable the use of the PfSPZ strains to study the role of these protein families in virulence, naturally acquired immunity and vaccine-induced protection (86).

The comprehensive genetic and genomic studies reported herein were designed to provide insight into the outcome of homologous and heterologous CHMI studies, and to determine whether the CHMI strains can be used as a proxy for strains present in the field. Comparison of genome assemblies confirmed that NF54 and 3D7 have remained genetically very similar over time, and that 3D7 is an appropriate homologous CHMI strain. As expected, 7G8, NF166.C8, and NF135.C10 were genetically very distinct from NF54 and 3D7, with thousands of differences across the genome, including dozens in known pre-erythrocytic antigens. The identification of sequence variants (both SNPs and indels) within transcriptional regulators, such as the AP2 family, may assist in the study of different growth phenotypes in these strains. NF166.C8 and NF135.C10 merozoites enter the bloodstream several days earlier than those of NF54 (15), suggesting that NF54 may develop more slowly in hepatocytes than do the other two strains. Therefore, mutations in genes associated with liver-stage development (as was observed with PfAP2-L) may be of interest to explore further. Finally, comparison of the PfSPZ strains to whole genome sequencing data from clinical isolates shows that, at the whole genome level, they are indeed representative of their geographical regions of origin.

These results can assist in the interpretation of CHMI studies in multiple ways. First, in the 42 pre-erythrocytic genes of interest, the Brazilian strain 7G8, one of the first strains to be used in heterologous CHMI studies, and the Guinean strain NF166.C8 share a similar number of epitopes with NF54. Furthermore, both NF166.C8 and NF135.C10 carry more unique non-synonymous SNPs genome-wide (19% and 33%, respectively), and more unique epitopes in these genes, relative to NF54 than does 7G8. These results show that the practice of equating geographic distance to genetic differentiation is not always valid, and that the interpretation of CHMI studies should rest upon thorough genome-wide comparisons. Lastly, of all PfSPZ strains, NF135.C10 is the most genetically distinct from NF54, including in point mutations, unique epitopes, and its placement within the global parasite population. We do not know whether the peptide sequences of the actual epitope targets of vaccine-induced protective immunity are equally distinct in NF135.C10. However, if proteome-wide genetic divergence is the primary determinant of differences in protection against different parasites, the extent to which NF54-based immunization protects against CHMI with NF135.C10 is important in understanding the ability of PfSPZ Vaccine and other whole-organism malaria vaccines to protect against diverse parasites present worldwide. These conclusions are drawn from genome-wide analyses and from subsets of genes for which a role in whole-sporozoite-induced protection is suspected but not experimentally established. Conclusive statements regarding cross-protection will require additional knowledge of the genetic basis of whole-organism vaccine protection.
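The notion of shared versus unique epitopes can be sketched as a set comparison between two protein sequences. In the example below, every 9-residue peptide window stands in for a candidate epitope; this is a simplifying assumption made for illustration, since no HLA-binding prediction is performed, and the sequences are invented.

```python
# Illustrative sketch: peptide windows are a stand-in for predicted epitopes.
def peptide_kmers(protein, k=9):
    """Return the set of all length-k peptide windows in a protein sequence."""
    return {protein[i:i + k] for i in range(len(protein) - k + 1)}


def compare_epitopes(reference, other, k=9):
    """Count peptides shared with the reference and unique to the other strain."""
    ref_set = peptide_kmers(reference, k)
    other_set = peptide_kmers(other, k)
    return {"shared": len(ref_set & other_set),
            "unique_to_other": len(other_set - ref_set)}


# Toy sequences (invented): one substitution at the first residue.
reference_protein = "MKLVNSTAGQWERTY"
variant_protein = "AKLVNSTAGQWERTY"

print(compare_epitopes(reference_protein, variant_protein))
# prints {'shared': 6, 'unique_to_other': 1}
```

A single amino acid change alters every window that overlaps it, so the number of unique peptides depends on where substitutions fall as well as on how many there are.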

Without more information on the epitope targets of protective immunity induced by PfSPZ vaccines, it is difficult to rationally design multi-strain PfSPZ vaccines. However, these data and this analytical approach could be applied to the rational design of multi-strain sporozoite-based vaccines once knowledge of those critical epitope sequences is available, and characterization of a variety of P. falciparum strains may facilitate the development of region-specific or multi-strain vaccines with greater protective efficacy. Support for a genomics-guided approach to such next-generation vaccines can be found in other whole organism parasitic vaccines. Field trials testing the efficacy of first-generation whole killed-parasite vaccines against Leishmania had highly variable results (87). While most studies failed to show protection, indicating that killed, whole-cell vaccines for leishmaniasis may not produce the necessary protective response, a trial demonstrating significant protection utilized a multi-strain vaccine, with strains collected from the immediate area of the trial (88), highlighting the importance of understanding the distribution of genetic diversity in pathogen populations. In addition, a highly efficacious non-attenuated, three-strain, whole organism vaccine exists against Theileria parva, a protozoan parasite that causes East Coast fever in cattle. This vaccine, named Muguga Cocktail, consists of a mix of three live strains of T. parva that are administered in an infection-and-treatment method, similar to the approach utilized by PfSPZ-CVac. It has been shown recently that two of the strains are genetically very similar, possibly clones of the same isolate (89). Despite this, the vaccine remains highly efficacious and in high demand (90). In addition, the third vaccine strain in the Muguga Cocktail is quite distinct from the other two, with ~5 SNPs/kb (89), or about twice the SNP density seen between NF54 and other PfSPZ strains. These observations suggest that an efficacious multi-strain vaccine against a highly variable parasite species does not need to contain a large number of strains, but that the inclusion of highly divergent strains may be warranted. These results also speak to the promise of multi-strain vaccines against highly diverse pathogens, including apicomplexans with large genomes and complex life cycles.
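The SNP density comparison above is simple arithmetic, sketched here for clarity. The SNP count and aligned length are hypothetical values chosen to give a round number; only the ~5 SNPs/kb figure for T. parva comes from the text.

```python
# Worked arithmetic with hypothetical inputs; not measurements from the study.
def snps_per_kb(n_snps, aligned_bp):
    """Return SNP density as SNPs per kilobase of aligned sequence."""
    return n_snps / (aligned_bp / 1000)


# Invented example: 50,000 SNPs across 20 Mb of aligned sequence.
example_density = snps_per_kb(50_000, 20_000_000)
print(example_density)        # prints 2.5
print(5.0 / example_density)  # prints 2.0
```

A density of about 2.5 SNPs/kb is what "about twice" implies for the comparison with the ~5 SNPs/kb reported for the divergent T. parva strain.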


Next-generation whole genome sequencing technology has opened many avenues for infectious disease research and holds great promise for informing vaccine design. While most malaria vaccine development occurred before whole genome sequencing came into regular use, the tools now available allow the precise characterization and informed selection of vaccine strains early in the development process. These results will greatly assist in the interpretation of clinical trials using these strains for CHMI.

References and Notes:

1. World Malaria Report 2018. Geneva: World Health Organization; 2018 p. 1–210.
2. Bhatt S, Weiss DJ, Cameron E, Bisanzio D, Mappin B, Dalrymple U, et al. The effect of malaria control on Plasmodium falciparum in Africa between 2000 and 2015. Nature. 2015 Oct 8;526(7572):207–11.
3. Delemarre BJ, van der Kaay HJ. Tropical malaria contracted the natural way in the Netherlands. Ned Tijdschr Geneeskd. 1979 Nov 17;123(46):1981–2.
4. Hoffman SL, Billingsley PF, James E, Richman A, Loyevsky M, Li T, et al. Development of a metabolically active, non-replicating sporozoite vaccine to prevent Plasmodium falciparum malaria. Hum Vaccin. 2010 Jan;6(1):97–106.
5. Epstein JE, Tewari K, Lyke KE, Sim BKL, Billingsley PF, Laurens MB, et al. Live Attenuated Malaria Vaccine Designed to Protect Through Hepatic CD8+ T Cell Immunity. Science. 2011 Oct 28;334(6055):475–80.
6. Seder RA, Chang L-J, Enama ME, Zephir KL, Sarwar UN, Gordon IJ, et al. Protection Against Malaria by Intravenous Immunization with a Nonreplicating Sporozoite Vaccine. Science. 2013 Sep 20;341(6152):1359–65.
7. Ishizuka AS, Lyke KE, DeZure A, Berry AA, Richie TL, Mendoza FH, et al. Protection against malaria at 1 year and immune correlates following PfSPZ vaccination. Nat Med. 2016 Jun;22(6):614–23.
8. Epstein JE, Paolino KM, Richie TL, Sedegah M, Singer A, Ruben AJ, et al. Protection against Plasmodium falciparum malaria by PfSPZ Vaccine. JCI Insight [Internet]. 2017 Jan 12 [cited 2017 Jan 14];2(1). Available from: http://insight.jci.org/articles/view/89154


9. Lyke KE, Ishizuka AS, Berry AA, Chakravarty S, DeZure A, Enama ME, et al. Attenuated PfSPZ Vaccine induces strain-transcending T cells and durable protection against heterologous controlled human malaria infection. Proc Natl Acad Sci. 2017 Mar 7;114(10):2711–6.
10. Sissoko MS, Healy SA, Katile A, Omaswa F, Zaidi I, Gabriel EE, et al. Safety and efficacy of PfSPZ Vaccine against Plasmodium falciparum via direct venous inoculation in healthy malaria-exposed adults in Mali: a randomised, double-blind phase 1 trial. Lancet Infect Dis. 2017 May 1;17(5):498–509.
11. Mordmüller B, Surat G, Lagler H, Chakravarty S, Ishizuka AS, Lalremruata A, et al. Sterile protection against human malaria by chemoattenuated PfSPZ vaccine. Nature. 2017 Feb 23;542(7642):445–9.
12. Vaughan AM, Sack BK, Dankwa D, Minkah N, Nguyen T, Cardamone H, et al. A Plasmodium Parasite with Complete Late Liver Stage Arrest Protects against Preerythrocytic and Erythrocytic Stage Infection in Mice. Infect Immun. 2018;86(5).
13. Mendes AM, Machado M, Gonçalves-Rosa N, Reuling IJ, Foquet L, Marques C, et al. A Plasmodium berghei sporozoite-based vaccination platform against human malaria. NPJ Vaccines [Internet]. 2018 [cited 2019 Jun 25];3. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6109154/
14. Teirlinck AC, Roestenberg M, Vegte-Bolmer M van de, Scholzen A, Heinrichs MJL, Siebelink-Stoter R, et al. NF135.C10: A New Plasmodium falciparum Clone for Controlled Human Malaria Infections. J Infect Dis. 2013 Feb 15;207(4):656–60.
15. McCall MBB, Wammes LJ, Langenberg MCC, Gemert G-J van, Walk J, Hermsen CC, et al. Infectivity of Plasmodium falciparum sporozoites determines emerging parasitemia in infected volunteers. Sci Transl Med. 2017 Jun 21;9(395):eaag2490.
16. Schats R, Bijker EM, van Gemert G-J, Graumans W, van de Vegte-Bolmer M, van Lieshout L, et al. Heterologous Protection against Malaria after Immunization with Plasmodium falciparum Sporozoites. PLoS ONE. 2015 May 1;10(5):e0124243.
17. Walk J, Reuling IJ, Behet MC, Meerstein-Kessel L, Graumans W, van Gemert G-J, et al. Modest heterologous protection after Plasmodium falciparum sporozoite immunization: a double-blind randomized controlled clinical trial. BMC Med. 2017 Sep 13;15:168.
18. Volkman SK, Hartl DL, Wirth DF, Nielsen KM, Choi M, Batalov S, et al. Excess Polymorphisms in Genes for Membrane Proteins in Plasmodium falciparum. Science. 2002 Oct 4;298(5591):216–8.
19. Thera MA, Doumbo OK, Coulibaly D, Laurens MB, Ouattara A, Kone AK, et al. A Field Trial to Assess a Blood-Stage Malaria Vaccine. N Engl J Med. 2011 Sep 15;365(11):1004–13.


20. Ouattara A, Takala-Harrison S, Thera MA, Coulibaly D, Niangaly A, Saye R, et al. Molecular Basis of Allele-Specific Efficacy of a Blood-Stage Malaria Vaccine: Vaccine Development Implications. J Infect Dis. 2013 Feb;207(3):511–9.
21. Neafsey DE, Juraska M, Bedford T, Benkeser D, Valim C, Griggs A, et al. Genetic Diversity and Protective Efficacy of the RTS,S/AS01 Malaria Vaccine. N Engl J Med. 2015 Oct 21;0(0):null.
22. Takala SL, Plowe CV. Genetic diversity and malaria vaccine design, testing, and efficacy: Preventing and overcoming “vaccine resistant malaria.” Parasite Immunol. 2009 Sep;31(9):560–73.
23. Barry AE, Arnott A. Strategies for Designing and Monitoring Malaria Vaccines Targeting Diverse Antigens. Front Immunol [Internet]. 2014 Jul 28;5. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4112938/
24. Preston MD, Campino S, Assefa SA, Echeverry DF, Ocholla H, Amambua-Ngwa A, et al. A barcode of organellar genome polymorphisms identifies the geographic origin of Plasmodium falciparum strains. Nat Commun [Internet]. 2014 Jun 13 [cited 2015 Jun 22];5. Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4082634/
25. Drakeley CJ, Duraisingh MT, Póvoa M, Conway DJ, Targett GAT, Baker DA. Geographical distribution of a variant epitope of Pfs48/45, a Plasmodium falciparum transmission-blocking vaccine candidate. Mol Biochem Parasitol. 1996 Oct 30;81(2):253–7.
26. Walliker D, Quakyi IA, Wellems TE, McCutchan TF, Szarfman A, London WT, et al. Genetic Analysis of the Human Malaria Parasite Plasmodium falciparum. Science. 1987 Jun 26;236(4809):1661–6.
27. Delves MJ, Straschil U, Ruecker A, Miguel-Blanco C, Marques S, Dufour AC, et al. Routine in vitro culture of P. falciparum gametocytes to evaluate novel transmission-blocking interventions. Nat Protoc. 2016 Sep;11(9):1668–80.
28. Schofield L, Villaquiran J, Ferreira A, Schellekens H, Nussenzweig R, Nussenzweig V. γ Interferon, CD8+ T cells and antibodies required for immunity to malaria sporozoites. Nature. 1987 Dec 23;330(6149):664–6.
29. Weiss WR, Sedegah M, Beaudoin RL, Miller LH, Good MF. CD8+ T cells (cytotoxic/suppressors) are required for protection in mice immunized with malaria sporozoites. Proc Natl Acad Sci. 1988 Jan 1;85(2):573–6.
30. Chattopadhyay R, Pratt D. Role of controlled human malaria infection (CHMI) in malaria vaccine development: A U.S. food & drug administration (FDA) perspective. Vaccine. 2017 May 15;35(21):2767–9.


31. Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, et al. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 2002 Oct 3;419(6906):498–511.
32. Vembar SS, Seetin M, Lambert C, Nattestad M, Schatz MC, Baybayan P, et al. Complete telomere-to-telomere de novo assembly of the Plasmodium falciparum genome through long-read (>11 kb), single molecule, real-time sequencing. DNA Res. 2016 Jun 26;dsw022.
33. Otto TD, Böhme U, Sanders M, Reid A, Bruske EI, Duffy CW, et al. Long read assemblies of geographically dispersed Plasmodium falciparum isolates reveal highly structured subtelomeres. Wellcome Open Res [Internet]. 2018 May 3 [cited 2018 Jul 16];3. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5964635/
34. Böhme U, Otto TD, Sanders M, Newbold CI, Berriman M. Progression of the canonical reference malaria parasite genome from 2002–2019. Wellcome Open Res [Internet]. 2019 Mar 29 [cited 2019 Jun 1];4. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6484455/
35. Yeda R, Ingasia LA, Cheruiyot AC, Okudo C, Chebon LJ, Cheruiyot J, et al. The Genotypic and Phenotypic Stability of Plasmodium falciparum Field Isolates in Continuous In Vitro Culture. PLoS ONE. 2016 Jan 11;11(1):e0143565.
36. Oyola SO, Ariani CV, Hamilton WL, Kekre M, Amenga-Etego LN, Ghansah A, et al. Whole genome sequencing of Plasmodium falciparum from dried blood spots using selective whole genome amplification. Malar J. 2016;15:597.
37. Agrawal S, Moser KA, Morton L, Cummings MP, Parihar A, Dwivedi A, et al. Association of a Novel Mutation in the Plasmodium falciparum Chloroquine Resistance Transporter With Decreased Piperaquine Sensitivity. J Infect Dis. 2017 Aug 15;216(4):468–76.
38. Trager W, Jensen JB. Human malaria parasites in continuous culture. Science. 1976 Aug 20;193(4254):673–5.
39. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017 Mar 15;gr.215087.116.
40. Chin C-S, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods. 2013 Jun;10(6):563–9.
41. Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, et al. Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLoS ONE. 2014 Nov 19;9(11):e112963.


42. Delcher AL, Kasif S, Fleischmann RD, Peterson J, White O, Salzberg SL. Alignment of whole genomes. Nucleic Acids Res. 1999 Jan 1;27(11):2369–76.
43. Nattestad M, Schatz MC. Assemblytics: a web analytics tool for the detection of variants from an assembly. Bioinformatics. 2016 Oct 1;32(19):3021–3.
44. Sedlazeck FJ, Rescheneder P, Smolka M, Fang H, Nattestad M, Haeseler A, et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods. 2018 Apr 30;1.
45. Nattestad M, Chin C-S, Schatz MC. Ribbon: Visualizing complex genome alignments and structural variation. bioRxiv. 2016 Oct 20;082123.
46. Dara A, Drábek EF, Travassos MA, Moser KA, Delcher AL, Su Q, et al. New var reconstruction algorithm exposes high var sequence diversity in a single geographic location in Mali. Genome Med. 2017;9:30.
47. Rask TS, Hansen DA, Theander TG, Pedersen AG, Lavstsen T. Plasmodium falciparum Erythrocyte Membrane Protein 1 Diversity in Seven Genomes – Divide and Conquer. PLOS Comput Biol. 2010 Sep 16;6(9):e1000933.
48. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012 Apr;9(4):357–9.
49. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. J Comput Biol. 2012 May;19(5):455–77.
50. Nielsen M, Andreatta M. NetMHCpan-3.0; improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length datasets. Genome Med. 2016 Mar 30;8:33.
51. González-Galarza FF, Takeshita LYC, Santos EJM, Kempson F, Maia MHT, Silva ALS da, et al. Allele frequency net 2015 update: new features for HLA epitopes, KIR and disease and HLA adverse drug reaction associations. Nucleic Acids Res. 2015 Jan 28;43(D1):D784–8.
52. de Pablo R, García-Pacheco JM, Vilches C, Moreno ME, Sanz L, Rementería MC, et al. HLA class I and class II allele distribution in the Bubi population from the island of Bioko (Equatorial Guinea). Tissue Antigens. 1997 Dec 1;50(6):593–601.
53. Cao K, Moormann AM, Lyke KE, Masaberg C, Sumba OP, Doumbo OK, et al. Differentiation between African populations is evidenced by the diversity of alleles and haplotypes of HLA class I loci. Tissue Antigens. 2004 Apr 1;63(4):293–325.

54. Lasonder E, Janse CJ, Gemert G-J van, Mair GR, Vermunt AMW, Douradinha BG, et al. Proteomic Profiling of Plasmodium Sporozoite Maturation Identifies New Proteins Essential for Parasite Development and Infectivity. PLOS Pathog. 2008 Oct 31;4(10):e1000195.

55. Lindner SE, Swearingen KE, Harupa A, Vaughan AM, Sinnis P, Moritz RL, et al. Total and Putative Surface Proteomics of Malaria Parasite Salivary Gland Sporozoites. Mol Cell Proteomics. 2013 May 1;12(5):1127–43.
56. Felgner P, Hoffman SL, Seder R. Serum antibody assay for determining protection from malaria, and pre-erythrocytic subunit vaccines [Internet]. US20160216276A1, 2016 [cited 2019 May 20]. Available from: https://patents.google.com/patent/US20160216276A1/en
57. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010 Sep 1;20(9):1297–303.
58. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011 May;43(5):491–8.
59. Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, del Angel G, Levy-Moonshine A, et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinforma Ed Board Andreas Baxevanis Al. 2013 Oct 15;11(1110):11.10.1-11.10.33.
60. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009 Sep 1;19(9):1655–64.
61. Miotto O, Almagro-Garcia J, Manske M, MacInnis B, Campino S, Rockett KA, et al. Multiple populations of artemisinin-resistant Plasmodium falciparum in Cambodia. Nat Genet. 2013 Jun;45(6):648–55.
62. Dwivedi A, Reynes C, Kuehn A, Roche DB, Khim N, Hebrard M, et al. Functional analysis of Plasmodium falciparum subpopulations associated with artemisinin resistance in Cambodia. Malar J. 2017 Dec 19;16:493.
63. Carneiro MO, Russ C, Ross MG, Gabriel SB, Nusbaum C, DePristo MA. Pacific biosciences sequencing technology for genotyping and variation discovery in human data. BMC Genomics. 2012 Aug 5;13:375.
64. Zanghì G, Vembar SS, Baumgarten S, Ding S, Guizetti J, Bryant JM, et al. A Specific PfEMP1 Is Expressed in P. falciparum Sporozoites and Plays a Role in Hepatocyte Infection. Cell Rep. 2018 Mar 13;22(11):2951–63.
65. Mu J, Myers RA, Jiang H, Liu S, Ricklefs S, Waisberg M, et al. Plasmodium falciparum genome-wide scans for positive selection, recombination hot spots and resistance to antimalarial drugs. Nat Genet. 2010 Mar;42(3):268–71.


66. Guerin-Marchand C, Druilhe P, Galey B, Londono A, Patarapotikul J, Beaudoin RL, et al. A liver-stage-specific antigen of Plasmodium falciparum characterized by gene cloning. Nature. 1987 Sep 10;329(6135):164–7.
67. Fidock DA, Gras-Masse H, Lepers JP, Brahimi K, Benmohamed L, Mellouk S, et al. Plasmodium falciparum liver stage antigen-1 is well conserved and contains potent B and T cell determinants. J Immunol. 1994 Jul 1;153(1):190–204.
68. Claessens A, Affara M, Assefa SA, Kwiatkowski DP, Conway DJ. Culture adaptation of malaria parasites selects for convergent loss-of-function mutants. Sci Rep. 2017 Jan 24;7:41303.
69. Gebru T, Lalremruata A, Kremsner PG, Mordmüller B, Held J. Life-span of in vitro differentiated Plasmodium falciparum gametocytes. Malar J. 2017 Aug 11;16:330.
70. Kafsack BFC, Rovira-Graells N, Clark TG, Bancells C, Crowley VM, Campino SG, et al. A transcriptional switch underlies commitment to sexual development in malaria parasites. Nature. 2014 Mar 13;507(7491):248–52.
71. Iwanaga S, Kaneko I, Kato T, Yuda M. Identification of an AP2-family Protein That Is Critical for Malaria Liver Stage Development. PLOS ONE. 2012 Nov 7;7(11):e47557.
72. Ikadai H, Saliba KS, Kanzok SM, McLean KJ, Tanaka TQ, Cao J, et al. Transposon mutagenesis identifies genes essential for Plasmodium falciparum gametocytogenesis. Proc Natl Acad Sci. 2013 Apr 30;110(18):E1676–84.
73. Miles A, Iqbal Z, Vauterin P, Pearson R, Campino S, Theron M, et al. Indels, structural variation, and recombination drive genomic diversity in Plasmodium falciparum. Genome Res. 2016 Sep 1;26(9):1288–99.
74. Doolan DL, Hoffman SL. The Complexity of Protective Immunity Against Liver-Stage Malaria. J Immunol. 2000 Aug 1;165(3):1453–62.
75. Kisalu NK, Idris AH, Weidle C, Flores-Garcia Y, Flynn BJ, Sack BK, et al. A human monoclonal antibody prevents malaria infection by targeting a new site of vulnerability on the parasite. Nat Med. 2018 Apr;24(4):408–16.
76. Tan J, Sack BK, Oyen D, Zenklusen I, Piccoli L, Barbieri S, et al. A public antibody lineage that potently inhibits malaria infection by dual binding to the circumsporozoite protein. Nat Med. 2018 May;24(4):401–7.
77. Doolan DL, Saul AJ, Good MF. Geographically restricted heterogeneity of the Plasmodium falciparum circumsporozoite protein: relevance for vaccine development. Infect Immun. 1992 Feb;60(2):675–82.

78. Manske M, Miotto O, Campino S, Auburn S, Almagro-Garcia J, Maslen G, et al. Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing. Nature. 2012 Jul 19;487(7407):375–9.

79. Yalcindag E, Elguero E, Arnathau C, Durand P, Akiana J, Anderson TJ, et al. Multiple independent introductions of Plasmodium falciparum in South America. Proc Natl Acad Sci. 2012 Jan 10;109(2):511–6.

80. Wongsrichanalai C, Pickard AL, Wernsdorfer WH, Meshnick SR. Epidemiology of drug-resistant malaria. Lancet Infect Dis. 2002 Apr 1;2(4):209–18.

81. Plowe CV. The evolution of drug-resistant malaria. Trans R Soc Trop Med Hyg. 2009 Apr 1;103(Supplement 1):S11–4.

82. Parobek CM, Parr JB, Brazeau NF, Lon C, Chaorattanakawee S, Gosi P, et al. Partner-Drug Resistance and Population Substructuring of Artemisinin-Resistant Plasmodium falciparum in Cambodia. Genome Biol Evol. 2017 Jun 1;9(6):1673–86.

83. MalariaGEN Plasmodium falciparum Community. Genomic epidemiology of artemisinin resistant malaria. eLife. 2016 Mar 4;5:e08714.

84. Cerqueira GC, Cheeseman IH, Schaffner SF, Nair S, McDew-White M, Phyo AP, et al. Longitudinal genomic surveillance of Plasmodium falciparum malaria parasites reveals complex genomic architecture of emerging artemisinin resistance. Genome Biol. 2017;18:78.

85. Takala-Harrison S, Jacob CG, Arze C, Cummings MP, Silva JC, Dondorp AM, et al. Independent Emergence of Artemisinin Resistance Mutations Among Plasmodium falciparum in Southeast Asia. J Infect Dis. 2015 Mar 1;211(5):670–9.

86. Kraemer SM, Smith JD. A family affair: var genes, PfEMP1 binding, and malaria disease. Curr Opin Microbiol. 2006 Aug 1;9(4):374–80.

87. Noazin S, Modabber F, Khamesipour A, Smith PG, Moulton LH, Nasseri K, et al. First generation leishmaniasis vaccines: A review of field efficacy trials. Vaccine. 2008 Dec 9;26(52):6759–67.

88. Armijos RX, Weigel MM, Aviles H, Maldonado R, Racines J. Field Trial of a Vaccine against New World Cutaneous Leishmaniasis in an At-Risk Child Population: Safety, Immunogenicity, and Efficacy during the First 12 Months of Follow-Up. J Infect Dis. 1998 May 1;177(5):1352–7.

89. Norling M, Bishop RP, Pelle R, Qi W, Henson S, Drábek EF, et al. The genomes of three stocks comprising the most widely utilized live sporozoite Theileria parva vaccine exhibit very different degrees and patterns of sequence divergence. BMC Genomics. 2015;16:729.

90. Nene V, Kiara H, Lacasta A, Pelle R, Svitek N, Steinaa L. The biology of Theileria parva and control of East Coast fever - Current status and future trends. Ticks Tick-Borne Dis. 2016;7(4):549–64.

Declarations

Ethics approval and consent to participate: Not applicable

Consent for publication: Not applicable

Availability of data and material: Raw sequencing data and assemblies generated for this manuscript have been uploaded to NCBI. See Dataset 1 for BioSample identification numbers for assemblies and whole genome sequence data for all samples used in these analyses, including publicly available data.

Competing interests: T.L., B.K.L.S., and S.L.H. are salaried employees of Sanaria Inc., the developer and owner of PfSPZ Vaccine, and the supplier of the PfSPZ strains sequenced in this study. In addition, S.L.H. and B.K.L.S. have a financial interest in Sanaria Inc. All other authors declare no competing financial interests.

Funding: National Institutes of Health (NIH) awards U19 AI110820 and R01 AI141900, and a “Dean’s Challenge Award”, an intramural program from the University of Maryland School of Medicine, provided funds to sequence the PfSPZ strains and Pf clinical isolates, and support for K.A.M., E.F.D., A.D., M.A., A.O., K.L., L.S., L.T., M.T., S.T.H., C.M.F., C.V.P., and J.C.S. A.O. and C.V.P. were also supported by the Howard Hughes Medical Institute. The parasites provided by Sanaria were produced in part by funding from two SBIR grants from the National Institute of Allergy and Infectious Diseases, NIH, 2R44AI058375 and 5R44AI055229-09A1. Collection of the clinical isolates sequenced for this study was funded by U19 AI089683, grants from CNPq (590106/2011-2), grants from the Armed Forces Health Surveillance Center/Global Emerging Infections Surveillance and Response System (WR1576 and WR2017), the NIH International Centers of Excellence in Malaria Research program (U19 AI089681) (Brazil), and the Howard Hughes Medical Institute. S.K. and A.M.P. were supported by the Intramural Research Program of the National Human Genome Research Institute, NIH (1ZIAHG200398). P.T.R. was supported by a scholarship from the Conselho Nacional de Desenvolvimento Cientifico e Tecnologico (CNPq), which also provided a senior researcher scholarship to M.U.F.

Authors' contributions: K.A.M. designed the study, performed analyses, and wrote the paper. J.C.S., C.V.P., and S.L.H. designed the study, contributed samples and funding, and wrote the paper. E.F.D., A.D., J.C., E.M.S., A.D., S.K., A.M.P., and A.O. contributed analyses. Z.S., M.A., T.L., L.S., and L.T. performed sample preparation and sequencing. P.T.R., K.E.L., M.D.S., K.J., C.L., D.L.S., M.U.F., M.N., M.K.L., M.T., R.W.S., S.T.H., C.M.F., and B.K.L.S. contributed samples and funding, and wrote the paper.

Acknowledgements: We would like to thank the study teams that assisted in sample collections in Mali, Malawi, Cambodia, Thailand, and Myanmar. We would like to acknowledge Drs. Terrie Taylor and Don Mathanga for their assistance with collection of clinical isolates in Malawi.

Authors' information (optional)

Additional author information for M.D.S., K.J., C.L., and D.L.S. (affiliated with the Armed Forces Research Institute of Medical Sciences): Material has been reviewed by the Walter Reed Army Institute of Research. There is no objection to its presentation and/or publication. The opinions or assertions contained herein are the private views of the author, and are not to be construed as official, or as reflecting true views of the Department of the Army or the Department of Defense. The investigators have adhered to the policies for protection of human subjects as prescribed in AR 70–25.

Tables

Table 1: The PfSPZ strains differ from 3D7 in terms of genome size and base-pair content. Characteristics of the PacBio assembly for each strain are shown (first four columns), with the 3D7 reference genome included for comparison. Single-nucleotide polymorphisms (SNPs) and indels in each PfSPZ assembly as compared to 3D7 are shown for coding and non-coding regions (last five columns).

Strain     # of Nuclear  Cumulative   N50³        SNPs⁴                        Indels
           Contigs¹      Length²                  Non-coding  Syn.   Non-Syn.  Non-coding  Coding
3D7        14            23,332,831   1,687,656   -           -      -         -           -
NF54       28            23,426,028   1,527,116   1,278       25     80        8,166       230
7G8        20            22,801,099   1,455,033   22,694      5,675  15,490    40,668      6,730
NF166.C8   30            23,306,751   1,610,926   27,117      6,636  17,258    40,457      6,802
NF135.C10  19            23,540,633   1,631,396   28,791      6,535  18,141    41,435      7,029

¹Contigs: number of pieces of continuous sequence in the final assembly.
²Cumulative length: total length of the contigs.
³N50: length of the contig which, along with all contigs larger than itself, contains 50% of the assembly (larger numbers indicate a more complete assembly).
⁴Syn SNPs: synonymous SNPs; Non-Syn SNPs: non-synonymous SNPs.
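The N50 definition in footnote 3 of Table 1 can be illustrated with a minimal Python sketch. The function name `n50` and the toy contig lengths are assumptions made for illustration only; they are not the authors' code or data, and the assembly tools used in the study may handle ties or rounding differently.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches at least half of the
    cumulative assembly length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(contig_lengths) / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (not data from Table 1): total length is 300,
# and the two longest contigs (100 + 80) already cover more than half of it.
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))  # prints 80
```

Because the statistic depends only on the sorted contig lengths, an assembly with fewer, longer contigs yields a larger N50, which is why the footnote treats larger values as indicating a more complete assembly.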

Figures

Figure 1: PacBio assemblies for each PfSPZ strain reconstruct entire chromosomes in one to three continuous pieces. To determine the likely position of each non-reference contig on the 3D7 reference genome, MUMmer’s show-tiling program was used with relaxed settings (-g 100000 -v 50 -i 50) to align contigs to 3D7 chromosomes (top). 3D7 nuclear chromosomes (1-14) are shown in grey, arranged from smallest to largest, along with organelle genomes (M=mitochondrion, A=apicoplast). Contigs from each PfSPZ assembly (NF54: black, 7G8: green, NF166.C8: orange, NF135.C10: hot pink) are shown aligned to their best 3D7 match. A small number of contigs could not be unambiguously mapped to the 3D7 reference genome (unmapped).

Figure 2: Distribution of polymorphisms in PfSPZ PacBio assemblies. Single nucleotide polymorphism (SNP) densities (log SNPs/10 kb) are shown for each assembly; the scale [0-3] refers to the range of the log-scaled SNP density graphs, from 10⁰ to 10³. Inner tracks, from outside to inside, are NF54 (black), 7G8 (green), NF166.C8 (orange), and NF135.C10 (hot pink). The outermost tracks are the 3D7 reference genome nuclear chromosomes (chrm1 to chrm14, in blue), followed by 3D7 genes on the forward and reverse strand (black tick marks).
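As a rough illustration of the log-scaled SNP density plotted in Figure 2, the sketch below counts SNPs in non-overlapping 10 kb windows and takes log10 of each count. The function name `log_snp_density`, the use of 1-based positions, and the choice to return `None` for windows without SNPs are all assumptions made for illustration; the caption does not state how empty windows were handled or which tool produced the tracks.

```python
import math
from collections import Counter


def log_snp_density(snp_positions, chrom_length, window=10_000):
    """Return log10(SNP count) for each non-overlapping window on a chromosome.

    snp_positions are assumed to be 1-based coordinates. Windows with no
    SNPs are reported as None, since log10(0) is undefined (an assumption,
    not something stated in the figure caption).
    """
    n_windows = math.ceil(chrom_length / window)
    counts = Counter((pos - 1) // window for pos in snp_positions)
    return [
        math.log10(counts[i]) if counts[i] else None
        for i in range(n_windows)
    ]


# Hypothetical toy chromosome of 30 kb with five SNPs (not data from the study).
toy_positions = [5, 120, 9_999, 10_001, 25_000]
print(log_snp_density(toy_positions, chrom_length=30_000))
```

On this scale a value of 0 corresponds to a single SNP in a window and a value of 3 to 1,000 SNPs, matching the [0-3] range described in the caption.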

Figure 3: Predicted CD8+ T cell epitopes from amino acid sequences of the four PfSPZ strains. Using amino acid sequences from a subset of the 5,548 protein-coding genes in the 3D7 annotation (PlasmoDBv24) that were deemed to be correctly annotated in all four PfSPZ strains, CD8+ T cell epitopes were predicted in silico using NetMHCpan. A. Density plot of amino acid sequence length distributions of the 4,814 genes used for in silico epitope predictions. The overall distribution of sequence lengths was very similar among all four strains, except that the three heterologous CHMI strains had slightly more amino acid sequences that were shorter than their NF54 counterparts, possibly reflecting the difficulty of mapping annotations to genetically diverse genomes. B. Shared and unique epitope-gene combinations in the 4,814 genes. C. Shared and unique epitope-gene combinations in a subset of the 4,814 genes (n=1,974) that have evidence of being expressed in the sporozoite stage of the parasite in the mosquito salivary glands.
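The tally of shared and unique epitope-gene combinations in panels B and C can be sketched as a grouping of each combination by the exact set of strains in which it is predicted. This is a minimal illustration, not the authors' pipeline: the function name `sharing_profile`, the gene labels, and the peptide strings below are hypothetical placeholders, and the sketch assumes that the epitope predictions have already been reduced to (gene, peptide) pairs per strain.

```python
from collections import Counter


def sharing_profile(epitopes_by_strain):
    """Count epitope-gene combinations by the exact set of strains sharing them.

    epitopes_by_strain maps a strain name to a set of (gene, peptide) pairs.
    The result maps each frozenset of strain names to the number of
    combinations found in exactly those strains.
    """
    strains_per_combo = {}
    for strain, combos in epitopes_by_strain.items():
        for combo in combos:
            strains_per_combo.setdefault(combo, set()).add(strain)
    return Counter(frozenset(strains) for strains in strains_per_combo.values())


# Hypothetical placeholder predictions (not results from the study).
toy = {
    "NF54": {("geneA", "PEPTIDE1"), ("geneB", "PEPTIDE2")},
    "7G8": {("geneA", "PEPTIDE1"), ("geneB", "PEPTIDE3")},
    "NF166.C8": {("geneA", "PEPTIDE1"), ("geneB", "PEPTIDE2")},
    "NF135.C10": {("geneA", "PEPTIDE1")},
}
for strains, n in sharing_profile(toy).items():
    print(sorted(strains), n)
```

Keying on the exact strain set keeps "shared by all four strains" distinct from "shared by a subset", which is the distinction such shared-versus-unique plots typically draw.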

Figure 4: Predicted CD8+ T cell epitopes in the P. falciparum circumsporozoite protein (PfCSP). Protein domain information based on the 3D7 reference sequence of PfCSP is found in the first track, and the following tracks are epitopes predicted in the PfCSP sequences of NF54, 7G8, NF166.C8, and NF135.C10, respectively. Each box in the track is a sequence that was identified as an epitope, and the colors of each box represent the HLA type that identified the epitope.

Figure 5: Global diversity of clinical isolates and PfSPZ strains. Principal coordinates analyses (PCoA) of clinical isolates (n=654) from malaria endemic regions and PfSPZ strains were conducted using biallelic non-synonymous SNPs across the entire genome (left, n=31,761) and in a panel of 42 pre-erythrocytic genes of interest (right, n=1,060). For the genome-wide dataset, coordinate 1 separated South American and African isolates from Southeast Asian and Papua New Guinea (PNG) isolates (27.6% of variation explained), coordinate 2 separated African isolates from South American isolates (10.7%), and coordinate 3 separated Southeast Asian isolates from PNG isolates (3.0%). Similar trends were seen for the first two coordinates of the pre-erythrocytic gene dataset (27.1 and 12.6%, respectively), but coordinate 3 separated isolates from all three regions (3.8%). In both datasets, NF54 (black cross) and NF166.C8 (orange cross) cluster with West African isolates (isolates labeled in red and dark orange colors), 7G8 (bright green cross) clusters with isolates from South America (greens and browns), and NF135.C10 (hot pink cross) clusters with isolates from Southeast Asia (purples and blues).

Figure 6: NF135.C10 is part of an admixed population of clinical isolates from Southeast Asia. Top: admixture plots for clinical isolates from Myanmar (n=16), Thailand (n=34), Cambodia (n=109), Papua New Guinea (PNG, n=34) and NF135.C10 are shown. Each sample is a column, and the height of the different colors in each column corresponds to the proportion of the genome assigned to each of the K populations by the model. Bottom: hierarchical clustering of the Southeast Asian isolates used in the admixture analysis (n=193; colored by their assigned subpopulation) and previously characterized Cambodian isolates (n=167; black; labels correspond to clusters identified in reference 61) places NF135.C10 (star) with samples from the previously identified KHA admixed population (shown in grey dashed box). Each node ends with a sample, and the y-axis represents distance between clusters.
