Correction: 23 August 2013

www.sciencemag.org/cgi/content/full/science.1233158/DC1

Supplementary Materials for

Lentiviral Hematopoietic Stem Cell Gene Therapy Benefits Metachromatic Leukodystrophy

Alessandra Biffi,* Eugenio Montini, Laura Lorioli, Martina Cesani, Francesca Fumagalli, Tiziana Plati, Cristina Baldoli, Sabata Martino, Andrea Calabria, Sabrina Canale, Fabrizio Benedicenti, Giuliana Vallanti, Luca Biasco, Simone Leo, Nabil Kabbara, Gianluigi Zanetti, William B. Rizzo, Nalini A. L. Mehta, Maria Pia Cicalese, Miriam Casiraghi, Jaap J. Boelens, Ubaldo Del Carro, David J. Dow, Manfred Schmidt, Andrea Assanelli, Victor Neduva, Clelia Di Serio, Elia Stupka, Jason Gardner, Christof von Kalle, Claudio Bordignon, Fabio Ciceri, Attilio Rovelli, Maria Grazia Roncarolo, Alessandro Aiuti, Maria Sessa, Luigi Naldini

*Corresponding author. E-mail: [email protected]

Published 11 July 2013 on Science Express

DOI: 10.1126/science.1233158

This PDF file includes

Materials and Methods
Figs. S1 to S22
Tables S1 to S19
Full References

Correction: In Table S8 on page 22, a transcription error in the baseline Gross Motor Function Measure percentage scores of the patients has been corrected.




SUPPORTING MATERIALS AND METHODS

Procedures for integration site mapping

The overall scheme of the analysis is described in Fig. S10. Four main macro-activities, from “wet” sample processing to bioinformatics, were designed. Each macro-activity comprises several dependent activities, connected by data flow.

1. Wet lab procedures: LAM-PCRs and next-generation sequencing (NGS).

2. NGS data processing: taking as input the sequencing reads from NGS platforms and producing as output the list of integration sites.

3. Data quality processing: performing sequential activities to improve data quality. Each activity generates the input for distinct processes in macro-activity 4.

4. IS-driven biological analysis: performing inferences on the safety and efficacy of gene therapy as well as on human hematopoiesis.

454 pyrosequencing of LAM-PCR products

Following the method published by Paruzynski et al. (1), we adapted the LAM-PCR samples for 454 pyrosequencing by fusion PCR to add the Roche 454 GS-FLX adaptors: adaptor A, plus a 6-nucleotide barcode, was added to the LTR end of the LAM-PCR amplicon; adaptor B was added to the linker cassette side. In 5’-3’ orientation, the final amplicon is composed as follows: 454 adaptor A, barcode of 6 nucleotides, LTR sequence (63 nucleotides), unknown genomic sequence, linker cassette sequence, and 454 adaptor B. Each sample was amplified with one of 55 available fusion primers, each carrying a different barcode sequence. Fusion-primer PCR products were assembled into libraries avoiding repetition of identical barcodes, and sequenced.

LAM-PCR libraries containing unique barcodes were pooled in 50 µl and the DNA concentration was estimated using the PicoGreen protocol (Life Technologies). DNA quality was then assessed on the Agilent 2100 Bioanalyzer and the fragment size range was determined. A modified emulsion PCR (emPCR) protocol was developed to minimize amplification bias when pools contained a majority of small PCR products, e.g. in the 100-250 bp range. First, a small-volume titration was performed to determine the correct copy number of DNA molecules per bead in order to achieve the recommended 8% enrichment value. Pools with the majority of products in the 500 bp range received the standard amount of emPCR primer according to the manufacturer’s recommendation (230 µl of primer in 3915 µl of large-volume live amplification mix). Pools in which the majority of fragments were in the smaller size range (100-250 bp) received only ¼ of the recommended amount of primer (57.5 µl of primer in 3915 µl of large-volume live amplification mix). Emulsion breaking, bead purification and sequencing were carried out according to the manufacturer’s standard protocols. To ensure coverage of the six-nucleotide barcode on each read and of the LTR-genome junction, only A beads from the Amplicon Lib-A LVE emPCR kit were used, giving unidirectional coverage. Sequencing from the linker cassette end would not always cover the LTR-genome junction and the barcode.
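As an illustration of how the 6-nucleotide barcodes allow pooled libraries to be separated after sequencing, the following minimal Python sketch assigns reads to samples by their leading barcode. It is not the software used in this study: it assumes that each read already starts at the barcode (adaptor A removed) and that barcodes must match exactly, and the barcode sequences and sample names shown are hypothetical.

```python
from collections import defaultdict

BARCODE_LEN = 6  # barcode length stated in the text


def demultiplex(reads, barcode_to_sample):
    """Assign reads to samples by their first BARCODE_LEN bases.

    Assumption: reads start at the barcode and barcodes match exactly.
    Returns (reads per sample with the barcode removed, unassigned reads).
    """
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)
        else:
            by_sample[sample].append(insert)
    return dict(by_sample), unassigned


# Hypothetical barcodes and sample labels, for illustration only.
barcodes = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
reads = ["ACGTACGGGTTT", "TGCATGAAACCC", "NNNNNNAAAA"]
assigned, unassigned = demultiplex(reads, barcodes)
```

Because each pool avoids repeated barcodes, a lookup table of this kind is unambiguous within a library.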


Sample preparation and sequencing on Illumina MiSeq platform

MLD LAM-PCR samples generated by the procedure described for 454 amplicon library preparation were quantified with the Quant-iT PicoGreen dsDNA Assay Kit (Invitrogen) and checked on a 2100 Bioanalyzer High Sensitivity DNA Chip (Agilent). 250 ng of LAM-PCR sample was used for library construction using the TruSeq LT Sample Preparation Kit (Illumina). This involved end repair, adenylation of 3’ ends, ligation with indexing adapters, gel purification, PCR enrichment and validation by qPCR with an Illumina Library Quantification Kit (Kapa Biosystems). Libraries were sequenced using the Illumina MiSeq Reagent Kit (300 cycles) or v2 (500 cycles) for 150 or 250 paired-end reads, respectively. In brief, 2 nM libraries with different indexes were mixed in equimolar amounts, denatured and diluted to a 10 pM final concentration; a φX bacteriophage genome library was pooled at 50% or 70% with these libraries to introduce diversity and optimize the sequencing run performance. All protocols followed the manufacturer’s instructions, with the modifications specified above. To provide run details for each experiment, a sample sheet was generated on the Illumina MiSeq instrument. The paired-end runs were initiated for either 2x250 or 2x150 bases of Illumina’s sequencing-by-synthesis technology, including clustering, paired-end preparation, barcode sequencing and analysis. After completion of the run, base calling was performed, sequences were demultiplexed and φX reads were filtered out. FASTQ files in Illumina 1.8 format were used for downstream analysis.

Sequence data processing

The steps of NGS data processing deal with the management of high-throughput data from the Roche 454 and Illumina MiSeq sequencing platforms and comprise two main activities:

1. Data quality inspection and analysis, in which lentiviral vector sequences and other contaminants are trimmed;

2. Integration site identification, in which all valid sequence reads are aligned to the genome of reference and valid ISs are retrieved.

Sequence Quality Analysis

The aim of the NGS quality analysis is to identify the subset of raw reads that are considered valid for further analyses. Our standard LAM-PCR products contain an LTR sequence, a flanking human genomic sequence and a linker cassette (LC) sequence. The 454 technology allowed retrieval of LAM-PCR sequences with lengths ranging from 10 bp to 900 bp; similar results were obtained from Illumina MiSeq paired-end reads, as reported in Fig. S11, which shows raw read counts and the sequence length without LTR and LC segments for all sequences (Illumina paired-end reads were merged at overlapping regions; otherwise we plotted only reads derived from LTR-containing sequences). These length boundaries are important parameters in the quality analysis process, since they affect both the subsequent alignment procedure and the algorithm for identifying vector components. A sequence too short to be correctly aligned to the reference genome is discarded. If the LAM-PCR product exceeds the maximum size reachable with NGS technologies, part or all of the linker cassette sequence may be missing. To address these issues we designed a software tool with customized algorithms and logic for quality analysis in two main steps:


1. LTR sequence identification and trimming: we designed a stringent BLASTN approach (http://blast.ncbi.nlm.nih.gov) for the last 63 nucleotides of the LTR region, forcing un-gapped alignment (quality score higher than 99) and a perfect match on the last 3 bases. Once the correct LTR fragment is recognized, it is trimmed from the raw read, which then flows to the next step. A raw read is discarded if it does not contain any LTR fragment or has a low alignment quality score.

2. LC sequence identification and trimming: reads flowing down from step 1 are checked for the presence of the LC. Depending on its length, a raw read may or may not contain the LC fragment. We used the BLASTN algorithm to identify and trim the LC when present. The number of output sequences from step 2 is identical to the input from step 1, as step 2 only trims a segment of the input sequence and generates sequences cleaned of both LTR and LC fragments.
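The two trimming steps above can be sketched as follows. This is an illustrative simplification and not the software described here: exact substring search stands in for the BLASTN alignment, and the LTR-end and LC sequences are short hypothetical placeholders. Reads without the LTR end are excluded; all other reads are trimmed, with LC removal being optional.

```python
def trim_ltr_and_lc(read, ltr_end, linker):
    """Return the genomic portion of a read, or None if no LTR end is found.

    Assumption: exact matching replaces the BLASTN search used in the text.
    """
    ltr_at = read.find(ltr_end)
    if ltr_at == -1:
        return None  # step 1: no LTR fragment, read is discarded
    genomic = read[ltr_at + len(ltr_end):]
    lc_at = genomic.find(linker)
    if lc_at != -1:
        genomic = genomic[:lc_at]  # step 2: trim the LC only when present
    return genomic


def quality_filter(raw_reads, ltr_end, linker):
    """Split raw reads into trimmed reads and excluded read names."""
    trimmed, excluded = {}, []
    for name, seq in raw_reads.items():
        genomic = trim_ltr_and_lc(seq, ltr_end, linker)
        if genomic is None:
            excluded.append(name)
        else:
            trimmed[name] = genomic
    return trimmed, excluded


# Hypothetical placeholder sequences, for illustration only.
LTR_END = "TTAGCA"
LINKER = "GGCCTT"
raw = {
    "r1": "AAAC" + LTR_END + "CCCCGGGGAAAA" + LINKER + "TT",
    "r2": LTR_END + "ACGTACGT",  # long insert: LC not reached
    "r3": "CCCCCCCC",            # no LTR: excluded
}
trimmed, excluded = quality_filter(raw, LTR_END, LINKER)
```

As in the text, every read that passes step 1 also leaves step 2, so the number of trimmed reads equals the number of reads in which the LTR was found.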

Given the initial set of all raw reads R, we run our quality filters obtaining two distinct proper sub-sets of R: T, set of trimmed reads, and E, set of excluded reads. Only reads in T are considered valid and consistent with our requirements of target analysis and are processed for IS identification. We implemented the designed software in Python (http://www.python.org) using BioPython (http://biopython.org) libraries for BLASTN analysis and MySQL database (http://www.mysql.com). Software details (under Linux Ubuntu OS, http://www.ubuntu.com):

• Python release 2.6.5

• NCBI BLASTN (version 2.2.18)

• MySQL database release 5.1.41

IS identification

In order to identify bona fide ISs, the first bases of the vector-genome junctions must be precisely distinguished using customized and stringent parameters. Starting from the set of trimmed reads T, we mapped each t in T to the human reference genome (build HG19/GRCh37, Feb. 2009) with the NCBI BLASTN tool. To avoid mapping biases, we discarded all reads with length below 20 bases. For each t we collected all alignment output data (mapping score, identity score, e-value, gaps, q-size, span, starting/ending base, etc.) and all mapping positions. We designed a pipeline with cascading rules to correctly identify each integration site. Each aligned read t must have:

1. An alignment score greater than 95%: this threshold allows selection for reads with a high identity alignment value.

2. The starting alignment position within the third base: we set this threshold since an integration site could span in a range of +/- 3 bases with respect to aligned reads.

The pipeline evaluates the rules and produces three new subsets from the input set:

• U, set of unambiguous (univocally mapped) sequences that comprises all valid integration sites.

• A, set of ambiguous reads with multiple alignments.

• N, set of “no-hit” reads without any positive match to the reference genome.
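A minimal Python sketch of the cascading rules is given below; it is an interpretation of the text, not the pipeline itself. The `Hit` record and its field names are hypothetical stand-ins for parsed BLASTN output. Two points are assumptions: reads whose hits all fail the filters are placed in N, and the grouping function `unique_sites`, which anticipates the +/- 3 bp no-redundancy grouping described in the next paragraph, treats a gap of up to 3 bp between consecutive positions as the same site.

```python
from collections import namedtuple

# Hypothetical record for one alignment of a read (not a BLASTN data type).
Hit = namedtuple("Hit", "chrom pos identity query_start")

MIN_LEN = 20           # reads shorter than this are discarded
MIN_IDENTITY = 95.0    # alignment score must be greater than 95%
MAX_QUERY_START = 3    # alignment must start within the third base


def classify(read_len, hits):
    """Return ('U', hit), ('A', None), ('N', None) or ('discarded', None)."""
    if read_len < MIN_LEN:
        return "discarded", None
    valid = [h for h in hits
             if h.identity > MIN_IDENTITY and h.query_start <= MAX_QUERY_START]
    if not valid:
        return "N", None   # assumption: filtered-out reads count as no-hit
    if len(valid) > 1:
        return "A", None   # multiple valid alignments: ambiguous
    return "U", valid[0]


def unique_sites(mapped, window=3):
    """Collapse (chrom, pos) pairs lying within `window` bp of the previous one.

    The first position of each run is kept as the representative site.
    """
    sites, prev = [], None
    for chrom, pos in sorted(set(mapped)):
        if prev is None or chrom != prev[0] or pos - prev[1] > window:
            sites.append((chrom, pos))
        prev = (chrom, pos)
    return sites


label, hit = classify(40, [Hit("chr1", 100, 99.0, 1)])
sites = unique_sites([("chr1", 100), ("chr1", 102), ("chr1", 104),
                      ("chr1", 200), ("chr2", 101)])
```

Comparing each position with the one immediately before it means that a chain of closely spaced positions collapses into a single site, represented by its first position.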

Elements in U are mapped reads corresponding to so-called “redundant” ISs, meaning that each IS may have one or more reads u in U. To generate a list of unique ISs, we grouped mapped reads based on their alignment starting position in a range of +/- 3 bp. We defined the following rule (called the no-redundancy rule): from the reads u in U, list all integration loci, L, sorted in ascending order by chromosome and base pair position. We know that |L|<=|U|; thus, for each l in L we have at least one corresponding mapped read u in U. If the difference in position between consecutive elements l in L (analyzed chromosome by chromosome) is below the threshold of 3 bp, then only the first position is taken as representative of the integration locus i; otherwise the different positions are considered representative of distinct integration sites. An additional algorithm specifically designed for reads mapped to repetitive elements allowed us to increase the set of redundant sequences. We implemented a software tool in the Python programming language to address each step of the algorithm, using a MySQL database to store data and results.

From the original dataset generated by NGS sequencing, 25-31% of reads could be used as sequences to process in the no-redundancy step. These numbers differ from classical genomic studies, in which the proportion of discarded reads is usually less than 10%. This difference is due to specific technical aspects of the wet-lab procedures and to the custom rules required to identify integration sites. In fact, the majority of the reads we discarded (21-29%) were vector sequences amplified as internal controls of LAM-PCR. Moreover, we collected a set of reads aligning to repetitive elements that were still potential/candidate integration sites. However, since we could not precisely identify unique genomic locations for them, we decided to discard this set of sequences, which decreased the fraction of raw reads available for further analysis.

Collision detection

In order to obtain a reliable dataset of ISs from each patient, we filtered out potential contaminations/collisions and false positives based on sequence counts.
An additional step of data normalization was required to combine integration sites resulting from different experiments. The term “collision” is used to identify the presence of identical IS in independent samples. In our experimental setting, the integration of vector in the very same genomic position in different cells is a very low probability event. Thus, the detection of identical ISs in independent samples likely derives from contamination, which may occur at different stages of wet laboratory procedures (sample purification, DNA extraction, LAM-PCRs and NGS). Although our working pipeline is designed to minimize the occurrence of inter-samples contacts, the high-throughput analysis of ISs intrinsically carries a certain degree of background contamination. Identification of the extent of contamination between samples is crucial also because the retrieval of the same IS in different samples obtained from the same patient is used in subsequent steps to make inference on biological properties of the vector-marked hematopoietic cells (i.e. multilineage potential and sustained clonogenic activity). Thus, we must be able to distinguish the actual occurrence of the same IS in different samples (from the same patient) from a contamination/collision. To address these issues, we assessed the extent of shared IS among samples derived from different patients as a way to measure the extent of collision in our analyses and then design rules to discard from each patient’s data set those IS that can be ascribed to collision and minimize the likelihood of scoring false positive when looking for shared IS between samples from the same patient (see also later). We designed a collision detection process allowing the validation of each integration locus. The overall result should be that, given the set I of integration loci, in case of classification of an integration locus i in I as collision, i is discarded from I. 
We applied the collision detection process to data derived from all patients of our two concurrently evaluated and ongoing clinical trials of lentiviral vector gene therapy (3 patients with MLD, this study, and 3 patients with Wiskott-Aldrich Syndrome, WAS, for a total of 71,359 IS; for WAS data see the accompanying manuscript by Aiuti et al. in this issue of Science). We analyzed each patient’s collisions with respect to the other patients; therefore all counts and filters are patient-based. A summary of shared integration sites for each patient is presented in Table S13A. Each identical IS has a different number of sequence reads (sequence count) among the patients. Sequence counts can be used to determine whether samples from one patient contaminated the other patients’ samples. Therefore, we could identify a threshold of differential sequence count that allows assigning a given collision to one patient and removing it from the others. We retrieved the threshold value from our data by analyzing each patient independently and then combining all patients’ results, as follows. Given C, the set of collisions, each c in C has a sequence count s. Given P, the set of patients, p the current patient and p(n) all other patients (P-p), for each patient p and each collision c in Cp we computed the ratio between the sequence count of c in p and the sequence count of c in p(n), s(c,p)/s(c,p(n)), called the collision relative frequency (ColRF). We then analyzed the distribution of all ColRF values in C to look for inflection points and peaks. A representative ColRF plot, in which all ColRF values are log10-transformed, is shown in Fig. S13A. We generated theoretical case scenarios for two patients (Table S13B) and applied these scenarios to our empirical ColRF curve, which allowed us to interpret the data as a decision plot (Fig. S13). The patient-based threshold chosen for contamination identification is 1 on the log10 scale, corresponding to a 10-fold difference on the linear scale. Applying this filter to the integration sites from patients MLD01, MLD02 and MLD03 resulted in a slight reduction in the overall number of unique insertion sites (as described in Table S13A).
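The ColRF computation and the log10 threshold of 1 can be sketched in Python as follows. This is a hedged reading of the description above, not the software used here: it assumes that the sequence count of "all other patients" is the sum of their counts, that every patient listed for a collision has a positive count, and it returns no owner when no patient reaches the threshold, a case whose handling is not specified in the text. The function names, the counts and the label WAS01 are hypothetical.

```python
import math

LOG10_THRESHOLD = 1.0  # 10-fold difference on the linear scale


def col_rf(counts, patient):
    """Collision relative frequency of one IS for `patient`.

    counts: patient -> sequence count of the shared IS (all counts > 0).
    Assumption: the denominator is the summed count of all other patients.
    """
    others = sum(n for p, n in counts.items() if p != patient)
    return counts[patient] / others


def assign_collision(counts):
    """Return the patient the IS is assigned to, or None if undecided."""
    for patient in counts:
        if math.log10(col_rf(counts, patient)) >= LOG10_THRESHOLD:
            return patient
    return None


# Hypothetical sequence counts for two shared integration sites.
owner_a = assign_collision({"MLD01": 200, "WAS01": 10})   # 20-fold excess
owner_b = assign_collision({"MLD01": 30, "MLD02": 20})    # below threshold
```

When one patient is at least 10-fold above the others combined, no other patient can reach the same threshold, so at most one owner is returned.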
In order to understand where potential contaminations occurred among patients and between our trials, we further analyzed the sequences that were filtered out by the collision detection process (described in Table S13C). Overall, considering that we retrieved a similar total number of IS from each patient, we did not observe any major prevalence of contaminations derived from a single patient with respect to the others, nor among patients of the same versus different trials. These findings are in line with an expected stochastic occurrence of contamination among samples processed for high-throughput sequencing of integration sites, and provide an estimate of the extent of such cross-contamination between samples in our experimental setting, averaging 1.64% of the IS retrieved, with an average sequence count of 0.70% from an individual sample. The integration datasets cleaned of collisions (available upon request) were used for all subsequent analyses.

Common insertion site analysis

Vector integration frequency along the genome is not homogeneous. Dense clusters of integrations contained in a relatively narrow genomic interval, known as Common Insertion Sites (CIS), have been used as an indicator of genetic selection and enrichment of cell clones harboring integrations that, by targeting specific genes, have acquired a selective advantage in vivo (2-9). CIS were identified in hematopoietic cells from patients in γ-retroviral vector (γRV)-based HSC-GT clinical trials for X-linked Severe Combined Immunodeficiency (X-SCID), Chronic Granulomatous Disease (CGD) and Wiskott-Aldrich syndrome (WAS). Among all CIS identified, some, targeting cancer genes such as LMO2, MECOM, PRDM16, CCND2 and SETBP1, were found in leukemic/dysplastic/dominant cell clones from patients’ blood. To investigate the presence of CIS in our study we used:

1. A region-based approach based on sliding windows (10).

2. A method for CIS identification based on a new genome-wide Grubbs test for outlier analysis.

CIS identification based on the sliding windows approach

Using the R package of Abel et al. (10) (latest update of August 2012), the function “Cluster” returned the original input list of ISs with additional annotation fields such as “CIS max order”, which represents the maximum number of integrations contained in each CIS, and “Cluster ID”, which represents a genomic window in which one or more CIS intervals are clustered. Given that in this approach the integration frequency is measured on genomic windows rather than genes, the “CIS max order” does not necessarily correspond to the number of integrations targeting a given gene (Fig. S17A). Indeed, in some cases genes are targeted by a number of integrations that is higher than the max CIS order. For example, FCHSD2 in patient MLD02 is targeted by 30 integrations spanning an interval of 260,285 bp; however, the maximum CIS order is 24 because this is the number of ISs falling into the maximum pre-determined interval range of 200 Kbp. CIS intervals of a given max CIS order may contain several genes targeted by a number of integrations lower than or equal to the CIS order. For example, in patient MLD02 a CIS of max order 26 on chromosome 17 contained 4 genes (NPLOC4, CCDC137, HGS, SLC25A10), of which NPLOC4 was targeted by 12 integrations while the remaining integrations were distributed among the other genes. In each patient, at least 500 CIS clusters containing at least one CIS of order >2 were identified. Given the computational characteristics of this method, the total number of targeted genes is >1000 (Table S15A).

CIS analysis by the genome-wide Grubbs test for outliers

Not all CIS are necessarily the product of genetic selection, as several CIS are also found in cells at early time points after transduction, before genetic selection can occur.
We have previously shown that LV CIS in blood cells from HSC gene therapy patients with Adrenoleukodystrophy (ALD) (11) and in human/mouse hematochimeras (2) are clustered in specific megabase-wide genomic regions heavily targeted by LV integrations. In contrast, the CIS derived from insertional mutagenesis (2) are not clustered in the genome and typically target only one gene within a genomic region, the culprit of the selective advantage/transformation. This observation suggested that LV CIS might originate as a consequence of intrinsic integration preferences of this vector for specific genomic regions rather than being the result of insertional mutagenesis (2). To distinguish intrinsic integration hotspots from those resulting from genotoxic selection, we applied the Grubbs test for outliers to compare the integration frequency at genes contained in selected megabase-wide genomic regions. Unlike other approaches for CIS analysis, our approach is gene-centered and corrects the integration frequency by the size of the targeted gene rather than by user-defined genomic intervals (10, 12-14). The rationale of the Grubbs test for outliers for CIS validation relies on the postulate that a significant CIS gene (identified by any statistical method) will be targeted at a significantly higher frequency than the average. Because genes differ in size, resulting in a different probability of being hit by vector integrations, the number of integrations targeting each gene was divided by the gene size (gene integration frequency). The t-studentization of the distribution of the negative logarithm of the gene integration frequencies provided a statistical significance for the genes targeted at high frequency within a chromosomal interval. If multiple genes contained within the selected interval of analysis appear to be highly targeted, they will not be identified as outliers and thus are not considered CIS.

Normality test of gene integration frequency distributions

The basic calculations for the genome-wide Grubbs test are described in Biffi et al., 2011 (2). Briefly, the gene integration frequency is calculated by dividing the number of integrations targeting the same gene by the size of the genomic interval defined by the boundaries of the targeted gene (UCSC HG19 freeze Jul 2012). Given that genes with no integrations can be non-targeted or non-sampled due to possible sub-sampling and saturation issues, LV-targeted genes are a conditioning variable and we considered only the genes targeted by at least one integration. The gene integration frequency values are transformed by the negative base-2 logarithm to obtain a statistically evaluable normal distribution of the data. Indeed, the untransformed gene integration frequency values do not follow a normal distribution, as they may vary from 0 to 1. The -log2-transformed gene integration frequencies follow a normal distribution according to the D'Agostino & Pearson omnibus normality test (Fig. S18A,B and Table S16B,C). For our MLD patients the gene integration frequency was also analyzed separately for each patient. For patients MLD01, MLD02 and MLD03 a total of 5455, 4084 and 4152 targeted genes were analyzed, respectively.
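The transformation described above can be sketched in Python as follows: the gene integration frequency (integrations divided by gene size), its -log2 transform restricted to genes with at least one integration, and a standardized score for each gene. This is a minimal illustration with hypothetical gene names and sizes; it stops at the standardized scores and does not reproduce the Grubbs p-value calculation, which requires Student's t distribution.

```python
import math
import statistics


def neg_log2_frequencies(genes):
    """genes: name -> (number of integrations, gene size in bp).

    Genes without integrations are skipped, as in the text.
    """
    return {name: -math.log2(n / size)
            for name, (n, size) in genes.items() if n > 0}


def z_scores(values):
    """Standardize the transformed frequencies (sample standard deviation)."""
    data = list(values.values())
    mean, sd = statistics.mean(data), statistics.stdev(data)
    return {name: (v - mean) / sd for name, v in values.items()}


# Hypothetical genes: D carries 8 integrations, E none.
genes = {"A": (1, 1000), "B": (1, 1000), "C": (1, 1000),
         "D": (8, 1000), "E": (0, 5000)}
vals = neg_log2_frequencies(genes)
z = z_scores(vals)
most_targeted = min(z, key=z.get)
```

Because of the negative logarithm, a gene targeted at high frequency has a low transformed value, so the most heavily targeted genes are those with the most negative scores.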
Analysis of the Z-score distributions of the gene integration frequency allows highlighting those genes that have been targeted at a significantly higher frequency with respect to others focusing on a given genomic region of interest. Therefore our approach does not assume a random distribution of the integrations along the genome, rather takes into account the average gene integration frequency observed within the selected genomic region, the variance of integration frequency and the number of observations (targeted genes). This approach reflects our previous “regional” Grubbs test for outliers to identify CIS in a genome-wide fashion. Although the statistical principles and the procedures for CIS identification by genome-wide Grubbs test for outliers’ analysis are essentially the same as the previous one, there are also some important differences. Here we compare the integration frequency of each targeted gene to the average frequency of all genes of the dataset without correcting for regional biases. This allows ranking genes by their relative integration frequency with respect to the whole genome and better highlighting the most highly targeted regions (Fig. S19A). Moreover, also the number of genes analyzed is very different, as in regional analyses the genes may range between 7 and 80, while in our genome wide approach they can be from 500 to thousands. The latter is an important difference as the number of genes analyzed impacts on the significance because the p-value obtained is corrected (multiplied) by the number of genes analyzed. Given the large number of genes evaluated in genome wide analyses, this correction greatly increases the p-values and leads to loss of significance. To avoid the high toll imposed by this type of p-value correction we opted for a multi-step p-value correction. Firstly, we generated a list of CIS genes with non-corrected p-values <0.05 (raw p-value). 
Then, to reduce the false discovery rate, we corrected the raw p-value by the number of genes that were significantly overtargeted before correction (raw p-value x number of genes with a raw p-value <0.05). Finally, the genomic regions comprising the CIS genes selected by this more stringent correction were used for regional Grubbs analysis to correct for local biases of integration. Therefore, if CIS genes are embedded in a region in which all genes are targeted at high frequency, they will not be considered significantly overtargeted by the regional Grubbs test for outliers.

In order to empirically evaluate and validate our method and the applied p-value corrections for stringently distinguishing genotoxic CIS from local biases of integration, we analyzed previously published vector integration datasets in which known CIS originating from genetic selection are present. Moreover, we addressed whether the CIS identified by our method matched those identified with the more canonical statistical approach (10). We analyzed datasets from the γRV-based trials for X-linked Severe Combined Immunodeficiency (X-SCID) (5, 6, 15, 16), Chronic Granulomatous Disease (CGD) (3) and Wiskott-Aldrich Syndrome (WAS) (17), the LV-based trial for Adrenoleukodystrophy (ALD) (11), and the new LV-based trial for MLD that is the object of this study. The analysis was done by pooling all vector integrations from all patients in each trial; for the MLD clinical trial, the LV CIS were also analyzed in each single patient independently. The number of patients, integrations and targeted genes from the clinical trial integration datasets analyzed are reported in Table S16A.

Correction of CIS significance and CIS identification by the genome-wide Grubbs test for outliers

To reduce the background of high integration frequency values originating from small genes, an additional 100 Kb distance was added to each gene interval. Thus, the Z-score, t-studentization and raw p-value are calculated considering a minimal gene size of at least 100 Kb. Following the sequential p-value correction approach described above, we first generated gene lists whose integration frequency provided a raw p-value <0.05 (significant).
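The sequential correction can be illustrated with a short Python sketch: genes with a raw p-value below 0.05 are selected first, and each of their raw p-values is then multiplied by the number of genes selected in that first step. The function name, the cap at 1.0 and the example p-values are assumptions made for illustration; the subsequent regional Grubbs analysis is not covered.

```python
def sequential_correction(raw_p, alpha=0.05):
    """Two-step p-value correction as described in the text.

    raw_p: gene -> raw p-value.
    Returns gene -> corrected p-value for genes still below alpha.
    """
    # Step 1: keep genes that are significant before any correction.
    significant = {g: p for g, p in raw_p.items() if p < alpha}
    k = len(significant)
    # Step 2: multiply by the number of genes kept in step 1
    # (capping at 1.0 is an added convenience, not stated in the text).
    corrected = {g: min(1.0, p * k) for g, p in significant.items()}
    return {g: p for g, p in corrected.items() if p < alpha}


# Hypothetical raw p-values for four genes.
raw = {"g1": 0.001, "g2": 0.02, "g3": 0.04, "g4": 0.3}
cis = sequential_correction(raw)
```

In this toy example three genes pass the first step, so each raw p-value is multiplied by 3 rather than by the total number of genes analyzed, which is what makes the correction less severe than multiplying by all targeted genes.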
The number of significant p-values was subsequently used to multiply the raw p-value of each gene, and genes with corrected p-values <0.05 were re-selected. Before p-value correction, the number of significant CIS retrieved in each clinical trial correlated directly with the total number of targeted genes (R² = 0.996). For datasets with <600 targeted genes (X-SCID and CGD) the significant CIS genes constituted on average 1% of the entire dataset, while for datasets of >1800 targeted genes the CIS genes represented 4% of the total. After p-value correction, the number of CIS is reduced to 0.2-0.5% of the total and the direct influence of the total number of targeted genes on the number of CIS identified vanishes (Fig. S19B and Table S16D). Thus, this correction allows a more stringent selection of the most overtargeted genes.

The genome-wide Grubbs test for outliers applied to the previously reported ALD study and the present MLD integration datasets identified several significantly overtargeted LV CIS genes. Several of the CIS genes identified were shared between the two trials. Given the larger size of the MLD dataset, it contained almost the entire ALD CIS dataset (Fig. S19C,D). After p-value correction, the number of CIS was reduced to 7-9 for each clinical trial. The most targeted CIS genes in both the ALD and MLD clinical trials were KDM2A, TNRC6C, PACS1, C6orf10 and MUM1 (Table S16E). We then assessed whether the CIS genes were significantly overtargeted when considering only the surrounding megabase-wide genomic region by the region-based Grubbs test. This analysis is required because the average gene integration frequency at the whole-genome level does not take into account the biases of LV integration towards specific megabase-wide genomic regions. The gene integration frequency at specific chromosomal regions was visualized by plotting the p-values of gene integration frequencies along the gene chromosomal coordinates. Local clusters of CIS within megabase-wide LV integration hotspots were evidenced by plotting CIS along the chromosomes. Given the skewed integration profile, 5 chromosomes contained ~60% of the LV CIS (chromosomes 1, 6, 11, 17, 19) (Fig. S19E). With the exception of OPTC, none of the CIS from the MLD clinical trial was considered significantly overtargeted with respect to the neighboring genes by the region-based Grubbs test. IS targeting OPTC were not detected at the last follow-up time in all three patients (Fig. S19F). We then compared the CIS genes identified by the two methods (Abel’s and Grubbs). The comparison shows that the CIS identified by our method are a subset fully contained within Abel’s CIS dataset (Fig. S19G).

CIS identification in γRV-based clinical trials by the genome-wide Grubbs test for outliers

The genome-wide Grubbs test for outliers clearly identified, before and after p-value correction, CIS genes in γRV-based clinical trials such as LMO2, MECOM, PRDM16 and CCND2, together with other genes such as ZNF217, TOMM20, KLF6, MIR17HG and MN1, all being significantly overtargeted (Fig. S19H and Table S16F). This test, however, did not identify SETBP1 or RUNX1 as significant CIS, although both were significant in previous analyses. Most CIS from these datasets were also previously analyzed by the regional Grubbs test for outliers (2). CIS significance evaluation considering the gene integration frequencies in the surrounding genomic intervals confirmed that well-known culprits of oncogenesis/clonal dominance such as LMO2, CCND2, PRDM16, SETBP1, RUNX1 and MECOM were indeed significantly overtargeted in the X-SCID and CGD clinical trials. This finding confirms that our genome-wide Grubbs test for outliers is able to detect genotoxic CIS (Table S16F).

Integration sites filtering based on sequence count

As previously described, each IS is associated with a sequence count value corresponding to the number of reads that map to the corresponding locus. This parameter is important for the validation of ISs (as in the collision detection process) but also for other biological data mining based on ISs. Therefore, in order to generate adequate customized filters for further analysis, we performed a detailed study of the properties of sequence counts in our datasets. To analyze the distribution of sequence count values, we plotted IS data cleaned from collisions as box plots. As an example, box plots of ISs identified in CD34+ cells from each patient over time are shown in Fig. S20A, in which sequence count data are represented on a log10 scale. Several boxes in Fig. S20A were flattened at the value 1, suggesting that a relevant portion of integration sites carried only 1 corresponding sequence. Some biological analyses require stringent parameters for IS validation to minimize sequencing or PCR artifacts. Therefore, for this set of analyses we applied a threshold on the minimum number of reads for each integration site, with the goal of minimizing the number of false positives. We tested a sequence count threshold of 3 to exclude the ISs represented by very low counts, which carry a higher probability of being affected by potential sequencing errors (Fig. S20B). In another representation based on Gaussian kernel density distributions, IS sequence counts are shown before and after filtering at the threshold of 3 per IS for CD34+ cells across time points (Fig. S20C). Table S17 summarizes the number of ISs for each patient after filtering by collisions and by sequence count.
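The sequence count filter can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the IS key (chromosome, position, strand) and the example values are hypothetical, and the threshold is read as "keep ISs with at least 3 reads", consistent with the ≥3 criterion stated in the stem cell marking analysis.

```python
from typing import Dict, Tuple

# Hypothetical IS key: (chromosome, position, strand).
IntegrationSite = Tuple[str, int, str]


def filter_by_sequence_count(
    counts: Dict[IntegrationSite, int], min_count: int = 3
) -> Dict[IntegrationSite, int]:
    """Keep only the ISs supported by at least `min_count` reads."""
    return {site: n for site, n in counts.items() if n >= min_count}


# Toy dataset, for illustration only.
toy_counts = {
    ("chr1", 1_000_000, "+"): 1,
    ("chr6", 2_500_000, "-"): 3,
    ("chr17", 7_400_000, "+"): 250,
}
kept = filter_by_sequence_count(toy_counts)
```

ISs supported by one or two reads are dropped, while the total read count of the retained ISs is left untouched for later normalization.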


We applied this filtering procedure to all datasets undergoing analysis of stem cell marking and, after data normalization, of clonal abundance.

• In stem cell marking analysis, retaining ISs with a sequence count below the threshold of 3 would have reduced analytical stringency, leading to potential overestimation of the number of transduced stem cells.

• In abundance analysis, we estimated that retaining ISs with a sequence count below the threshold of 3 would have increased background noise upon data normalization.

Stem cell marking analysis

Stem cell marking analysis was performed to retrieve a consistent group of ISs providing strong evidence of transduction and engraftment of multipotent progenitors. This information is relevant both for the efficacy of gene therapy and for general studies of hematopoietic cell biology after transplant. For this analysis we compared the following IS datasets from the indicated purified cell fractions, taken as the best representative of each hematopoietic subset/lineage: CD34+ cells purified from the BM as representative of the progenitors; purified PB CD3+ cells (T cells) and purified PB CD19+ cells (B cells), pooled to represent the lymphoid lineage; purified PB CD14+ cells (monocytes) and PB CD15+ cells (granulocytes), pooled to represent the myeloid lineage. The presence of shared IS between CD34+ cells, myeloid and lymphoid lineage cells of the same patient was first assessed by comparing ISs after filtering only for collisions (Fig. S21A). To increase analytical stringency, we applied additional filtering levels. The first filter was based on sequence counts (≥3, as discussed above); the results of its application to the MLD01 dataset are shown in Fig. S21B. We then applied a second filter taking into account the potential cross-contamination among different samples from the same patient. First, as discussed above in the Collision detection section, we estimated that background contamination between samples averages 1.6% in our experimental setup. Thus, any overlap of IS retrieved from different samples must well exceed this level for any further inference. Indeed, we observed much greater overlaps between the IS datasets retrieved from the different hematopoietic subtypes/lineages of the same patient. When interpreting these data, however, we must consider the purity of each cell fraction analyzed, which depends on the cell source, the cell type and the experimental methods used to purify that cell fraction.

Based on empirical assessment, we considered that the extent of contamination by cells of any other type/lineage in each purified cell fraction analyzed is likely to be less than 10%. Thus, we used the same filter previously designed to evaluate shared IS among different patients. Here, IS showing a sequence count 10-fold higher in a given lineage than in the others were assigned to the former lineage and discarded from the others. The total number of ISs remains the same as after the sequence count filter, but the ISs are more stringently distributed among the lineages. Because the CD34+ cell fractions used for this analysis were only harvested from the BM and consistently purified to a purity approaching 90%, they are the cell population least likely to be contaminated by more than 10% of cells belonging to any given lineage. Moreover, they are the only fraction to comprise the progenitor cells. Thus, we based any further inference on stem cell marking only on the IS shared between the CD34+ cell dataset and both the lymphoid and myeloid datasets. A list of the ISs putatively representing stem/multipotent progenitor cells according to these criteria is available upon request.
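The two filtering steps described in this section (10-fold lineage assignment, then intersection of the CD34+, lymphoid and myeloid datasets) can be sketched as below. The pairwise reading of the 10-fold rule, the use of "greater than or equal to", the dictionary keys and the toy counts are all assumptions made for illustration; the source does not specify these details.

```python
from typing import Dict, Set

Counts = Dict[str, int]  # IS identifier -> sequence count (hypothetical layout)


def assign_by_fold(lineages: Dict[str, Counts], fold: float = 10.0) -> Dict[str, Counts]:
    """Drop an IS from a lineage when another lineage holds >= `fold` times its count.

    The lineage with the highest count always keeps the IS, so the total
    number of distinct ISs is unchanged; only their distribution changes.
    """
    cleaned: Dict[str, Counts] = {name: {} for name in lineages}
    for name, counts in lineages.items():
        for site, n in counts.items():
            dominated = any(
                other.get(site, 0) >= fold * n
                for other_name, other in lineages.items()
                if other_name != name
            )
            if not dominated:
                cleaned[name][site] = n
    return cleaned


def shared_with_progenitors(cleaned: Dict[str, Counts]) -> Set[str]:
    """ISs found in the CD34+ dataset and in both lymphoid and myeloid datasets."""
    return set(cleaned["CD34"]) & set(cleaned["lymphoid"]) & set(cleaned["myeloid"])


# Toy counts, for illustration only.
toy = {
    "CD34": {"IS1": 50, "IS2": 4, "IS3": 30},
    "lymphoid": {"IS1": 20, "IS2": 400, "IS3": 12},
    "myeloid": {"IS1": 35, "IS2": 5, "IS3": 2},
}
cleaned = assign_by_fold(toy)
multipotent = shared_with_progenitors(cleaned)
```

Here IS2 is assigned to the lymphoid lineage only, IS3 is discarded from the myeloid lineage, and IS1 is the only site retained as putatively marking a multipotent progenitor.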


Clonality and Diversity analysis

The analysis of clonal diversity addresses the issue of IS diversification in the different lineage compartments. The Shannon information content index provides a measurement of the entropy and diversity of our IS dataset, as shown in a previous work (9), by using the formula

$$H' = -\sum_{i=1}^{R} p_i \log p_i$$

where p_i is the proportion of sequence counts belonging to the i-th type of IS in a given cell lineage and R is the number of IS types. Results of this analysis in each patient are shown in Fig. 6D of the main manuscript. Overall, the complexity of CD34+ and myeloid cells remains stable after an initial drop at month 1 after transplant. In lymphoid (B and T) cells, the complexity is low and gradually increases over time. The results for all the patients are shown in Fig. S22.
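As a minimal sketch of the Shannon index H′ used in the Clonality and Diversity analysis, the function below computes it from a list of per-IS sequence counts. The natural logarithm is an assumption (the source does not state the base), and the function name and example counts are hypothetical.

```python
import math
from typing import Iterable


def shannon_index(sequence_counts: Iterable[int]) -> float:
    """Shannon index H' = -sum(p_i * log(p_i)) over the ISs of one lineage.

    p_i is the share of sequence counts carried by the i-th IS.
    Natural log is assumed here; ISs with zero reads are ignored.
    """
    counts = [c for c in sequence_counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts)


# Toy examples: an even repertoire versus one dominated by a single IS.
even = shannon_index([10, 10, 10, 10])
skewed = shannon_index([97, 1, 1, 1])
```

An evenly distributed repertoire reaches the maximum value log(R), whereas a repertoire dominated by one IS gives a value close to zero.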

Stem cell clones population size estimation

To estimate the population size of transduced active HSC, we used the data of patients MLD01 and MLD02 and selected the short-lived myeloid lineage cells from peripheral blood long term after HSC-GT as a readout of stem cell output. The IS from the long-term PB myeloid cell datasets selected for this analysis were filtered by collisions and by sequence count. Note that by IS retrieval we estimated the number of HSC "clones" and not the total number of HSCs. Indeed, clonal abundance was not considered in this calculation. Therefore, the actual number of "HSCs" belonging to a single clone could well vary according to birth/death/migration dynamics, but the composition and repertoire of all "HSC clones" in our patients should remain constant. The estimate was performed through the mark-and-recapture approach (9, 18) using the Schnabel method, a modified version of the Petersen model that is summarized in the following formula:

$$N = \frac{(C_1 + 1)(C_2 + 1)}{S + 1} - 1$$

in which N is the estimate of the population size, C_i is the number of captured elements at the i-th time, and S is the number of shared elements between the two captures. The Schnabel model requires:

• Closed population, that is, no births and no deaths/migrations of elements;
• Equal probability of capture among elements;
• Independent sampling.

To best fit the Schnabel model requirements to the HSC clone population, we selected time points from 9 months on, representing a stable condition, and we compared the two latest time points, 12 months versus 18 months.
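The two-sample estimator given for this analysis can be reproduced with a short sketch. The function name is hypothetical; the counts are the C1, C2 and S values reported in this section for MLD01 and MLD02 (filtered) and for MLD01 without the sequence count filter. Truncation to an integer is an assumption chosen because it matches the reported figures.

```python
def two_sample_estimate(c1: int, c2: int, shared: int) -> float:
    """N = (C1 + 1)(C2 + 1) / (S + 1) - 1, as given in the text."""
    return (c1 + 1) * (c2 + 1) / (shared + 1) - 1


# (C1 at 12 months, C2 at 18 months, shared S), as reported in this section.
samples = {
    "MLD01": (621, 644, 107),
    "MLD02": (431, 598, 119),
    "MLD01 (no sequence count filter)": (864, 1208, 204),
}

# Truncate to whole clones (assumption: the reported values appear truncated).
estimates = {name: int(two_sample_estimate(*values)) for name, values in samples.items()}
```

Because S appears in the denominator, a small overlap between the two samplings drives the estimate up, which is why the closed-population and equal-capture assumptions listed for the model matter.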


Sampling ISs from these time points, we obtained the following counts for patient MLD01:

| Sampling (months) | Model variable | Captured ISs |
|---|---|---|
| 12 | C1 | 621 |
| 18 | C2 | 644 |
| shared | S | 107 |

Thus the Schnabel estimator is:

$$N = \frac{(C_1 + 1)(C_2 + 1)}{S + 1} - 1 = 3713$$

For patient MLD02 we obtained the following results:

| Sampling (months) | Model variable | Captured ISs |
|---|---|---|
| 12 | C1 | 431 |
| 18 | C2 | 598 |
| shared | S | 119 |

And the corresponding Schnabel estimate is:

$$N = \frac{(C_1 + 1)(C_2 + 1)}{S + 1} - 1 = 2155$$

We also assessed whether excluding ISs with a sequence count <3 would have affected (over-estimated) the number of HSC clones, by repeating the analysis without this filter. As an example, we report here the results for patient MLD01 without the sequence count filter, which gave an estimate of 5,100 HSC. This result shows that sequence count filtering did not drive the PS model to over-estimate the HSC population abundance.

| Sampling (months) | Model variable | Captured ISs |
|---|---|---|
| 12 | C1 | 864 |
| 18 | C2 | 1208 |
| shared | S | 204 |

$$N = \frac{(C_1 + 1)(C_2 + 1)}{S + 1} - 1 = 5100$$

To overcome and balance possible model biases due to the requirement for equal probability of capture among elements, we also tested these findings with a mark-recapture model that can assess and take into account heterogeneity in capture probability. Using log-linear regression models (18) through the R package Rcapture (http://cran.r-project.org/web/packages/Rcapture), we compared 3 available


samplings (time points 9, 12 and 18 months) and obtained higher estimates, still within the same order of magnitude, for both patients MLD01 and MLD02. We selected the Chao model and tested it by fitting the log-linear model and by computing a confidence interval (CI) for the abundance estimate through the multinomial profile likelihood CI (alpha = 0.05). Table S19 reports a summary of the results. In conclusion, by analyzing repeated samplings of the myeloid lineage long term post-infusion, we estimated that 2-5×10^3 HSC are likely to be functionally engrafted in patients MLD01 and MLD02 18 months after HSC-GT. Considering that we infused approx. 10^8 CD34+ cells and that it was reported that ≤0.01% of human CD34+ cells (i.e. 1×10^4) have the properties of long-term repopulating cells in SCID mice (19), our estimate would imply that a large fraction of harvested HSCs were transduced and engrafted long term in the patients. Our results also validate the myeloablative potential of the conditioning regimen used, as only HSC-GT allows infusing genetically marked but immunologically identical cells, thus ruling out any involvement of immune reactions in establishing a high myeloid chimerism.
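As a rough worked illustration of the last inference, and taking the ≤0.01% frequency at its upper bound (an assumption made here only to put a number on "a large fraction"; the ratio is not a figure reported by the authors):

$$10^{8} \times 10^{-4} = 10^{4}, \qquad \frac{2 \times 10^{3}}{10^{4}} = 0.2, \qquad \frac{5 \times 10^{3}}{10^{4}} = 0.5$$

Under this reading, the estimated 2-5×10^3 engrafted HSC clones would correspond to roughly 20-50% of the long-term repopulating cells expected in the infused product.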


SUPPORTING TABLES

| GMP batches | 08087 | 09015 | 09021 |
|---|---|---|---|
| Initial titer (TU/ml) | 1.0×10^7 | 1.2×10^7 | 1.4×10^7 |
| Initial physical particles (ng p24/ml) | 3.6×10^2 | 2.4×10^2 | 2.2×10^2 |
| Initial infectivity (TU/ng p24) | 3.0×10^4 | 5.0×10^4 | 6.5×10^4 |
| Initial volume (Litres) | 25 | 25 | 25 |
| Final titer (TU/ml) | 6.4×10^8 | 4.0×10^8 | 2.7×10^8 |
| Final physical particles (ng/ml) | 1.1×10^4 | 9.4×10^3 | 8.2×10^3 |
| Final infectivity (TU/ng p24) | 5.6×10^4 | 4.2×10^4 | 3.3×10^4 |
| Final volume (ml) | 180 | 180 | 180 |
| Overall yield (TU) | 22% | 24% | 14% |
| Overall yield (p24) | 29% | 28% | 27% |
| Total TU produced | 1.1×10^11 | 7.2×10^10 | 4.9×10^10 |

Table S1. Downstream Process results obtained for the production of the GMP ARSA LV batches; batches #08078 and 09021 were used for treatment of the patients MLD01, 02 and 03. TU: transducing units.


| Test | Specification | Batch # 08087 | Batch # 09015 | Batch # 09021 |
|---|---|---|---|---|
| Physicochemical and identity | | | | |
| Osmolality (mOsm/Kg) | 290-350 | 315 | 306 | 313 |
| pH EP 2.2.3 | 7.0-8.0 | 7.6 | 7.7 | 7.6 |
| ARSA transgene sequence | Corresponding | Corresponding | Corresponding | Corresponding |
| Vector integrity | Corresponding to the reference | Corresponding | Corresponding | Corresponding |
| Lentiviral proteins | Corresponding to the reference | Corresponding | Corresponding | Corresponding |
| Potency and bioactivity | | | | |
| Infectious Titer (TU/ml) | ≥ 2×10^8 | 6.4×10^8 | 4.0×10^8 | 2.7×10^8 |
| Physical titer (HIV Gag p24 Antigen) (ng/ml) | FIO | 1.1×10^4 | 1.0×10^3 | 8.2×10^3 |
| Infectivity (Transducing unit/ng p24) | ≥ 2×10^4 | 5.6×10^4 | 3.8×10^4 | 3.3×10^4 |
| Transgene function (ARSA activity, fold to untransduced) | ≥ 5 fold untransduced cells | 22 | 14 | 16 |
| Microbial purity and safety | | | | |
| Sterility EP 2.6.1 | Negative | Negative | Negative | Negative |
| Mycoplasma EP 2.6.7 (cultural assay) | Negative | Negative | Negative | Negative |
| Endotoxin EP 2.6.14 (quantitative assay) (EU/2×10^8 TU) | ≤ 25 | 3 | 21 | 7 |
| In vitro Adventitious viruses | Negative | Negative | Negative | Negative |
| In vivo Adventitious viruses | Negative | Negative | Negative | Negative |
| RCL | Negative | Negative | Negative | Negative |
| Process and product impurities | | | | |
| Host cell proteins (ng/2×10^8 TU) | FIO | 22 | 36 | 44 |
| Plasmid residual DNA (VSV-G) (copies/2×10^8 TU) | ≤ 4×10^8 | 0.6×10^8 | 1.6×10^8 | 1.9×10^8 |
| Large T antigen (protein contamination) (ng/ml) | ≤LOQ (*) | ≤LOQ | ≤LOQ | ≤LOQ |
| Large T antigen Residual DNA (copies/2×10^8 TU) | ≤ 2.0×10^5 | 0.7×10^4 | 2.3×10^4 | 1.5×10^4 |
| Benzonase contamination (ng/ml) | ≤ 0.2 | <0.1 | <0.1 | <0.1 |
| E1A DNA (copies/2×10^8 TU) | ≤ 2.0×10^5 | 1.7×10^4 | 3.4×10^4 | 4.3×10^4 |
| Total residual DNA (µg/2×10^8 TU) | FIO | 0.9 | 1.9 | 1.5 |
| BSA contamination (µg/2×10^8 TU) | FIO | 0.4 | 0.7 | 0.8 |
| Vector cross-contamination | ≤ 10^5 pp / 10^10 pp | ≤ 10^5 pp / 10^10 pp | ≤ 10^5 pp / 10^10 pp | ≤ 10^5 pp / 10^10 pp |

Table S2. Characterization of the ARSA LVV batches produced and validated; batches #08078 and 09021 were used for treatment of the patients MLD01, 02 and 03. FIO: for information only; RCL: replication competent lentivirus; LOQ: limit of quantification.


| | MLD01 | MLD02 | MLD03 |
|---|---|---|---|
| Leukocyte ARSA activity (nmol/mg/h) (before HSC-GT) | 12 | 7.3 | 4.7 |
| ARSA gene mutations | c.821C>T (LI) (20) (p.Thr274Met); c.821C>T (LI) (p.Thr274Met) | c.730C>T (LI) (21) (p.Arg244Cys); c.731G>A (LI) (21) (p.Arg244His) | c.443C>G (UK) (p.Pro148Arg); c.443C>G (UK) (p.Pro148Arg) |
| Age at expected onset (onset in the affected sibling/s) | 18 months | 24 months | 15 months |
| Age at HSC-GT | 16 months | 13 months | 7 months |
| Age at last follow up | 39 months | 30 months | 25 months |
| Symptoms at HSC-GT | no | no | no |
| NCV index at HSC-GT | -11.5 | -2.3 | -6.7 |

Table S3. Treated patients' characteristics. HSC-GT: HSC gene therapy. LI: late infantile-associated mutation; UK: unknown/not previously described mutation.


| Test | Specification | Available at infusion |
|---|---|---|
| Drug Substance | | |
| Mycoplasma PCR (EofT) | Negative | yes |
| LV Copy number (LC) | FIO | no |
| ARSA Transgene product expression (LC) | FIO | no |
| RCL (EofT) | Negative | no |
| Large T antigen DNA (EofT) and (LC) | EofT: FIO; LC: ≤ LLOQ | yes (LC) |
| Endotoxin (EofT) | ≤ 2.5 EU/ml | yes |
| Clonogenic test (EofT) | FIO | no |
| Transduction efficiency (EofT) | FIO | no |
| E1A DNA (EofT) and (LC) | EofT: FIO; LC: ≤ LLOQ | no |
| IF (CD34, CD45, CD19, CD3, CD15) | FIO | yes |
| Drug Product | | |
| Sterility – microbiological control of cellular products | Negative | yes |
| Cell viability | >80% | yes |

Table S4. Tests and specifications for Drug Substance and Drug Product quality control. EofT: end of transduction; LC: 14 days liquid culture; FIO: for information only; RCL: replication competent lentivirus; LLOQ: lower limit of quantification; IF: Immunophenotyping.


| | MLD01 | MLD02 | MLD03 |
|---|---|---|---|
| Cell dose (CD34+ cells/kg) | 11×10^6 | 7.0×10^6 | 7.2×10^6 |
| VCN (copies/genome) | 2.5 | 2.5 | 4.4 |
| Transduction efficiency (%) | 97 | 90 | 93 |
| ARSA activity (fold to HD) | >10 | >10 | >10 |
| BU total dose (mg/kg) | 10.4 | 14.6 | 10.4 |
| Neutropenia (days post-GT) | +9 to +38 | +9 to +45 | +11 to +37 |

Table S5. Transplant details. VCN: vector copy number, measured after 14 days of culture; HD: healthy donor; BU: busulfan. Transduction efficiency was measured by quantitative PCR performed on individual colonies obtained from the colony forming cell (CFC) assay.


| Cell type | Source | # nuclei | # ARSA+ cells | # WPRE+ cells |
|---|---|---|---|---|
| CD13+ | BM | 73 | 72 | 72 |
| PBMCs | PB | 43 | 39 | 39 |
| CD14+ | PB | 22 | 21 | 22 |
| CD3+ | PB | 51 | 24 | 27 |

Table S6. Quantitative results of in situ hybridization and ARSA detection in cells isolated from patient MLD01 1 year after HSC-GT: the number (#) of ARSA+ and WPRE+ cells and the total number of nuclei counted/slide are reported. Nuclei detection was based on DAPI staining. ≥ 3 slides/cell type were analyzed. BM: bone marrow; PB: peripheral blood.


Patients MLD01 01 sib1 01 sib2 MLD02 02 sib MLD03 03 sib Levels 0 18 (up to 31) 18 (up to 25) 1 24 15 2 18 (up to 39) 32 19 3 20 4 40 5 18 18 22 6 30 30 44 24

Table S7. GMFC-MLD data. Age (in months) at entrance in each stage of the GMFC-MLD scale(22) is reported for the treated patients and their matched siblings (sib) (GMFC-MLD can be applied only starting from 18 months of chronological age).


| Patient | Timing | Lying, Rolling | Sitting | Crawling, Kneeling | Standing | Walking, Running, Jumping | TOT |
|---|---|---|---|---|---|---|---|
| MLD01 | Baseline | 100% | 83% | 74% | 51% | 17% | 65% |
| MLD01 | Last f-u | 100% | 100% | 88% | 66% | 26% | 76% |
| MLD02 | Baseline | 100% | 92% | 88% | 69% | 29% | 76% |
| MLD02 | Last f-u | 100% | 99% | 88% | 79% | 69% | 87% |
| MLD03 | Baseline | 78% | 48% | 5% | 5% | 0% | 27% |
| MLD03 | Last f-u | 100% | 95% | 85% | 79% | 58% | 83% |

Table S8. GMFM data. Results from GMFM testing are reported as % of the total score for each dimension of motor function (lying and rolling; sitting; crawling and kneeling; standing; walking, running and jumping) and as average % (TOT) of all the tested dimensions for the three patients at time of treatment (baseline; 16, 13 and 7 months of chronological age for patients MLD01, MLD02 and MLD03, respectively) and at the last reported follow up (last f-u; 39, 31 and 25 months of chronological age for patients MLD01, MLD02 and MLD03, respectively). The performance of the three patients at all the reported time points fits within the range described for typically developing preschool children (23).
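As a small consistency check of how the TOT column relates to the five dimension scores, the sketch below recomputes the average for each row of Table S8. The values are transcribed from the table; rounding to the nearest integer and the data layout are assumptions made for illustration.

```python
# (patient, timing) -> ([five dimension percentages], reported TOT), from Table S8.
gmfm = {
    ("MLD01", "baseline"): ([100, 83, 74, 51, 17], 65),
    ("MLD01", "last f-u"): ([100, 100, 88, 66, 26], 76),
    ("MLD02", "baseline"): ([100, 92, 88, 69, 29], 76),
    ("MLD02", "last f-u"): ([100, 99, 88, 79, 69], 87),
    ("MLD03", "baseline"): ([78, 48, 5, 5, 0], 27),
    ("MLD03", "last f-u"): ([100, 95, 85, 79, 58], 83),
}


def total_score(dimensions):
    """Average of the dimension percentages, rounded to the nearest integer."""
    return round(sum(dimensions) / len(dimensions))


# Rows whose recomputed average differs from the reported TOT.
mismatches = [key for key, (dims, tot) in gmfm.items() if total_score(dims) != tot]
```

With the corrected baseline values, every reported TOT equals the rounded mean of the five dimension scores.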


| Patient | Timing | Cognitive composite score | Confidence interval 95% | Language composite score | Confidence interval 95% |
|---|---|---|---|---|---|
| MLD01 | Baseline | 95 | 87-103 | N.A. | N.A. |
| MLD01 | Last f-u | 90 | 83-99 | 90 | 93-107 |
| MLD02 | Baseline | 100 | 92-108 | 110 | 104-114 |
| MLD02 | Last f-u | 100 | 87-103 | 91 | 84-99 |
| MLD03 | Baseline | 115 | 106-122 | 83 | 77-91 |
| MLD03 | Last f-u | 95 | 92-108 | 91 | 84-99 |

Table S9. Bayley Scales of Infant and Toddler Development data. Cognitive and language IQ composite scores for chronological age are reported for each treated patient at baseline and at last reported follow up (last f-u) after gene therapy. N.A.: not applicable.


| Patient | Number of LAM PCRs | Raw reads | Trimmed reads | Discarded reads | Reads of ISs | Identified ISs |
|---|---|---|---|---|---|---|
| MLD01 | 298 | 4,023,403 | 3,002,770 | 1,020,633 | 1,120,414 | 15,011 |
| MLD02 | 321 | 4,865,367 | 3,843,138 | 1,022,229 | 1,518,802 | 11,614 |
| MLD03 | 224 | 1,712,725 | 1,215,812 | 496,913 | 425,356 | 11,470 |

Table S10. Summary of the results of the quality analysis process and mapped IS. For each patient we processed several LAM PCRs, which were sequenced to obtain a given number of raw reads. Our bioinformatics pipeline processed the raw reads to obtain a given number of trimmed reads. Discarded reads were those without a recognizable LTR sequence. A fraction of the trimmed reads was mapped on the human genome (Reads of ISs), resulting in a given number of ISs (Identified ISs).


| Cell fraction | Purity after sorting (% marker+ cells) |
|---|---|
| PB CD15+ cells | 86.73 ± 7.74 |
| PB CD56+ cells | 71.25 ± 14.6 |
| PB CD14+ cells | 92.25 ± 7.73 |
| PB CD19+ cells | 81.24 ± 9.2 |
| PB CD3+ cells | 85.19 ± 19.54 |
| BM CD15+ cells | 87.2 ± 8.03 |
| BM CD34+ cells | 86.13 ± 2.67 |
| BM CD13+ cells | 76.29 ± 14.39 |
| BM GLY A+ cells | 72.36 ± 11.74 |
| BM CD19+ cells | 54.90 ± 21.71 |
| BM CD3+ cells | 69.53 ± 16.37 |

Table S11. Purity of subpopulations isolated from peripheral blood (PB) and bone marrow (BM). GLY A: glycophorin A.


A Timepoint (months) Patient 1

(MLD01) Patient 2 (MLD02)

Patient 3 (MLD03)

in vitro 0 (transduction) BM CD34 BM CD34 BM CD34 in vivo 1 BM; PB BM; PB BM; PB

3 BM; PB BM; PB BM; PB 6 PB BM; PB BM; PB 9 PB PB PB

12 BM; PB BM; PB BM; PB 18 BM; PB BM; PB

B

Sample Cell Type Marker

in vitro BM In vitro cultured CD34+

Colonies CFC*-CD34+

in vivo

PB

Peripheral Blood Whole PB,PBMC§ lymphoid T CD3+

myeloid CD14+,CD15+ lymphoid B CD19+

NK CD56+

Colonies CFC*-PBMC§,CFC*-PB lineage (-) fraction

BM

Whole BM Whole BM,BMMC§

lymphoid T CD3+ myeloid CD13+,CD15+

lymphoid B CD19+

CD34 CD34+,CFC*-CD34+

Erythroid GLY A+

Colonies CFC*-BMMC§, Whole BM

Table S12. Summary of the samples analyzed at each time point (A) and of the specific cell types analyzed for vector integration studies (B). Genomic DNA extracted from whole BM and PB or FACS-sorted cell sub-types from these tissues, harvested from patients at different time points. * CFC: Colony Forming Cells; § PBMC and BMMC: PB-derived Mononuclear Cells and BM-derived Mononuclear Cells.


A

Patient ISs Collisions

Inter-Trials Removed ISs ISs remaining after collisions

MLD + WAS MLD MLD01 15,011 529 296 14,482

MLD02 11,614 537 223 11,077

MLD03 11,470 511 149 10,959

B IS id Patient

A Patient

B Ratio A/B

Log10 A/B

Removed from A

Removed from B

Assigned to A

Assigned to B

1 1 100 0.01 -2.00 X X 2 20 100 0.20 -0.70 X X 3 100 100 1.00 0.00 X X 4 150 100 1.50 0.18 X X 5 250 100 2.50 0.40 X X 6 1100 100 10.01 1.00 X X

C

St.s Pt.

s

Unique ISs Removed

from Study

ISs Removed

from study

Removed ISs seq. Count

MLD WAS Assigned to pt. Assigned to pt.

1 2 3 1 2 3

MLD 1

607 296 5,148 - 78 9 129 3 14

2 223 6,788 145 - 15 13 11 130 3 149 3,964 68 258 - 10 10 16

WAS 1

466 174 9,078 170 4 9 - 29 7

2 151 4,794 11 13 9 17 - 68

3 192 5,074 9 6 4 3 25 -

Table S13. Summary of collision filtering. (A) Patient’s collisions with respect to other patients (WAS and MLD). (B) Theoretical scenarios for removal and reassignment of integration sites in two patients. (C) Integration sites and sequence reads removed from and reassigned to each patient from the MLD and WAS clinical trials.


| GO System | Term Name | Binom FDR Q-Val | Binom Fold Enrich | Binom Region Hits | Hyper FDR Q-Val | Hyper Fold Enrich | Hyper Gene Hits | Hyper Total Genes |
|---|---|---|---|---|---|---|---|---|
| GO Biological Process | intracellular transport | 3.18E-14 | 2.1 | 159 | 2.59E-04 | 1.54 | 123 | 814 |
| GO Biological Process | response to DNA damage stimulus | 2.52E-09 | 2.1 | 106 | 4.74E-03 | 1.55 | 84 | 551 |
| GO Biological Process | DNA repair | 5.20E-09 | 2.3 | 80 | 4.05E-03 | 1.69 | 62 | 372 |
| GO Biological Process | chromatin modification | 1.17E-08 | 2.0 | 108 | 3.15E-03 | 1.67 | 68 | 414 |
| GO Biological Process | histone modification | 3.99E-07 | 2.3 | 66 | 5.40E-03 | 1.86 | 44 | 240 |
| GO Biological Process | interspecies interaction between organisms | 6.15E-07 | 2.1 | 80 | 1.41E-02 | 1.62 | 58 | 363 |
| GO Biological Process | covalent chromatin modification | 6.01E-07 | 2.3 | 66 | 6.68E-03 | 1.84 | 44 | 243 |
| GO Biological Process | positive regulation of viral reproduction | 2.91E-06 | 4.1 | 24 | 1.51E-02 | 2.54 | 18 | 72 |
| GO Biological Process | regulation of viral transcription | 4.55E-06 | 4.5 | 21 | 2.24E-02 | 2.58 | 16 | 63 |
| GO Biological Process | positive regulation of viral transcription | 6.07E-06 | 4.9 | 19 | 3.82E-02 | 2.59 | 14 | 55 |
| GO Biological Process | ribonucleotide biosynthetic process | 3.75E-05 | 3.2 | 27 | 2.86E-02 | 2.23 | 20 | 91 |
| GO Biological Process | regulation of viral reproduction | 1.63E-04 | 3.1 | 25 | 4.33E-02 | 2.20 | 19 | 88 |
| GO Biological Process | DNA replication | 2.28E-04 | 2.2 | 44 | 2.87E-02 | 1.80 | 35 | 198 |
| GO Biological Process | axon cargo transport | 3.66E-04 | 6.0 | 11 | 2.83E-02 | 4.45 | 7 | 16 |
| GO Biological Process | regulation of translation | 6.97E-04 | 2.2 | 41 | 3.31E-02 | 1.83 | 32 | 178 |
| GO Biological Process | ATP metabolic process | 8.77E-04 | 2.2 | 39 | 2.79E-02 | 1.82 | 34 | 190 |
| GO Biological Process | microtubule-based transport | 1.17E-03 | 3.9 | 15 | 2.57E-02 | 3.20 | 11 | 35 |
| GO Biological Process | ATP catabolic process | 1.80E-03 | 2.3 | 32 | 4.55E-02 | 1.89 | 27 | 145 |
| GO Biological Process | regulation of histone deacetylation | 3.23E-03 | 6.4 | 8 | 2.19E-02 | 5.55 | 6 | 11 |
| GO Biological Process | positive regulation of histone deacetylation | 4.41E-03 | 7.2 | 7 | 2.68E-02 | 6.36 | 5 | 8 |
| GO Biological Process | regulation of protein deacetylation | 4.41E-03 | 6.1 | 8 | 4.52E-02 | 4.69 | 6 | 13 |
| GO Biological Process | DNA-dependent DNA replication | 4.59E-03 | 2.7 | 20 | 1.67E-02 | 2.51 | 18 | 73 |
| GO Biological Process | regulation of lymphocyte mediated immunity | 4.67E-03 | 2.7 | 21 | 3.00E-02 | 2.40 | 17 | 72 |
| GO Biological Process | mononuclear cell proliferation | 8.92E-03 | 2.5 | 20 | 1.65E-02 | 2.91 | 14 | 49 |
| GO Biological Process | leukocyte proliferation | 9.37E-03 | 2.5 | 21 | 9.15E-03 | 2.99 | 15 | 51 |
| GO Biological Process | lymphocyte proliferation | 1.80E-02 | 2.4 | 19 | 2.87E-02 | 2.81 | 13 | 47 |
| GO Biological Process | DNA synthesis involved in DNA repair | 2.27E-02 | 5.0 | 7 | 3.77E-03 | 7.63 | 6 | 8 |
| GO Biological Process | positive regulation of reproductive process | 2.43E-02 | 2.0 | 27 | 2.17E-02 | 2.31 | 20 | 88 |
| GO Biological Process | DNA biosynthetic process | 2.69E-02 | 4.2 | 8 | 6.10E-03 | 5.93 | 7 | 12 |
| GO Cellular Component | nuclear body | 7.05E-07 | 2.3 | 56 | 6.53E-04 | 1.88 | 47 | 254 |
| GO Cellular Component | centrosome | 1.11E-06 | 2.1 | 70 | 1.20E-03 | 1.73 | 56 | 329 |
| GO Cellular Component | nuclear speck | 1.54E-05 | 2.7 | 35 | 5.26E-04 | 2.25 | 31 | 140 |
| GO Cellular Component | AP-2 adaptor complex | 9.55E-05 | 8.0 | 9 | 2.10E-03 | 6.78 | 6 | 9 |
| GO Cellular Component | clathrin coat of endocytic vesicle | 2.07E-04 | 7.2 | 9 | 4.39E-03 | 6.10 | 6 | 10 |
| GO Cellular Component | clathrin coat of coated pit | 3.70E-04 | 6.7 | 9 | 3.68E-02 | 4.36 | 6 | 14 |
| GO Cellular Component | spindle microtubule | 2.78E-03 | 3.2 | 15 | 1.60E-02 | 2.91 | 12 | 42 |
| GO Cellular Component | platelet alpha granule membrane | 1.08E-02 | 7.5 | 5 | 3.43E-02 | 6.78 | 4 | 6 |
| GO Cellular Component | clathrin-coated endocytic vesicle membrane | 2.56E-02 | 3.1 | 10 | 5.20E-03 | 5.08 | 7 | 14 |
| GO Cellular Component | WINAC complex | 3.70E-02 | 5.3 | 5 | 3.43E-02 | 6.78 | 4 | 6 |

Table S14A-F. Gene ontology analysis of in vitro and in vivo IS datasets in MLD patients. The analyses were performed with GREAT software. The columns from left to right contain: GO System: gene classes belonging to the Molecular Function, Biological Process and Cellular Component classification systems are indicated; Term Name: name of the overrepresented gene class; Binom FDR Q-Val: false discovery rate value of the binomial distribution statistical analysis for the given gene class; Binom Fold Enrichment: enrichment with respect to the expected value in the binomial distribution statistical analysis; Binom Region Hits: genomic regions identified by the binomial distribution statistical analysis; Hyper FDR Q-Val: false discovery rate of the significantly overrepresented gene classes in the hypergeometric statistical analysis; Hyper Fold Enrichment: fold enrichment of the gene class with respect to the expected value; Hyper Gene Hits: genes targeted by LV ISs in the context of the hypergeometric statistical analysis; Hyper Total Genes: total genes of the given gene class.

Table S14A. Gene Ontology classes significantly targeted by LV IS in MLD01 in vitro cultured CD34+ cells.


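| GO System | Term Name | Binom FDR Q-Val | Binom Fold Enrich | Binom Region Hits | Hyper FDR Q-Val | Hyper Fold Enrich | Hyper Gene Hits | Hyper Total Genes |
|---|---|---|---|---|---|---|---|---|
| GO Biological Process | response to DNA damage stimulus | 1.6E-65 | 2.0 | 729 | 4.2E-06 | 1.3 | 245 | 551 |
| GO Biological Process | DNA repair | 4.3E-54 | 2.2 | 529 | 1.3E-04 | 1.3 | 168 | 372 |
| GO Biological Process | histone modification | 1.2E-43 | 2.1 | 435 | 1.0E-04 | 1.4 | 116 | 240 |
| GO Biological Process | covalent chromatin modification | 1.9E-43 | 2.1 | 438 | 6.6E-05 | 1.4 | 118 | 243 |
| GO Biological Process | T cell costimulation | 9.6E-35 | 3.5 | 147 | 4.5E-02 | 1.5 | 39 | 77 |
| GO Biological Process | cytokine-mediated signaling pathway | 1.4E-27 | 2.1 | 277 | 4.9E-02 | 1.3 | 99 | 230 |
| GO Biological Process | histone lysine methylation | 4.3E-24 | 3.2 | 117 | 1.7E-02 | 1.8 | 26 | 44 |
| GO Biological Process | microtubule-based transport | 9.3E-24 | 3.6 | 97 | 3.9E-02 | 1.8 | 21 | 35 |
| GO Biological Process | nucleocytoplasmic transport | 3.2E-23 | 2.0 | 273 | 1.1E-02 | 1.4 | 90 | 197 |
| GO Biological Process | G2/M transition checkpoint | 1.2E-22 | 3.9 | 85 | 1.5E-02 | 1.9 | 22 | 35 |
| GO Biological Process | G2/M transition DNA damage checkpoint | 3.4E-21 | 4.1 | 74 | 2.9E-02 | 1.9 | 18 | 28 |
| GO Biological Process | spindle organization | 3.4E-17 | 2.8 | 104 | 4.5E-03 | 1.7 | 37 | 65 |
| GO Biological Process | natural killer cell differentiation | 1.4E-16 | 5.1 | 46 | 4.6E-02 | 3.0 | 6 | 6 |
| GO Biological Process | interferon-gamma-mediated signaling pathway | 8.3E-16 | 2.6 | 103 | 3.6E-02 | 1.6 | 33 | 62 |
| GO Biological Process | histone methylation | 1.2E-14 | 2.3 | 125 | 4.9E-02 | 1.6 | 30 | 56 |
| GO Biological Process | nucleus localization | 3.1E-11 | 3.5 | 47 | 2.9E-02 | 2.2 | 12 | 16 |
| GO Biological Process | regulation of TOR signaling cascade | 3.8E-09 | 2.7 | 57 | 1.7E-02 | 2.0 | 18 | 27 |
| GO Biological Process | regulation of histone deacetylation | 1.8E-06 | 3.3 | 28 | 6.4E-03 | 2.7 | 10 | 11 |
| GO Biological Process | regulation of protein deacetylation | 1.8E-06 | 3.2 | 29 | 9.9E-03 | 2.5 | 11 | 13 |
| GO Biological Process | positive regulation of histone deacetylation | 3.2E-05 | 3.3 | 22 | 7.3E-03 | 3.0 | 8 | 8 |
| GO Biological Process | positive regulation of protein deacetylation | 3.3E-05 | 3.2 | 23 | 1.5E-02 | 2.7 | 9 | 10 |
| GO Biological Process | citrate metabolic process | 1.2E-03 | 2.2 | 29 | 1.8E-02 | 3.0 | 7 | 7 |
| GO Molecular Function | histone-lysine N-methyltransferase activity | 6.9E-29 | 3.8 | 114 | 5.2E-03 | 1.9 | 25 | 39 |
| GO Molecular Function | protein-lysine N-methyltransferase activity | 2.1E-28 | 3.6 | 117 | 1.4E-03 | 2.0 | 27 | 41 |
| GO Molecular Function | protein N-terminus binding | 5.0E-23 | 2.5 | 163 | 9.5E-03 | 1.6 | 45 | 85 |
| GO Molecular Function | histone methyltransferase activity | 3.0E-19 | 2.7 | 118 | 1.8E-02 | 1.7 | 28 | 48 |
| GO Molecular Function | DNA helicase activity | 3.3E-19 | 3.3 | 89 | 2.1E-02 | 1.8 | 26 | 44 |
| GO Molecular Function | N-methyltransferase activity | 3.7E-18 | 2.5 | 133 | 3.0E-02 | 1.6 | 34 | 63 |
| GO Molecular Function | protein methyltransferase activity | 1.4E-16 | 2.3 | 136 | 1.3E-02 | 1.7 | 36 | 65 |
| GO Molecular Function | small protein activating enzyme activity | 3.4E-15 | 8.4 | 28 | 1.8E-02 | 2.7 | 9 | 10 |
| GO Molecular Function | androgen receptor binding | 2.4E-12 | 2.4 | 93 | 3.0E-02 | 1.8 | 22 | 36 |
| GO Molecular Function | ATP-dependent DNA helicase activity | 2.2E-11 | 2.8 | 65 | 3.1E-02 | 1.8 | 21 | 34 |
| GO Molecular Function | DNA-dependent ATPase activity | 6.7E-11 | 2.1 | 117 | 1.6E-02 | 1.6 | 40 | 75 |
| GO Cellular Component | MHC class II protein complex | 4.5E-65 | 19.4 | 74 | 3.2E-02 | 2.2 | 11 | 15 |
| GO Cellular Component | condensed nuclear chromosome | 1.2E-24 | 3.4 | 106 | 1.1E-03 | 1.8 | 35 | 59 |
| GO Cellular Component | nuclear speck | 7.3E-17 | 2.0 | 187 | 1.2E-05 | 1.6 | 76 | 140 |
| GO Cellular Component | histone deacetylase complex | 1.9E-11 | 2.0 | 125 | 1.6E-04 | 2.0 | 31 | 47 |
| GO Cellular Component | immunological synapse | 2.4E-08 | 3.3 | 36 | 3.2E-02 | 2.2 | 11 | 15 |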

Table S14B. Gene Ontology classes significantly targeted by LV IS in MLD01 in vivo.


GO System / Term Name | Binom FDR Q-Val | Binom Fold Enrich | Binom Region Hits | Hyper FDR Q-Val | Hyper Fold Enrich | Hyper Gene Hits | Hyper Total Genes
GO Biological Process
alternative nuclear mRNA splicing, via spliceosome | 1.29E-06 | 28.5 | 8 | 4.45E-02 | 11.95 | 4 | 6
cellular component assembly involved in morphogenesis | 9.19E-03 | 2.8 | 20 | 4.68E-02 | 2.81 | 16 | 102
GO Cellular Component
nuclear chromatin | 1.92E-02 | 2.4 | 19 | 1.99E-02 | 2.65 | 17 | 115
nuclear chromosome part | 2.03E-02 | 2.0 | 27 | 5.14E-02 | 2.04 | 24 | 211

Table S14C. Gene Ontology classes significantly targeted by LV IS in MLD02 in vitro cultured CD34+ cells.


GO System / Term Name | Binom FDR Q-Val | Binom Fold Enrich | Binom Region Hits | Hyper FDR Q-Val | Hyper Fold Enrich | Hyper Gene Hits | Hyper Total Genes
GO Biological Process
nerve growth factor receptor signaling pathway | 1.9E-31 | 2.1 | 323 | 2.6E-06 | 1.6 | 101 | 221
histone modification | 4.0E-28 | 2.1 | 311 | 1.5E-04 | 1.5 | 101 | 240
covalent chromatin modification | 6.2E-28 | 2.0 | 313 | 1.5E-04 | 1.5 | 102 | 243
mRNA catabolic process | 2.5E-25 | 3.2 | 124 | 1.8E-03 | 1.7 | 45 | 94
RNA catabolic process | 2.9E-24 | 2.9 | 139 | 2.1E-03 | 1.6 | 54 | 119
nuclear-transcribed mRNA catabolic process | 1.4E-23 | 3.4 | 106 | 1.2E-02 | 1.7 | 38 | 82
Ras protein signal transduction | 7.9E-22 | 2.5 | 165 | 1.8E-02 | 1.5 | 53 | 126
nucleocytoplasmic transport | 1.8E-20 | 2.1 | 214 | 1.9E-02 | 1.4 | 77 | 197
nuclear transport | 1.1E-19 | 2.1 | 216 | 1.6E-02 | 1.4 | 78 | 199
Golgi vesicle transport | 5.2E-19 | 2.2 | 174 | 2.9E-02 | 1.4 | 65 | 164
histone lysine methylation | 8.0E-17 | 3.1 | 85 | 2.1E-02 | 1.9 | 23 | 44
nuclear-transcribed mRNA catabolic process, deadenylation-dependent decay | 2.0E-15 | 3.4 | 68 | 1.7E-02 | 1.8 | 26 | 51
peptidyl-lysine modification | 9.8E-13 | 2.1 | 134 | 4.1E-02 | 1.5 | 47 | 113
lipid modification | 1.1E-12 | 2.1 | 136 | 6.6E-05 | 1.8 | 54 | 107
T cell receptor signaling pathway | 1.3E-10 | 2.0 | 120 | 3.3E-03 | 1.7 | 42 | 88
histone methylation | 1.6E-10 | 2.3 | 92 | 6.3E-03 | 1.8 | 29 | 56
protein methylation | 2.2E-10 | 2.1 | 108 | 3.9E-03 | 1.7 | 38 | 78
natural killer cell differentiation | 9.6E-08 | 4.0 | 27 | 1.8E-02 | 3.6 | 6 | 6
GO Molecular Function
MHC class II receptor activity | 2.6E-33 | 23.4 | 36 | 4.3E-02 | 3.1 | 7 | 8
protein N-terminus binding | 3.7E-20 | 2.7 | 129 | 2.1E-03 | 1.8 | 42 | 85
steroid hormone receptor binding | 6.9E-20 | 2.6 | 140 | 3.9E-02 | 1.7 | 30 | 63
histone-lysine N-methyltransferase activity | 7.4E-19 | 3.5 | 80 | 3.4E-02 | 1.9 | 21 | 39
protein-lysine N-methyltransferase activity | 6.5E-18 | 3.4 | 81 | 2.8E-02 | 1.9 | 22 | 41
androgen receptor binding | 3.5E-16 | 3.0 | 86 | 2.9E-03 | 2.2 | 22 | 36
histone methyltransferase activity | 1.6E-13 | 2.7 | 86 | 8.5E-03 | 1.9 | 26 | 48
protein methyltransferase activity | 3.4E-12 | 2.3 | 101 | 6.7E-03 | 1.8 | 33 | 65
non-membrane spanning protein tyrosine kinase activity | 3.8E-10 | 2.5 | 73 | 3.8E-02 | 1.8 | 24 | 47
mitogen-activated protein kinase binding | 7.1E-04 | 2.4 | 27 | 2.1E-02 | 2.5 | 12 | 17
GO Cellular Component
MHC class II protein complex | 1.2E-54 | 21.1 | 60 | 4.7E-02 | 2.4 | 10 | 15
cis-Golgi network | 1.7E-09 | 3.2 | 44 | 5.0E-02 | 2.0 | 15 | 27
centromeric heterochromatin | 2.7E-07 | 5.7 | 17 | 2.6E-02 | 2.8 | 8 | 10
heterochromatin | 1.1E-06 | 2.1 | 63 | 1.8E-02 | 1.8 | 26 | 52
NuRD complex | 1.8E-06 | 3.3 | 27 | 4.7E-02 | 2.4 | 10 | 15
immunological synapse | 2.2E-06 | 3.3 | 27 | 4.7E-02 | 2.4 | 10 | 15
platelet dense tubular network membrane | 2.7E-05 | 4.0 | 17 | 2.3E-02 | 3.1 | 7 | 8

Table S14D. Gene Ontology classes significantly targeted by LV IS in MLD02 in vivo.


GO System / Term Name | Binom FDR Q-Val | Binom Fold Enrich | Binom Region Hits | Hyper FDR Q-Val | Hyper Fold Enrich | Hyper Gene Hits | Hyper Total Genes
GO Biological Process
cell cycle phase | 3.67E-05 | 2.1 | 61 | 1.45E-02 | 1.77 | 56 | 703
interspecies interaction between organisms | 1.73E-04 | 2.5 | 40 | 1.63E-02 | 2.08 | 34 | 363
GO Cellular Component
spliceosomal complex | 1.10E-06 | 4.4 | 22 | 4.17E-02 | 2.47 | 16 | 144
SWI/SNF-type complex | 6.78E-03 | 6.3 | 7 | 1.64E-02 | 6.35 | 6 | 21
chromatin remodeling complex | 1.93E-02 | 2.6 | 17 | 2.83E-02 | 2.78 | 14 | 112

Table S14E. Gene Ontology classes significantly targeted by LV IS in MLD03 in vitro cultured CD34+ cells.


GO System / Term Name | Binom FDR Q-Val | Binom Fold Enrich | Binom Region Hits | Hyper FDR Q-Val | Hyper Fold Enrich | Hyper Gene Hits | Hyper Total Genes
GO Biological Process
antigen processing and presentation of peptide or polysaccharide antigen via MHC class II | 2.3E-66 | 14.2 | 91 | 9.4E-03 | 2.5 | 13 | 19
interspecies interaction between organisms | 6.9E-57 | 2.3 | 466 | 1.3E-05 | 1.5 | 144 | 363
gene silencing | 1.9E-50 | 4.6 | 158 | 4.5E-02 | 1.6 | 31 | 70
T cell costimulation | 8.4E-39 | 4.3 | 131 | 3.2E-02 | 1.6 | 34 | 77
histone modification | 2.2E-36 | 2.2 | 333 | 2.0E-03 | 1.4 | 94 | 240
covalent chromatin modification | 8.6E-36 | 2.2 | 334 | 2.0E-03 | 1.4 | 95 | 243
nerve growth factor receptor signaling pathway | 7.8E-33 | 2.2 | 324 | 4.8E-05 | 1.6 | 94 | 221
cytoskeleton-dependent intracellular transport | 1.7E-26 | 4.3 | 88 | 1.8E-03 | 2.2 | 23 | 39
regulation of T cell activation | 9.3E-26 | 2.0 | 301 | 1.6E-05 | 1.6 | 95 | 219
microtubule-based transport | 1.5E-25 | 4.3 | 85 | 7.2E-04 | 2.3 | 22 | 35
positive regulation of T cell activation | 2.7E-25 | 2.1 | 252 | 7.0E-04 | 1.6 | 71 | 166
Golgi vesicle transport | 4.7E-24 | 2.4 | 186 | 3.4E-02 | 1.4 | 63 | 164
regulation of translation | 7.0E-23 | 2.2 | 221 | 1.2E-02 | 1.4 | 70 | 178
histone lysine methylation | 2.2E-22 | 3.5 | 95 | 1.4E-02 | 1.9 | 23 | 44
macromolecule methylation | 2.8E-19 | 2.3 | 163 | 4.5E-02 | 1.5 | 51 | 129
protein methylation | 1.1E-16 | 2.4 | 125 | 2.3E-03 | 1.8 | 38 | 78
histone methylation | 1.3E-16 | 2.7 | 107 | 1.6E-03 | 2.0 | 30 | 56
spindle organization | 3.5E-15 | 2.9 | 82 | 1.3E-02 | 1.8 | 31 | 65
immune response-activating cell surface receptor signaling pathway | 2.9E-14 | 2.0 | 159 | 3.8E-02 | 1.5 | 48 | 119
organelle transport along microtubule | 3.7E-14 | 4.3 | 47 | 1.7E-02 | 2.4 | 13 | 20
antigen receptor-mediated signaling pathway | 1.2E-13 | 2.0 | 155 | 3.8E-02 | 1.5 | 46 | 113
peptidyl-serine phosphorylation | 1.7E-12 | 2.3 | 104 | 4.8E-02 | 1.7 | 27 | 59
negative regulation of epidermal growth factor receptor signaling pathway | 1.9E-12 | 2.7 | 79 | 1.2E-02 | 2.0 | 22 | 41
establishment of organelle localization | 1.4E-11 | 2.2 | 107 | 9.6E-03 | 1.6 | 44 | 100
establishment of nucleus localization | 1.5E-10 | 4.6 | 32 | 2.6E-02 | 2.9 | 8 | 10
nucleus localization | 1.4E-08 | 3.5 | 35 | 2.2E-02 | 2.5 | 11 | 16
androgen receptor signaling pathway | 3.5E-08 | 2.2 | 72 | 1.7E-02 | 1.9 | 22 | 42
regulation of alternative nuclear mRNA splicing, via spliceosome | 1.6E-05 | 2.3 | 43 | 3.4E-02 | 2.3 | 12 | 19
regulation of mRNA processing | 2.1E-05 | 2.0 | 56 | 2.6E-02 | 1.9 | 20 | 38
regulation of nuclear mRNA splicing, via spliceosome | 2.6E-05 | 2.1 | 51 | 4.5E-03 | 2.3 | 18 | 29
regulation of histone deacetylation | 4.4E-05 | 3.3 | 21 | 1.1E-02 | 3.0 | 9 | 11
positive regulation of histone deacetylation | 7.1E-04 | 3.2 | 16 | 2.3E-02 | 3.2 | 7 | 8
intracellular mRNA localization | 1.1E-03 | 4.6 | 10 | 4.4E-02 | 3.7 | 5 | 5
protein sumoylation | 1.4E-02 | 2.1 | 21 | 4.6E-02 | 2.5 | 10 | 15
GO Molecular Function
MHC class II receptor activity | 8.6E-55 | 33.5 | 51 | 2.8E-02 | 3.2 | 7 | 8
nuclear hormone receptor binding | 1.1E-25 | 2.5 | 183 | 5.9E-03 | 1.6 | 46 | 103
steroid hormone receptor binding | 1.6E-25 | 2.8 | 152 | 8.5E-03 | 1.8 | 31 | 63
protein N-terminus binding | 1.1E-24 | 2.9 | 138 | 4.5E-03 | 1.7 | 40 | 85
histone-lysine N-methyltransferase activity | 2.3E-24 | 4.0 | 89 | 1.8E-02 | 2.0 | 21 | 39
protein-lysine N-methyltransferase activity | 2.3E-24 | 3.9 | 92 | 5.4E-03 | 2.1 | 23 | 41
hormone receptor binding | 1.1E-22 | 2.3 | 192 | 1.6E-02 | 1.5 | 51 | 122
androgen receptor binding | 4.6E-21 | 3.4 | 95 | 3.5E-04 | 2.3 | 23 | 36
histone methyltransferase activity | 2.0E-19 | 3.1 | 98 | 4.4E-03 | 2.0 | 26 | 48
protein methyltransferase activity | 6.6E-17 | 2.6 | 112 | 3.7E-04 | 2.0 | 35 | 65
N-methyltransferase activity | 5.6E-16 | 2.6 | 104 | 1.9E-02 | 1.8 | 30 | 63
phosphoprotein binding | 1.5E-13 | 2.6 | 92 | 3.3E-02 | 1.8 | 23 | 46
polyubiquitin binding | 1.5E-10 | 4.9 | 30 | 5.3E-03 | 2.7 | 13 | 18
small conjugating protein binding | 1.9E-07 | 2.1 | 71 | 9.1E-04 | 2.1 | 29 | 52
ubiquitin binding | 1.2E-05 | 2.1 | 57 | 6.2E-03 | 2.0 | 26 | 49
Rac GTPase activator activity | 3.9E-02 | 2.0 | 19 | 3.2E-02 | 2.8 | 9 | 12
GO Cellular Component
MHC class II protein complex | 8.2E-96 | 31.7 | 89 | 1.1E-03 | 2.9 | 12 | 15
cytoplasmic mRNA processing body | 1.5E-24 | 4.9 | 70 | 1.9E-02 | 1.9 | 20 | 38
ribonucleoprotein granule | 1.2E-19 | 3.2 | 92 | 2.7E-03 | 1.8 | 32 | 64
spliceosomal complex | 6.0E-16 | 2.2 | 139 | 2.9E-02 | 1.4 | 56 | 144
histone methyltransferase complex | 1.3E-14 | 2.9 | 79 | 2.7E-02 | 1.7 | 27 | 58
chromosome, telomeric region | 6.3E-14 | 3.2 | 64 | 2.9E-02 | 1.8 | 21 | 42
clathrin adaptor complex | 1.3E-09 | 3.0 | 49 | 1.3E-02 | 2.1 | 18 | 32
heterochromatin | 5.5E-09 | 2.3 | 69 | 4.9E-02 | 1.7 | 24 | 52
AP-type membrane coat adaptor complex | 6.1E-09 | 2.8 | 49 | 2.7E-02 | 1.9 | 18 | 34
cytoplasmic stress granule | 2.5E-04 | 2.4 | 29 | 1.3E-02 | 2.5 | 12 | 18
filamentous actin | 7.7E-04 | 2.3 | 26 | 2.1E-02 | 2.3 | 13 | 21

Table S14F. Gene Ontology classes significantly targeted by LV IS in MLD03 in vivo.
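As a rough illustration of the Hyper Fold Enrichment and hypergeometric columns in Tables S14A-F, the following Python sketch computes a fold enrichment and a one-sided hypergeometric tail probability for a single gene class. It is not the GREAT implementation: the function names and the background sizes (300 genes, 100 of them targeted) are hypothetical values chosen for the example, and no FDR correction is applied.

```python
from math import comb


def fold_enrichment(gene_hits, class_total, targeted, background):
    """Observed hits divided by the hits expected by chance."""
    expected = class_total * targeted / background
    return gene_hits / expected


def hypergeom_tail(gene_hits, class_total, targeted, background):
    """P(X >= gene_hits) when `targeted` genes are drawn without
    replacement from `background` genes, `class_total` of which
    belong to the gene class."""
    upper = min(class_total, targeted)
    favourable = sum(
        comb(class_total, k) * comb(background - class_total, targeted - k)
        for k in range(gene_hits, upper + 1)
    )
    return favourable / comb(background, targeted)


# Hypothetical example: a class of 6 genes, all 6 hit, with one third
# of the background genes targeted by at least one integration.
fold = fold_enrichment(6, 6, 100, 300)
p_value = hypergeom_tail(6, 6, 100, 300)
print(f"fold enrichment = {fold:.1f}, p = {p_value:.2e}")
```

With these inputs the fold enrichment is 3.0, the value obtained whenever every gene of a class is hit and one third of the background genes are targeted.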


A
Patient | Clusters | CIS Genes (order > 2)
MLD01 | 1,500 | 1,680
MLD02 | 1,216 | 1,225
MLD03 | 1,036 | 1,132

B

Table S15. Number of CIS clusters and genes targeted by LV IS in MLD patients through the sliding windows approach. (A) For each MLD patient, >1000 CIS clusters and genes targeted by >2 integrations were identified. (B) Top 20 CIS with the highest number of integrations identified in MLD patients. For each patient (first column), a number of genomic clusters (second column) containing CIS constituted by at least 7 IS (third column) were identified. The top 20 genes targeted by CIS are indicated in the fourth column. The number of integrations targeting each gene is indicated in parentheses.

Patient | Clusters | CIS Genes (order ≥7) | Top 20 targeted CIS
MLD01 | 138 | 221 | KDM2A (62), PACS1 (53), TNRC6C (46), FCHSD2 (37), C6orf10 (37), ASH1L (36), NF1 (33), GRB2 (31), RERE (30), TNRC6B (30), UBE2G1 (27), SMARCC1 (26), OPTC (23), EIF4G3 (22), NFAT5 (22), ACOX1 (22), NSD1 (22), HLA-E (22), NOTCH1 (22), CAPN1 (21)
MLD02 | 76 | 105 | KDM2A (39), PACS1 (38), FCHSD2 (24), NF1 (22), TNRC6C (22), SMG6 (19), RPTOR (19), OPTC (18), CBFB (18), ASH1L (17), NSD1 (17), C6orf10 (17), DIP2B (16), STAT3 (16), TNRC6B (16), SHANK3 (16), FNBP1 (16), FBXL20 (15), EIF4G3 (14), RABEP1 (14)
MLD03 | 92 | 137 | KDM2A (59), PACS1 (54), TNRC6C (44), RPTOR (35), NF1 (30), EIF4G3 (29), C6orf10 (29), FCHSD2 (26), ASH1L (24), FNBP1 (22), RERE (21), RTN3 (21), STAT5B (21), GRB2 (21), SETD2 (21), SAPS2 (20), RBM6 (20), SEC16A (19), SAPS3 (18), CARD8 (18)


A
Clinical Trial | N patients | N ISs | N targeted genes
γRV X-SCID Paris | 5 | 665 | 547
γRV X-SCID London | 1 | 566 | 526
γRV CGD | 2 | 677 | 493
γRV WAS | 2 | 11,176 | 4,857
LV ALD | 2 | 2,610 | 1,805
LV MLD | 3 | 36,898 | 8,508

B
D'Agostino & Pearson omnibus normality test | CGD | X-SCID Paris | X-SCID UK | ALD | MLD Milan
K2 | 4.8 | 5.0 | 4.0 | 2.9 | 2.8
P value | 0.09 | 0.07 | 0.13 | 0.23 | 0.23
Passed normality test (alpha=0.05) | Yes | Yes | Yes | Yes | Yes

C
D'Agostino & Pearson omnibus normality test | Patient MLD01 | Patient MLD02 | Patient MLD03
K2 | 5.1 | 5.4 | 5.0
P value | 0.08 | 0.07 | 0.08
Passed normality test (alpha=0.05) | Yes | Yes | Yes

D
Clinical Trial | N Targeted Genes | N CIS raw p-value | N CIS corrected p-value
γRV X-SCID Paris | 547 | 8 | 3
γRV X-SCID London | 526 | 3 | 0
γRV CGD | 493 | 6 | 4
γRV WAS | 4,857 | 212 | 7
LV ALD | 1,805 | 40 | 9
LV MLD | 8,480 | 362 | 7

E
Clinical Trial | CIS Genes (10 top ranking)
LV ALD | PACS1 (18), TNRC6C (13), HLA-DQA1 (6), KDM2A (13), C6orf10 (10), MUM1 (6), SLITRK1 (5), HLA-DMB (5), SLITRK5 (5), SLC22A11 (5)
LV MLD | KDM2A (192), TNRC6C (132), PACS1 (168), C6orf10 (104), OPTC (53), GRB2 (80), CAPN1 (55), MUM1 (49), IP6K1 (63), HLA-E (40)

F
Clinical Trial | CIS Genes (10 top ranking)
γRV X-SCID Paris | ZNF217 (8), CCND2 (9), LMO2 (5), TSRC1 (3), TOMM20 (3), FAM9C (3), PTGER4 (3), AFTIPHILIN (3)
γRV X-SCID London | FLJ20625 (3), TOMM20 (3), CD34 (3)
γRV CGD | MECOM (91), PRDM16 (36), MN1 (5), C6orf1 (3), NEUROD4 (3), TAOK3 (2)
γRV WAS | LMO2 (33), MECOM (154), ZNF217 (25), TOMM20 (22), KLF6 (22), MIR17HG (20), CCND2 (21), NRIP1 (30), FLI1 (18), MAP3K8 (16)


Table S16. Summary tables of datasets, validation tests and results for CIS identification by the Grubbs test for outliers. (A) IS datasets analyzed for CIS identification using the Grubbs test for outliers. The number of patients, integrations and targeted genes for each clinical trial are reported. (B) D'Agostino & Pearson omnibus normality test of the -log2 of gene integration frequency in datasets from different clinical trials. (C) D'Agostino & Pearson omnibus normality test of the -log2 of gene integration frequency in datasets from the 3 MLD patients analyzed independently. (D) Summary table of the CIS identified in each clinical trial. The number of genes targeted by at least one vector integration varied depending on clinical trial analyzed (N Targeted Genes). The number of CIS identified before and after p-value correction is indicated (N CIS raw p-value and N CIS corrected p-value respectively). (E) CIS genes identified in the LV-based clinical trials. Top 10 ranking significant genes are shown. In bold are indicated the CIS genes with a p-value <0.05 after correction. (F) Summary table of the CIS genes identified in γRV clinical trials. Top 10 ranking significant genes are shown. In bold are indicated the CIS genes with a p-value <0.05 after correction.
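The Grubbs statistic underlying this kind of outlier-based CIS analysis can be sketched in Python as follows. The gene names and counts are invented for illustration, the integration frequency is assumed here to be each gene's share of all integrations (the normalization actually used is not stated in this legend), and only the test statistic G is computed, without the critical value or the p-value correction.

```python
from math import log2
from statistics import mean, stdev


def grubbs_statistic(values):
    """Return (index, G) for the value farthest from the mean,
    where G = |x - mean| / sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    idx = max(range(len(values)), key=lambda i: abs(values[i] - m))
    return idx, abs(values[idx] - m) / s


# Hypothetical integration counts per gene (not taken from the tables).
counts = {"GENE_A": 2, "GENE_B": 3, "GENE_C": 2, "GENE_D": 4,
          "GENE_E": 3, "GENE_F": 2, "GENE_G": 3, "GENE_H": 60}
total = sum(counts.values())

# -log2 of the integration frequency: heavily targeted genes get low values.
genes = list(counts)
scores = [-log2(counts[g] / total) for g in genes]

idx, g_stat = grubbs_statistic(scores)
print(genes[idx], round(g_stat, 2))
```

In a full test, G would be compared with a critical value derived from the t distribution for the given sample size and significance level; that step is omitted here.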


Patient | Integration sites | Passing Filters by Collisions | Passing Filters by Seq. Count
MLD01 | 15,011 | 14,482 | 10,536
MLD02 | 11,614 | 11,077 | 8,339
MLD03 | 11,470 | 10,959 | 7,556

Table S17. Summary of IS for each patient after filtering reads by collisions and sequence count.


Gene | CD34 (1, 3, 12, 18) | Myeloid (1, 3, 6, 9, 12, 18) | Lymphoid B (1, 3, 6, 9, 12, 18) | Lymphoid T (3, 6, 9, 12, 18)
TCF3 | 1 1 0 1 | 0 1 1 1 1 1 | 0 1 1 1 0 1 | 0 1 0 0 1
CASC4 | 0 0 1 1 | 0 0 1 1 1 1 | 0 0 1 1 1 1 | 0 0 1 1 1
STAT5A | 1 1 1 1 | 0 1 1 1 0 1 | 0 1 0 0 1 1 | 0 0 1 0 1
KHDRBS1 | 0 0 1 1 | 0 1 1 1 1 1 | 0 0 1 1 1 1 | 0 0 0 0 1
RAB11FIP3 | 0 1 1 1 | 0 1 1 1 1 1 | 0 0 0 0 1 1 | 0 0 1 0 1
BIRC6 | 0 1 1 1 | 0 0 1 1 1 1 | 0 0 1 0 0 0 | 0 1 1 1 1
TNRC6B | 0 1 0 1 | 0 1 1 1 1 1 | 0 0 0 0 1 1 | 0 0 0 1 1
RPP21 | 0 1 1 1 | 0 1 1 1 1 1 | 0 0 1 0 0 1 | 0 0 0 0 1
TM9SF2 | 0 1 1 1 | 0 1 1 1 1 1 | 0 0 0 1 1 0 | 0 0 0 0 1
PCGEM1 | 0 1 1 1 | 0 1 1 1 0 1 | 0 1 0 0 0 1 | 0 0 0 1 1
SUV420H1 | 0 1 1 1 | 0 0 1 0 1 1 | 0 0 0 1 1 1 | 0 0 0 1 1
SLC2A6 | 0 0 0 1 | 0 1 1 1 1 1 | 0 0 1 1 0 1 | 0 0 0 0 1
ARID2 | 0 1 1 1 | 0 0 1 1 0 1 | 0 0 0 1 1 1 | 0 0 0 0 1
MOV10L1 | 0 0 1 1 | 0 0 1 1 1 1 | 0 0 0 0 1 1 | 0 0 0 0 1
RPL3L | 0 1 0 1 | 0 0 1 1 1 1 | 0 0 1 0 0 1 | 0 0 0 0 1
KIAA1267 | 0 1 1 1 | 0 1 1 1 0 1 | 0 0 1 0 0 1 | 0 0 0 0 0
FAM102A | 0 1 1 1 | 0 1 1 0 0 1 | 0 0 1 1 1 0 | 0 0 0 0 0
SUMO2 | 0 1 0 0 | 0 1 1 1 1 0 | 0 0 1 1 1 0 | 0 0 0 0 0
SCAI | 0 0 0 1 | 1 0 1 0 1 1 | 0 0 0 1 0 1 | 0 0 0 0 1
WDTC1 | 0 0 0 1 | 0 1 1 1 0 1 | 0 0 0 0 0 1 | 0 0 1 0 1

Table S18. Stem cell marking over time. With reference to main Figure 6C, we report the top 20 ISs shared among CD34+ cells, T and B lymphoid cells and the myeloid lineage.


Patient | Schnabel model (12m - 18m) | Chao model (9m - 12m - 18m) +/- std. err
MLD01 | 3,713 | 5,734.3 +/- 297
MLD02 | 2,155 | 3,483 +/- 105

Table S19. Population size estimation. To estimate the population size of transduced active HSC, we used the data from patients MLD01 and MLD02 and selected the short-lived myeloid lineage cells from peripheral blood long term after HSC-GT as a readout of stem cell output. The IS from the long-term PB myeloid cell datasets selected for this analysis were filtered by collisions and by sequence count. We adopted two mark-recapture models: the Schnabel-Petersen estimator, which can be applied with only two time points, and the Chao log-linear regression model, which supports the comparison of multiple time points. We selected the latest time points to best fit the model requirements.
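A two-sample mark-recapture estimate of the kind referred to in this legend can be sketched in Python. The sketch uses the classical Lincoln-Petersen formula and Chapman's bias-corrected variant, treating ISs seen at the first time point as "marked" and ISs seen again at the second time point as "recaptured". The IS identifiers and counts are hypothetical, and this is not the authors' analysis code.

```python
def petersen_estimate(first, second):
    """Lincoln-Petersen estimate N = n1 * n2 / m, where m is the
    number of IS observed at both time points."""
    shared = len(first & second)
    if shared == 0:
        raise ValueError("no shared IS: the estimate is undefined")
    return len(first) * len(second) / shared


def chapman_estimate(first, second):
    """Chapman's variant, defined even when no IS is shared."""
    shared = len(first & second)
    return (len(first) + 1) * (len(second) + 1) / (shared + 1) - 1


# Hypothetical IS identifiers: 1,000 IS at the first time point,
# 1,200 at the second, 400 of them observed at both.
is_first = set(range(0, 1000))
is_second = set(range(600, 1800))

n_petersen = petersen_estimate(is_first, is_second)
n_chapman = chapman_estimate(is_first, is_second)
print(n_petersen, round(n_chapman, 1))
```

The fewer ISs are shared between the two time points, the larger the estimated population of active clones.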


SUPPORTING FIGURE LEGENDS

Figure S1. Optimization of LV-mediated gene transfer into BM-derived HSPCs. (A-D) Characterization of the progeny of the HSPCs derived from either mobilized peripheral blood (MPB) or BM of healthy donors (HD) or MLD patients and transduced with large scale / GMP LVs for 1 or 2 rounds of transduction, as indicated. The GMP ARSA LV preparations were used for the validation runs (a total of 3 runs of transduction of HD HSPCs was required for validation of the transduction protocol) and for the transduction of the HSCs of patients MLD01-03. The optimization on HD's HSPCs and on HSPCs from MLD patients not enrolled in the clinical trial was performed with non-GMP, large-scale purified ARSA LV. n≥7 in HD and n≥3 in MLD patients, except for the clinical samples (MLD01, 02 and 03). The protocol based on 2 transduction rounds was judged as the best performing and thus employed for the validation runs and the transduction of HSPCs from the three patients enrolled in the clinical trial. Vector copy number (VCN) (A), expressed as copies of LV/human genome measured by qPCR on the in vitro liquid culture progeny of the transduced cells (after 14 days of culture); transduction efficiency (B), evaluated by qPCR on individual colonies from the CFC assay performed on the transduced HSCs and expressed as percentage (%) of LV+ colonies on total tested colonies; ARSA activity (C) measured by the PNC assay on the in vitro liquid culture progeny of the transduced cells; in vitro clonogenic potential (D), assessed by the CFC assay performed on the transduced cells; the total number of colonies was counted. Max-min are shown in the graphs; *=p<0.01 and ***=p<0.0001 at one-way ANOVA with Bonferroni post-hoc test. (E-F) The BM-derived HSPCs transduced with two rounds of transduction and large-scale ARSA LV were injected into sub-lethally irradiated Rag2-/-Il2rγ-/- mice to test their engraftment and differentiation potential in comparison to un-manipulated BM-derived HSPCs (UM). (E) Human cell engraftment (% of cells positive for the human CD45 antigen at cytofluorimetry) and their differentiation (F) in the BM, spleen (Spl) and thymus (Thy) of the transplanted mice 8 weeks after transplant. (G) VCN, expressed as copies of LV/human genome measured by qPCR on the in vitro liquid culture progeny of the transduced cells (after 14 days of culture, LC) and on the tissues retrieved from the transplanted mice, as above.

[Fig. S1: panels A-G. Axis labels recoverable from the figure: ARSA activity (nmol/mg/h); # CFCs (x10^4/10^6 cells); %hCD45+ cells; % marker+ cells; VCN (copies/human genome).]


Figure S2. Flow chart of transduced CD34+ cell manufacturing.


Figure S3. Flow chart for ARSA LV supernatant production. MCB: master 293T cell bank; CF10: 10-tray cell factory.


Figure S4. Scheme of patients’ treatment; BM: bone marrow; MNCs: mononuclear cells; iv: intravenous.

[Fig. S4: treatment scheme. Steps shown: BM back up (minimum target MNCs ≥ 1x10^8/kg); BM harvest (minimum target CD34+ 5x10^6/kg); CD34+ cell positive selection; CD34+ cell transduction; iv Busulfan (dose-adjusted, 14 doses); fresh cell infusion (CD34+ ≥ 2x10^6/kg). Time axis (days): -35, -4, -3, -2, -1, 0.]


[Fig. S5, pages 1-2: panels A-C for MLD01, MLD02 and MLD03. Panels A and B: cell counts (x10^9) for WBC, neutrophils, lymphocytes and monocytes, and Hb (mg/dl) and Ptl counts with transfusional support, versus follow up (days). Panel C: % marker+ cells in lymphomonocytes (* in CD45+ cells) for BM cell subsets and PB cell subsets at baseline, 6, 12 and 18 months.]

Figure S5. Hemato-immunological characterization of patients MLD01, 02 and 03. (A and B) Counts of white blood cells (as a whole or individual sub-populations) (A) and of hemoglobin (Hb) and platelets (Ptl) (B) are shown. Transfusional support (red blood cells and platelets) is indicated. (C) Immunophenotypical characterization of BM (left panels) and PB (right panels) of the patients at the indicated time points. (D) The proportion of specific TCR Vbeta families within CD3+ T cells is reported for each patient at the indicated time points, in comparison to a group of HDs (n=85).

[Fig. S5, page 3: panel D, TCR Vbeta repertoire. %TCR Vbeta in CD3+ lymphocyte subset for each Vbeta family in MLD01, MLD02 and MLD03 at baseline, 6, 12 and 18 months, compared with HD.]

Figure S6. Gene marking in patients MLD01, 02 and 03 after HSC gene therapy. (A) Vector copy number (VCN) expressed as copies of LV/human genome measured by qPCR on BM-derived mononuclear cells (MNCs) in the three patients. (B-D) VCN on MNCs and individual sub-populations isolated from BM (see legend) of MLD01 (B), MLD02 (C) and MLD03 (D). (E-F) VCN on PBMCs and individual sub-populations isolated from PB (see legend) of MLD02 (E) and MLD03 (F).

[Fig. S6: panels A-F. VCN (copies/genome) versus months after GT. Legend entries recoverable from the figure: MLD01 CD34, MLD02 CD34, MLD03 CD34; MNCs, CD34+, CD13+, CD15+, CD19+, GlyA+, CD61+, CD56+, CD3+; PBMCs, CD14, CD15, CD19, CD3, CD56.]

Figure S7. (A and B) ARSA activity measured by the PNCS assay on total PBMCs (A) and CD3+ cells (B) isolated from the PB of the patients. The activity measured in a cohort of healthy donors (HD, n=190 for PBMCs and n=10 for CD3+ cells) is shown. (C) DEAE cellulose-chromatography analysis on PBMCs isolated from a pool of HDs and patient MLD01 before (MLD01 baseline) and 1 month after gene therapy (MLD01 +1mth). The peak of activity corresponds to the native form of the ARSA enzyme, as also demonstrated by the absence/very little residue in the patient’s pre-treatment sample. (D) ARSA activity of chromatography-isolated enzyme from the indicated peripheral blood populations from HDs (n=4) and the 3 treated patients at the indicated time points, measured towards the artificial substrate MUS.

[Fig. S7: panels A-D. ARSA activity (nmol/mg/h) versus months after GT for MLD01, MLD02 and MLD03 PBMCs and CD3 cells, with the HDs' range; mU tot per fraction (0.5 ml) for HDs, MLD01 pre-GT and MLD01 +1mth; ARSA sp. activity (nmol/mg/h) in PBMCs, CD3, CD19, CD15 and CD14 cells for HD, baseline, 6mths and 12mths.]

Figure S8. ARSA and W-PRE detection by immunofluorescence. (A) Staining optimization. HeLa cells were transduced with a LV encoding a HA-tagged ARSA and used to optimize ARSA staining by immunofluorescence. Transduced cells and un-transduced controls were stained with anti-HA, anti-ARSA and anti-Lamp1 antibodies. A good co-localization of the ARSA and HA staining within Lamp1+ lysosomes is shown. Magnification 63X in the two upper rows and 320X in the lower row. (B) Immunofluorescence showing co-expression of ARSA protein and transgene mRNA (identified by in situ hybridization with a probe for the Woodchuck Hepatitis Virus Post-transcriptional Regulatory Element, W-PRE, present in the LV) within hematopoietic cells isolated from the PB of patient MLD01 before and 1 year after HSC-GT (see figure for details) and from a HD. Most of the CD14+ cells and only a fraction of the CD3+ cells express the exogenous ARSA one year after treatment, consistent with the gene marking levels detected in the two sub-populations. The expression level of ARSA in the patient's cells after treatment appears higher than that observed in the HD. Magnification 63X.

Figure S9. MR imaging of patients MLD02 (A) and MLD03 (B) before HSC-GT and at the last follow-up. (A) The baseline study (axial TSE T2-weighted and axial FLAIR images acquired before gene therapy, at 12 months of age) shows an MR signal pattern that is normal for age; the slight hyperintensity of the posterior periventricular white matter is a normal finding related to incomplete myelination. MR images at the same levels acquired 18 months after gene therapy show small areas of hyperintense signal in the posterior and anterior periventricular white matter, while in the other brain regions the white matter signal is normal for age. The size of the ventricular system and of the subarachnoid spaces is normal. The signal and size of the basal ganglia and thalami are normal. (B) Axial TSE T2 and axial FLAIR images acquired at 7 months of age, before HSC-GT, show a normal pattern of white matter signal. The slightly enlarged ventricular system and subarachnoid spaces are a normal finding related to mild familial benign macrocrania. On the MRI performed 18 months after HSC-GT, at 25 months of age, the size of the ventricular system and subarachnoid spaces is reduced and only a blurry hyperintense signal is present in the posterior paraventricular white matter, possibly related to incomplete myelination at this age.

[Fig. S9 panel labels: (A) before GT (12 mo) and 1.5 yr post-GT (30 mo); (B) before GT (7 mo) and 1.5 yr post-GT (25 mo); FLAIR and T2 images.]

Figure S10. Activity flowchart for integration site (IS) analysis. Four main macro-activities from “wet” sample processing to bioinformatics are shown. Each macro-activity comprises several dependent activities, represented by rounded boxes, connected by data flow, represented by arrows. (1) Wet Lab Procedures: LAM-PCRs and Next-Generation Sequencing (NGS). (2) NGS data processing: acquiring as input sequencing reads from NGS platforms and getting as output the list of integration sites. (3) Data quality processing: performing three sequential activities to improve data quality. Each activity generates the input for distinct processes in step 4. (4) IS-driven biological analysis: performing inferences on safety and efficacy of gene therapy as well as on human hematopoiesis.

Figure S11. Density plot of sequence lengths. Raw read counts and sequence lengths, excluding the LTR and LC segments, of all sequences (Illumina paired-end reads were merged at overlapping regions; otherwise only reads derived from LTR-containing sequences were plotted).

Figure S12. Representative Spreadex gel electrophoresis of Linear Amplification-Mediated PCR (LAM-PCR) reactions for the 3 MLD patients. To identify vector integration sites, 3’ vector LTR-genome junctions were amplified by LAM-PCR according to the method published by Schmidt et al. (Nature Methods, 2007). The initial linear amplification (100 cycles) was performed with 5’-biotinylated LTR-specific primers, using 10-100 ng of genomic DNA (gDNA) as template. Linear amplification products were purified using streptavidin magnetic beads, followed by complementary strand synthesis, parallel digestion with 3 different restriction enzymes (Tsp509 I, HpyCH4 IV and Aci I), and ligation to a linker cassette (LC). The fragments generated were then amplified by two additional exponential PCR steps. LAM-PCR products were separated by gel electrophoresis on Spreadex high-resolution gels (Elchrom Scientific). Representative LAM-PCR reactions obtained with the Tsp509I restriction enzyme are shown. The vector backbone-derived Tsp509I internal control band (IC) is indicated. The surface markers used for purification of the myeloid (CD15+) and T lymphoid (CD3) samples are shown in parentheses.

Figure S13. Collision filtering. (A) Example of a collision relative frequency (ColRF) plot for 454 sequencing data from patient MLD02. ColRF values are log10-transformed (x-axis). A positive peak at +2 means that all collisions in this area carry sequence counts 20 times higher in the analyzed patient than in the others. The values under the peak at -2 have 20 times lower sequence counts in the same patient compared with the others. The peak at 0 indicates that all these collisions have identical sequence counts among patients. We generated theoretical case scenarios for two patients, A and B (see Table S13B), and applied these scenarios to our empirical ColRF curve, which allowed us to interpret the data as decision fields (represented by the different colors) for the elimination of collisions. The threshold chosen for patient-based identification of contamination is 1, corresponding to a 10-fold difference on the linear scale. (B) Venn diagram showing the number of collisions detected in the MLD and WAS clinical trials. We removed 607 ISs from the MLD study: 503 collisions were removed from the MLD study only and 104 were removed from both the MLD and WAS studies, while 362 ISs were removed from the WAS study only.
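
The threshold rule in panel A can be illustrated with a minimal sketch. It assumes, purely for illustration, that the ColRF of a collision is the log10 ratio between its sequence count in the analyzed patient and its sequence count in the other patients; the exact definition used in the study may differ (for example, it may rely on relative frequencies). The function names and the three outcome labels are hypothetical.

```python
import math


def col_rf(count_in_patient, count_in_others):
    """Assumed form of ColRF: log10 ratio of sequence counts."""
    return math.log10(count_in_patient / count_in_others)


def classify_collision(count_in_patient, count_in_others, threshold=1.0):
    """Apply a symmetric log10 threshold (1 corresponds to a 10-fold difference)."""
    value = col_rf(count_in_patient, count_in_others)
    if value >= threshold:
        return "assign to analyzed patient"   # at least 10-fold more abundant here
    if value <= -threshold:
        return "remove as contamination"      # at least 10-fold more abundant elsewhere
    return "undetermined"                     # hypothetical label for the middle field
```

With this assumed definition, a collision with equal counts in both groups has a ColRF of 0 and falls in the middle field, in line with the peak at 0 described in the caption.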

Figure S14. Integration site frequency distribution around transcription start sites (TSS) and along transcription units in MLD patients. The analyses were performed on in vitro transduced CD34+ cells (cultured for two weeks) and on cells obtained in vivo. IS datasets were analyzed separately for each patient and condition. The upper panels show the frequency distribution of IS around the TSS in 5 kb bins (spanning 50 kb upstream and 100 kb downstream of the TSS). The middle panels show a more detailed view around the TSS, using 500 bp bins spanning 5 kb upstream and 10 kb downstream of the TSS. The lower panels show the relative distribution of IS along the transcription units. The length of each transcription unit was normalized to 100% and divided into 20 bins of 5%. The percentage of ISs landing inside genes (on average 80% of total IS) was then calculated for each bin.
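
The two binning schemes described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: coordinates are plain integers on one chromosome, the strand is given as "+" or "-", and all function names are hypothetical.

```python
from collections import Counter


def tss_bin(is_pos, tss_pos, strand, bin_size=5000,
            upstream=50000, downstream=100000):
    """Return the start of the bin (bp relative to the TSS), or None if outside the window."""
    offset = (is_pos - tss_pos) * (1 if strand == "+" else -1)
    if offset < -upstream or offset >= downstream:
        return None
    return (offset // bin_size) * bin_size  # floor division also handles negative offsets


def gene_body_bin(is_pos, tx_start, tx_end, strand, n_bins=20):
    """Return the 5% bin index (0 to n_bins - 1) along a transcription unit scaled to 100%."""
    if not tx_start <= is_pos <= tx_end:
        return None
    length = tx_end - tx_start
    dist = is_pos - tx_start if strand == "+" else tx_end - is_pos
    return min(dist * n_bins // length, n_bins - 1)


def percent_per_bin(bins):
    """Percentage of integration sites per bin, ignoring sites outside the window."""
    counts = Counter(b for b in bins if b is not None)
    total = sum(counts.values())
    return {b: 100.0 * n / total for b, n in sorted(counts.items())}
```

Calling `tss_bin` with `bin_size=500, upstream=5000, downstream=10000` gives the finer view described for the middle panels.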

Figure S15. Gene Ontology (GO) analysis of ISs from 3 MLD and 2 ALD patients through GREAT (http://great.stanford.edu/). GO classes were considered significant when both tests provided an FDR < 0.05. To compare the similarity of the gene classes preferentially targeted by LV integrations in the ALD and MLD clinical trials, we measured the gene sharing between the significantly overrepresented gene classes of each clinical trial. The level of gene sharing in the overrepresented gene classes for GO molecular functions is highlighted with different background colors. Overall, the two clinical trials displayed a remarkable similarity, as shown by the high percentage of shared genes among the vast majority of gene classes.

Figure S16. Box plot of the percentage of sequence reads (y-axis) for unequivocally mapped IS from patient MLD03 in different cell types (CD34+, myeloid or lymphoid cells) from different sources (BM or PB) and time points (months after gene therapy, indicated below the x-axis). The number of reads for each IS was normalized to the total number of sequence reads from the same time point and source. IS above the 95th percentile of the dataset are shown as dots distinct from the box and whiskers, which are mostly flattened to the bottom of the plot. The total number of IS for each lineage and time point is shown on top. The most represented integrations (with the hit gene indicated next to the dot) are enriched in oligoclonal populations such as PB-derived B or T cells at early time points.
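
The normalization and outlier display described for Figure S16 can be sketched as below. The sketch assumes a nearest-rank percentile, which may differ from the convention of the plotting software used for the figure; the sample data and the identifiers (such as `IS_dominant`) are invented for illustration.

```python
def read_percentages(read_counts):
    """Convert raw read counts per IS into percentages of the sample total."""
    total = sum(read_counts.values())
    return {is_id: 100.0 * n / total for is_id, n in read_counts.items()}


def above_percentile(percentages, pct=95):
    """Return the IS whose percentage exceeds the nearest-rank percentile."""
    ordered = sorted(percentages.values())
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct * n / 100
    cutoff = ordered[rank - 1]
    return {is_id: p for is_id, p in percentages.items() if p > cutoff}


# Invented sample: 19 IS with one read each and one dominant IS with 81 reads.
sample = {f"IS{i}": 1 for i in range(19)}
sample["IS_dominant"] = 81
outliers = above_percentile(read_percentages(sample))
```

In a sample like this one, almost all IS sit at the same low percentage, which is why the box and whiskers collapse toward the bottom of the plot while the dominant IS appears as a separate dot.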

Figure S17. CIS identified by the sliding window approach. (A) Genomic distribution of integrations within CIS clusters and genes, using an implementation of Abel’s method. The y-axis represents the number of integrations or the CIS order. Green areas delimit groups of genes by CIS cluster. The red bars represent the maximum CIS order for 100 genes contained within the top-ranking CISs. The blue bars represent the number of integration sites within each gene, sorted by genomic position within each CIS cluster. (B) Venn diagram showing the genes contained in CIS of order >2 and shared among patients. From 61 to 75% of the CIS genes of one patient were shared with one of the other 2 patients. (C) Venn diagram showing shared genes contained in CIS of order ≥10 and targeted by ≥7 integrations.

[Fig. S17, panels A to C (plot residue). Panel A comprises one bar plot per patient (MLD01, MLD02 and MLD03), with gene names along the x-axis, a y-axis ranging from -30 to 90, and a legend listing ISs count, max CIS order and cluster ID.]

Figure S18. Frequency distribution of the -log2-transformed gene integration frequency (GIF) used for the Grubbs test for outliers. (A) Frequency distribution of the transformed GIF (-log2 of GIF, y-axis) within genes of mapped IS datasets from different clinical trials (GIFs from pooled patients). (B) Frequency distribution of the transformed GIF in datasets from the 3 MLD patients analyzed separately.

[Fig. S18, panels A and B (plot residue). Panel A: datasets CGD, X-SCID Paris, X-SCID UK, ALD and MLD Milan; axis text "-log2 GIF frequency", ticks 0.00 to 0.20.]

[Fig. S19, page 3 (plot residue): panels B, C, D and E. Panel E, "CIS distribution in Chromosomes": x-axis, chromosome; y-axis, % of total CIS (ticks 0 to 20). Gene labels appearing in the panels: FANCL, OR4E2, TOMM20; MECOM, KLF6, LMO2, CCND2, MIR17HG, ZNF217; KDM2A, TNRC6C, PACS1, C6orf10, HLA-DQA1; HLA-DMB, SLITRK1, SLITRK5, OPTC.]

Figure S19. CIS identification by the genome-wide Grubbs test for outliers. (A) Chromosomal distribution of Z-ratios (y-axis) of gene integration frequencies in MLD patients. Genes that were significantly overtargeted with respect to the average gene integration frequency across the whole genome, according to the Grubbs test for outliers without p-value correction (raw p-value), are indicated in red. (B) Correlation between CIS number and targeted genes. Before p-value correction, the number of CIS is directly proportional to the number of targeted genes (blue line, R2 = 1). After p-value correction, the number of CIS does not depend on the number of targeted genes (red line, R2 = 0.19). (C) Venn diagram showing the sharing of CIS identified by the Grubbs test for outliers in HSC gene therapy trials (the LV-based MLD and ALD trials and the gamma-retroviral-based WAS trial) before p-value correction. (D) Venn diagram showing the sharing of CIS identified by the Grubbs test for outliers in HSC gene therapy trials after p-value correction (see supplemental materials and methods). (E) The chromosomal distribution of LV CIS in MLD patients shows a strong skewing, with a preference for chromosomes 17 and 19. (F) Region-based Grubbs test for outliers performed on the genomic regions surrounding the CIS of the MLD clinical trial that were significant after p-value correction. These CIS were not considered significantly overtargeted with respect to the neighboring genes by the region-based Grubbs test, except for OPTC. IS targeting OPTC were not detected at the last follow-up time in all three patients. (G) Venn diagrams showing that CIS genes identified by the genome-wide Grubbs test for outliers are fully contained in the CIS dataset identified by Abel’s method. These data indicate that the Grubbs test for outliers identifies only CIS that are considered significant by Abel’s method. (H) Venn diagrams showing that the most significant γRV CIS identified by the Grubbs test for outliers are shared among different γRV clinical trials. Of note, all CIS implicated in leukemia were identified by this test.
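
The Z-ratios plotted in panel A and the statistic underlying the Grubbs test can be sketched as follows. The sketch takes gene integration frequencies (GIF) as given, applies the -log2 transformation mentioned for Figure S18, and computes the two-sided Grubbs statistic G = max|x - mean| / s. Deciding significance additionally requires a critical value from the t distribution and the p-value correction discussed in the caption; both are omitted here. The gene names and values are invented.

```python
import math
import statistics


def neg_log2(gifs):
    """Apply the -log2 transformation to each gene integration frequency."""
    return {gene: -math.log2(freq) for gene, freq in gifs.items()}


def z_ratios(values):
    """Z-ratio of each gene: distance from the mean in sample standard deviations."""
    data = list(values.values())
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return {gene: (v - mean) / sd for gene, v in values.items()}


def grubbs_statistic(values):
    """Return the most extreme gene and its Grubbs statistic G = max|z|."""
    z = z_ratios(values)
    gene = max(z, key=lambda g: abs(z[g]))
    return gene, abs(z[gene])


# Invented example: four genes with the same GIF and one that differs strongly.
example = {"geneA": 0.5, "geneB": 0.5, "geneC": 0.5, "geneD": 0.5, "geneE": 2 ** -10}
candidate, g_value = grubbs_statistic(neg_log2(example))
```

The statistic identifies only the single most extreme gene; repeated application after removing that gene is one common way to look for further outliers, although whether the study proceeded this way is not stated in the caption.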

Figure S20. Integration site filtering based on sequence count. (A) Box-and-whisker plot of the IS distribution for CD34+ cells of each patient over time after collision filtering. (B) Box-and-whisker plot of the IS distribution for CD34+ cells of each patient over time after collision and sequence count (≥3) filtering. (C) Gaussian kernel density distribution of IS sequence counts for CD34+ cells of patient MLD01 over time (up to 12 months) before and after sequence count filtering.

Figure S21. HSC gene marking, representative results from MLD01. (A) Venn diagram of stem cell marking for patient MLD01 across lymphoid and myeloid lineages and CD34+ cells after collision filtering. (B) Venn diagram of stem cell marking for patient MLD01 across lymphoid, myeloid cells and CD34+ cells after collision and sequence count filtering.

Figure S22. Diversity analysis. Shannon diversity index values (y-axis) over time (x-axis) for different lineages in MLD patients: (A) MLD01, (B) MLD02 and (C) MLD03.
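
For readers unfamiliar with the index, a minimal sketch of the Shannon diversity computation, H = -Σ p_i ln(p_i), is given below. It assumes that the abundance of each integration site is estimated from its sequence read count and that the natural logarithm is used; the caption does not state either choice, so both are assumptions.

```python
import math


def shannon_index(read_counts):
    """Shannon diversity H = -sum(p * ln p) over IS with at least one read."""
    total = sum(read_counts)
    return -sum((n / total) * math.log(n / total) for n in read_counts if n > 0)


even_sample = shannon_index([10, 10, 10, 10])    # four equally abundant IS
skewed_sample = shannon_index([97, 1, 1, 1])     # one dominant IS
```

An evenly distributed sample gives the maximum value for its number of IS (ln 4 for four IS), whereas a sample dominated by one IS gives a value close to 0, so a drop in the index over time would point to reduced clonal diversity.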

References

1. J. Aicardi, Diseases of the Nervous System in Childhood (Mac Keith Press, London, ed. 2, 1998).

2. A. Biffi, G. Lucchini, A. Rovelli, M. Sessa, Metachromatic leukodystrophy: An overview of current and prospective treatments. Bone Marrow Transplant. 42, (Suppl 2), S2–S6 (2008). doi:10.1038/bmt.2008.275 Medline

3. A. Biffi, P. Aubourg, N. Cartier, Gene therapy for leukodystrophies. Hum. Mol. Genet. 20, (R1), R42–R53 (2011). doi:10.1093/hmg/ddr142 Medline

4. W. Krivit, J. H. Sung, E. G. Shapiro, L. A. Lockman, Microglia: The effector cell for reconstitution of the central nervous system following bone marrow transplantation for lysosomal and peroxisomal storage diseases. Cell Transplant. 4, 385–392 (1995). doi:10.1016/0963-6897(95)00021-O Medline

5. A. M. Rovelli, C. G. Steward, Hematopoietic cell transplantation activity in Europe for inherited metabolic diseases: Open issues and future directions. Bone Marrow Transplant. 35, S23 (2005). doi:10.1038/sj.bmt.1704839

6. J. J. Boelens, V. K. Prasad, J. Tolar, R. F. Wynn, C. Peters, Current international perspectives on hematopoietic stem cell transplantation for inherited metabolic disorders. Pediatr. Clin. North Am. 57, 123–145 (2010). doi:10.1016/j.pcl.2009.11.004 Medline

7. J. J. Boelens, Trends in haematopoietic cell transplantation for inborn errors of metabolism. J. Inherit. Metab. Dis. 29, 413–420 (2006). doi:10.1007/s10545-005-0258-8 Medline

8. A. Biffi, M. De Palma, A. Quattrini, U. Del Carro, S. Amadio, I. Visigalli, M. Sessa, S. Fasano, R. Brambilla, S. Marchesini, C. Bordignon, L. Naldini, Correction of metachromatic leukodystrophy in the mouse model by transplantation of genetically modified hematopoietic stem cells. J. Clin. Invest. 113, 1118–1129 (2004). Medline

9. A. Biffi, A. Capotondo, S. Fasano, U. del Carro, S. Marchesini, H. Azuma, M. C. Malaguti, S. Amadio, R. Brambilla, M. Grompe, C. Bordignon, A. Quattrini, L. Naldini, Gene therapy of metachromatic leukodystrophy reverses neurological damage and deficits in mice. J. Clin. Invest. 116, 3070–3082 (2006). doi:10.1172/JCI28873 Medline

10. I. Visigalli, S. Delai, L. S. Politi, C. Di Domenico, F. Cerri, E. Mrak, R. D’Isa, D. Ungaro, M. Stok, F. Sanvito, E. Mariani, L. Staszewsky, C. Godi, I. Russo, F. Cecere, U. Del Carro, A. Rubinacci, R. Brambilla, A. Quattrini, P. Di Natale, K. Ponder, L. Naldini, A. Biffi, Gene therapy augments the efficacy of hematopoietic cell transplantation and fully corrects mucopolysaccharidosis type I phenotype in the mouse model. Blood 116, 5130–5139 (2010). doi:10.1182/blood-2010-04-278234 Medline

11. B. Gentner, I. Visigalli, H. Hiramatsu, E. Lechman, S. Ungari, A. Giustacchini, G. Schira, M. Amendola, A. Quattrini, S. Martino, A. Orlacchio, J. E. Dick, A. Biffi, L. Naldini, Identification of hematopoietic stem cell-specific miRNAs enables gene therapy of globoid cell leukodystrophy. Sci. Transl. Med. 2, 58ra84 (2010). doi:10.1126/scitranslmed.3001522 Medline

12. A. Aiuti, F. Cattaneo, S. Galimberti, U. Benninghoff, B. Cassani, L. Callegaro, S. Scaramuzza, G. Andolfi, M. Mirolo, I. Brigida, A. Tabucchi, F. Carlucci, M. Eibl, M. Aker, S. Slavin, H. Al-Mousa, A. Al Ghonaium, A. Ferster, A. Duppenthaler, L. Notarangelo, U. Wintergerst, R. H. Buckley, M. Bregni, S. Marktel, M. G. Valsecchi, P. Rossi, F. Ciceri, R. Miniero, C. Bordignon, M. G. Roncarolo, Gene therapy for immunodeficiency due to adenosine deaminase deficiency. N. Engl. J. Med. 360, 447–458 (2009). doi:10.1056/NEJMoa0805817 Medline

13. M. Cavazzana-Calvo, A. Fischer, Gene therapy for severe combined immunodeficiency: Are we there yet? J. Clin. Invest. 117, 1456–1465 (2007). doi:10.1172/JCI30953 Medline

14. M. G. Ott, M. Schmidt, K. Schwarzwaelder, S. Stein, U. Siler, U. Koehl, H. Glimm, K. Kühlcke, A. Schilz, H. Kunkel, S. Naundorf, A. Brinkmann, A. Deichmann, M. Fischer, C. Ball, I. Pilz, C. Dunbar, Y. Du, N. A. Jenkins, N. G. Copeland, U. Lüthi, M. Hassan, A. J. Thrasher, D. Hoelzer, C. von Kalle, R. Seger, M. Grez, Correction of X-linked chronic granulomatous disease by gene therapy, augmented by insertional activation of MDS1-EVI1, PRDM16 or SETBP1. Nat. Med. 12, 401–409 (2006). doi:10.1038/nm1393 Medline

15. L. Naldini, Ex vivo gene transfer and correction for cell-based therapies. Nat. Rev. Genet. 12, 301–315 (2011). doi:10.1038/nrg2985 Medline

16. L. Naldini, U. Blömer, P. Gallay, D. Ory, R. Mulligan, F. H. Gage, I. M. Verma, D. Trono, In vivo gene delivery and stable transduction of nondividing cells by a lentiviral vector. Science 272, 263–267 (1996). doi:10.1126/science.272.5259.263 Medline

17. N. Cartier, S. Hacein-Bey-Abina, C. C. Bartholomae, G. Veres, M. Schmidt, I. Kutschera, M. Vidaud, U. Abel, L. Dal-Cortivo, L. Caccavelli, N. Mahlaoui, V. Kiermer, D. Mittelstaedt, C. Bellesme, N. Lahlou, F. Lefrère, S. Blanche, M. Audit, E. Payen, P. Leboulch, B. l’Homme, P. Bougnères, C. Von Kalle, A. Fischer, M. Cavazzana-Calvo, P. Aubourg, Hematopoietic stem cell gene therapy with a lentiviral vector in X-linked adrenoleukodystrophy. Science 326, 818–823 (2009). doi:10.1126/science.1171242 Medline

18. S. Hacein-Bey-Abina, A. Garrigue, G. P. Wang, J. Soulier, A. Lim, E. Morillon, E. Clappier, L. Caccavelli, E. Delabesse, K. Beldjord, V. Asnafi, E. MacIntyre, L. Dal Cortivo, I. Radford, N. Brousse, F. Sigaux, D. Moshous, J. Hauer, A. Borkhardt, B. H. Belohradsky, U. Wintergerst, M. C. Velez, L. Leiva, R. Sorensen, N. Wulffraat, S. Blanche, F. D. Bushman, A. Fischer, M. Cavazzana-Calvo, Insertional oncogenesis in 4 patients after retrovirus-mediated gene therapy of SCID-X1. J. Clin. Invest. 118, 3132–3142 (2008). doi:10.1172/JCI35700 Medline

19. E. Montini, D. Cesana, M. Schmidt, F. Sanvito, M. Ponzoni, C. Bartholomae, L. S. Sergi, F. Benedicenti, A. Ambrosi, C. Di Serio, C. Doglioni, C. von Kalle, L. Naldini, Hematopoietic stem cell gene transfer in a tumor-prone mouse model uncovers low genotoxicity of lentiviral vector integration. Nat. Biotechnol. 24, 687–696 (2006). doi:10.1038/nbt1216 Medline

20. E. Montini, D. Cesana, M. Schmidt, F. Sanvito, C. C. Bartholomae, M. Ranzani, F. Benedicenti, L. S. Sergi, A. Ambrosi, M. Ponzoni, C. Doglioni, C. Di Serio, C. von Kalle, L. Naldini, The genotoxic potential of retroviral vectors is strongly modulated by vector design and integration site selection in a mouse model of HSC gene therapy. J. Clin. Invest. 119, 964–975 (2009). doi:10.1172/JCI37630 Medline

21. A. Capotondo, M. Cesani, S. Pepe, S. Fasano, S. Gregori, L. Tononi, M. A. Venneri, R. Brambilla, A. Quattrini, A. Ballabio, M. P. Cosma, L. Naldini, A. Biffi, Safety of arylsulfatase A overexpression for gene therapy of metachromatic leukodystrophy. Hum. Gene Ther. 18, 821–836 (2007). doi:10.1089/hum.2007.048 Medline

22. S. Scaramuzza, L. Biasco, A. Ripamonti, M. C. Castiello, M. Loperfido, E. Draghici, R. J. Hernandez, F. Benedicenti, M. Radrizzani, M. Salomoni, M. Ranzani, C. C. Bartholomae, E. Vicenzi, A. Finocchi, R. Bredius, M. Bosticardo, M. Schmidt, C. von Kalle, E. Montini, A. Biffi, M. G. Roncarolo, L. Naldini, A. Villa, A. Aiuti, Preclinical safety and efficacy of human CD34(+) cells transduced with lentiviral vector for the treatment of Wiskott-Aldrich syndrome. Mol. Ther. 21, 175–184 (2013). doi:10.1038/mt.2012.23 Medline

23. R. L. Koul, A. Gururaj, A. P. Chacko, M. S. Elbualy, S. R. Bhusnurmath, P. Chand, Late infantile metachromatic leucodystrophy in two siblings. Indian Pediatr. 31, 694–698 (1994). Medline

24. S. Yatziv, A. Russell, An unusual form of metachromatic leukodystrophy in three siblings. Clin. Genet. 19, 222–227 (1981). doi:10.1111/j.1399-0004.1981.tb00702.x Medline

25. T. Satoh, H. Suzuki, N. Monma, R. Satodate, H. Tanaka, H. Yajima, Metachromatic leukodystrophy. Report of siblings with the juvenile type of metachromatic leukodystrophy. Acta Pathol. Jpn. 38, 1041–1051 (1988). Medline

26. A. Biffi, M. Cesani, F. Fumagalli, U. del Carro, C. Baldoli, S. Canale, S. Gerevini, S. Amadio, M. Falautano, A. Rovelli, G. Comi, M. G. Roncarolo, M. Sessa, Metachromatic leukodystrophy - mutation analysis provides further evidence of genotype-phenotype correlation. Clin. Genet. 74, 349–357 (2008). doi:10.1111/j.1399-0004.2008.01058.x Medline

27. C. Kehrer, G. Blumenstock, C. Raabe, I. Krägeloh-Mann, Development and reliability of a classification system for gross motor function in children with metachromatic leucodystrophy. Dev. Med. Child Neurol. 53, 156–160 (2011). doi:10.1111/j.1469-8749.2010.03821.x Medline

28. S. Groeschel, C. Kehrer, C. Engel, C. I Dali, A. Bley, R. Steinfeld, W. Grodd, I. Krägeloh-Mann, Metachromatic leukodystrophy: Natural course of cerebral MRI changes in relation to clinical course. J. Inherit. Metab. Dis. 34, 1095–1102 (2011). doi:10.1007/s10545-011-9361-1 Medline

29. A. Biffi, C. C. Bartolomae, D. Cesana, N. Cartier, P. Aubourg, M. Ranzani, M. Cesani, F. Benedicenti, T. Plati, E. Rubagotti, S. Merella, A. Capotondo, J. Sgualdino, G. Zanetti, C. von Kalle, M. Schmidt, L. Naldini, E. Montini, Lentiviral vector common integration sites in preclinical models and a clinical trial reflect a benign integration bias and not oncogenic selection. Blood 117, 5332–5339 (2011). doi:10.1182/blood-2010-09-306761 Medline

30. U. Abel, A. Deichmann, C. Bartholomae, K. Schwarzwaelder, H. Glimm, S. Howe, A. Thrasher, A. Garrigue, S. Hacein-Bey-Abina, M. Cavazzana-Calvo, A. Fischer, D. Jaeger, C. von Kalle, M. Schmidt, Real-time definition of non-randomness in the distribution of genomic events. PLoS ONE 2, e570 (2007). doi:10.1371/journal.pone.0000570 Medline

31. G. P. Wang, C. C. Berry, N. Malani, P. Leboulch, A. Fischer, S. Hacein-Bey-Abina, M. Cavazzana-Calvo, F. D. Bushman, Dynamics of gene-modified progenitor cells analyzed by tracking retroviral integration sites in a human SCID-X1 gene therapy trial. Blood 115, 4356–4366 (2010). doi:10.1182/blood-2009-12-257352 Medline

32. F. Ginhoux, M. Greter, M. Leboeuf, S. Nandi, P. See, S. Gokhan, M. F. Mehler, S. J. Conway, L. G. Ng, E. R. Stanley, I. M. Samokhvalov, M. Merad, Fate mapping analysis reveals that adult microglia derive from primitive macrophages. Science 330, 841–845 (2010). doi:10.1126/science.1194637 Medline

33. J. Priller, A. Flügel, T. Wehner, M. Boentert, C. A. Haas, M. Prinz, F. Fernández-Klett, K. Prass, I. Bechmann, B. A. de Boer, M. Frotscher, G. W. Kreutzberg, D. A. Persons, U. Dirnagl, Targeting gene-modified hematopoietic cells to the central nervous system: Use of green fluorescent protein uncovers microglial engraftment. Nat. Med. 7, 1356–1361 (2001). doi:10.1038/nm1201-1356 Medline

34. A. Capotondo, R. Milazzo, L. S. Politi, A. Quattrini, A. Palini, T. Plati, S. Merella, A. Nonis, C. di Serio, E. Montini, L. Naldini, A. Biffi, Brain conditioning is instrumental for successful microglia reconstitution following hematopoietic stem cell transplantation. Proc. Natl. Acad. Sci. U.S.A. 109, 15018–15023 (2012). doi:10.1073/pnas.1205858109 Medline

35. K. Araya, N. Sakai, I. Mohri, K. Kagitani-Shimono, T. Okinaga, Y. Hashii, H. Ohta, I. Nakamichi, K. Aozasa, M. Taniike, K. Ozono, Localized donor cells in brain of a Hunter disease patient after cord blood stem cell transplantation. Mol. Genet. Metab. 98, 255–263 (2009). doi:10.1016/j.ymgme.2009.05.006 Medline

36. K. Allers, G. Hütter, J. Hofmann, C. Loddenkemper, K. Rieger, E. Thiel, T. Schneider, Evidence for the cure of HIV infection by CCR5Δ32/Δ32 stem cell transplantation. Blood 117, 2791–2799 (2011). doi:10.1182/blood-2010-09-309591 Medline

37. S. J. Howe, M. R. Mansour, K. Schwarzwaelder, C. Bartholomae, M. Hubank, H. Kempski, M. H. Brugman, K. Pike-Overzet, S. J. Chatters, D. de Ridder, K. C. Gilmour, S. Adams, S. I. Thornhill, K. L. Parsley, F. J. Staal, R. E. Gale, D. C. Linch, J. Bayford, L. Brown, M. Quaye, C. Kinnon, P. Ancliff, D. K. Webb, M. Schmidt, C. von Kalle, H. B. Gaspar, A. J. Thrasher, Insertional mutagenesis combined with acquired somatic mutations causes leukemogenesis following gene therapy of SCID-X1 patients. J. Clin. Invest. 118, 3143–3150 (2008). doi:10.1172/JCI35798 Medline

38. S. Hacein-Bey-Abina, C. von Kalle, M. Schmidt, F. Le Deist, N. Wulffraat, E. McIntyre, I. Radford, J. L. Villeval, C. C. Fraser, M. Cavazzana-Calvo, A. Fischer, A serious adverse event after successful gene therapy for X-linked severe combined immunodeficiency. N. Engl. J. Med. 348, 255–256 (2003). doi:10.1056/NEJM200301163480314 Medline

39. S. Hacein-Bey-Abina, C. von Kalle, M. Schmidt, M. P. McCormack, N. Wulffraat, P. Leboulch, A. Lim, C. S. Osborne, R. Pawliuk, E. Morillon, R. Sorensen, A. Forster, P. Fraser, J. I. Cohen, G. de Saint Basile, I. Alexander, U. Wintergerst, T. Frebourg, A. Aurias, D. Stoppa-Lyonnet, S. Romana, I. Radford-Weiss, F. Gross, F. Valensi, E. Delabesse, E. Macintyre, F. Sigaux, J. Soulier, L. E. Leiva, M. Wissler, C. Prinz, T. H. Rabbitts, F. Le Deist, A. Fischer, M. Cavazzana-Calvo, LMO2-associated clonal T cell proliferation in two patients after gene therapy for SCID-X1. Science 302, 415–419 (2003). doi:10.1126/science.1088547 Medline

40. S. Stein, M. G. Ott, S. Schultze-Strasser, A. Jauch, B. Burwinkel, A. Kinner, M. Schmidt, A. Krämer, J. Schwäble, H. Glimm, U. Koehl, C. Preiss, C. Ball, H. Martin, G. Göhring, K. Schwarzwaelder, W. K. Hofmann, K. Karakaya, S. Tchatchou, R. Yang, P. Reinecke, K. Kühlcke, B. Schlegelberger, A. J. Thrasher, D. Hoelzer, R. Seger, C. von Kalle, M. Grez, Genomic instability and myelodysplasia with monosomy 7 consequent to EVI1 activation after gene therapy for chronic granulomatous disease. Nat. Med. 16, 198–204 (2010). doi:10.1038/nm.2088 Medline

41. D. Zychlinski, A. Schambach, U. Modlich, T. Maetzig, J. Meyer, E. Grassman, A. Mishra, C. Baum, Physiological promoters reduce the genotoxic risk of integrating gene vectors. Mol. Ther. 16, 718–725 (2008). doi:10.1038/mt.2008.5 Medline

42. D. J. Russell, P. L. Rosenbaum, D. T. Cadman, C. Gowland, S. Hardy, S. Jarvis, The gross motor function measure: A means to evaluate the effects of physical therapy. Dev. Med. Child Neurol. 31, 341–352 (1989). doi:10.1111/j.1469-8749.1989.tb04003.x Medline

43. R. G. Voigt, F. R. Brown, 3rd, J. K. Fraley, A. M. Llorente, J. Rozelle, M. Turcich, C. L. Jensen, W. C. Heird, Concurrent and predictive validity of the cognitive adaptive test/clinical linguistic and auditory milestone scale (CAT/CLAMS) and the Mental Developmental Index of the Bayley Scales of Infant Development. Clin. Pediatr. (Phila.) 42, 427–432 (2003). doi:10.1177/000992280304200507 Medline

44. C. Mattioli, M. Gemma, C. Baldoli, M. Sessa, A. Albertin, L. Beretta, Sedation for children with metachromatic leukodystrophy undergoing MRI. Paediatr. Anaesth. 17, 64–69 (2007). doi:10.1111/j.1460-9592.2006.02002.x Medline

45. S. Martino, A. Consiglio, C. Cavalieri, R. Tiribuzi, E. Costanzi, G. M. Severini, C. Emiliani, C. Bordignon, A. Orlacchio, Expression and purification of a human, soluble Arylsulfatase A for Metachromatic Leukodystrophy enzyme replacement therapy. J. Biotechnol. 117, 243–251 (2005). doi:10.1016/j.jbiotec.2005.01.018 Medline

46. G. V. Childs, In situ hybridization with nonradioactive probes. Methods Mol. Biol. 123, 131–141 (2000). Medline

47. A. Aiuti et al., Science 10.1126/science.1233151 (2013).

48. A. Paruzynski, A. Arens, R. Gabriel, C. C. Bartholomae, S. Scholz, W. Wang, S. Wolf, H. Glimm, M. Schmidt, C. von Kalle, Genome-wide high-throughput integrome analyses by nrLAM-PCR and next-generation sequencing. Nat. Protoc. 5, 1379–1395 (2010). doi:10.1038/nprot.2010.87 Medline

49. M. Cavazzana-Calvo, E. Payen, O. Negre, G. Wang, K. Hehir, F. Fusil, J. Down, M. Denaro, T. Brady, K. Westerman, R. Cavallesco, B. Gillet-Legrand, L. Caccavelli, R. Sgarra, L. Maouche-Chrétien, F. Bernaudin, R. Girot, R. Dorazio, G. J. Mulder, A. Polack, A. Bank, J. Soulier, J. Larghero, N. Kabbara, B. Dalle, B. Gourmel, G. Socie, S. Chrétien, N. Cartier, P. Aubourg, A. Fischer, K. Cornetta, F. Galacteros, Y. Beuzard, E. Gluckman, F. Bushman, S. Hacein-Bey-Abina, P. Leboulch, Transfusion independence and HMGA2 activation after gene therapy of human β-thalassaemia. Nature 467, 318–322 (2010). doi:10.1038/nature09328 Medline

50. A. Deichmann, S. Hacein-Bey-Abina, M. Schmidt, A. Garrigue, M. H. Brugman, J. Hu, H. Glimm, G. Gyapay, B. Prum, C. C. Fraser, N. Fischer, K. Schwarzwaelder, M. L. Siegler, D. de Ridder, K. Pike-Overzet, S. J. Howe, A. J. Thrasher, G. Wagemaker, U. Abel, F. J. Staal, E. Delabesse, J. L. Villeval, B. Aronow, C. Hue, C. Prinz, M. Wissler, C. Klanke, J. Weissenbach, I. Alexander, A. Fischer, C. von Kalle, M. Cavazzana-Calvo, Vector integration is nonrandom and clustered and influences the fate of lymphopoiesis in SCID-X1 gene therapy. J. Clin. Invest. 117, 2225–2232 (2007). doi:10.1172/JCI31659 Medline

51. K. Akagi, T. Suzuki, R. M. Stephens, N. A. Jenkins, N. G. Copeland, RTCGD: Retroviral tagged cancer gene database. Nucleic Acids Res. 32, D523–D527 (2004). doi:10.1093/nar/gkh013 Medline

52. T. Suzuki, H. Shen, K. Akagi, H. C. Morse, J. D. Malley, D. Q. Naiman, N. A. Jenkins, N. G. Copeland, New genes involved in cancer identified by retroviral tagging. Nat. Genet. 32, 166–174 (2002). doi:10.1038/ng949 Medline

53. J. de Ridder, A. Uren, J. Kool, M. Reinders, L. Wessels, Detecting statistically significant common insertion sites in retroviral insertional mutagenesis screens. PLOS Comput. Biol. 2, e166 (2006). doi:10.1371/journal.pcbi.0020166 Medline

54. K. Schwarzwaelder, S. J. Howe, M. Schmidt, M. H. Brugman, A. Deichmann, H. Glimm, S. Schmidt, C. Prinz, M. Wissler, D. J. King, F. Zhang, K. L. Parsley, K. C. Gilmour, J. Sinclair, J. Bayford, R. Peraj, K. Pike-Overzet, F. J. Staal, D. de Ridder, C. Kinnon, U. Abel, G. Wagemaker, H. B. Gaspar, A. J. Thrasher, C. von Kalle, Gammaretrovirus-mediated correction of SCID-X1 is associated with skewed vector integration site distribution in vivo. J. Clin. Invest. 117, 2241–2249 (2007). doi:10.1172/JCI31661 Medline

55. K. Boztug, M. Schmidt, A. Schwarzer, P. P. Banerjee, I. A. Díez, R. A. Dewey, M. Böhm, A. Nowrouzi, C. R. Ball, H. Glimm, S. Naundorf, K. Kühlcke, R. Blasczyk, I. Kondratenko, L. Maródi, J. S. Orange, C. von Kalle, C. Klein, Stem-cell gene therapy for the Wiskott-Aldrich syndrome. N. Engl. J. Med. 363, 1918–1927 (2010). doi:10.1056/NEJMoa1003548 Medline

56. A. Chao, An overview of closed capture-recapture models. J. Agric. Biol. Environ. Stat. 6, 158–175 (2001). doi:10.1198/108571101750524670

57. S. P. McDermott, K. Eppert, E. R. Lechman, M. Doedens, J. E. Dick, Comparison of human cord blood engraftment between immunocompromised mouse strains. Blood 116, 193–200 (2010). doi:10.1182/blood-2010-02-271841 Medline

58. M. Bertelli, S. Gallo, A. Buda, S. Cecchin, A. Fabbri, C. Lapucci, G. Andrighetto, V. Sidoti, L. Lorusso, M. Pandolfo, Novel mutations in the arylsulfatase A gene in eight Italian families with metachromatic leukodystrophy. J. Clin. Neurosci. 13, 443–448 (2006). doi:10.1016/j.jocn.2005.03.039 Medline

59. R. Draghia, F. Letourneur, C. Drugan, J. Manicom, C. Blanchot, A. Kahn, L. Poenaru, C. Caillaud, Metachromatic leukodystrophy: Identification of the first deletion in exon 1 and of nine novel point mutations in the arylsulfatase A gene. Hum. Mutat. 9, 234–242 (1997). doi:10.1002/(SICI)1098-1004(1997)9:3<234::AID-HUMU4>3.0.CO;2-7 Medline

60. D. J. Russell, P. L. Rosenbaum, L. M. Avery, M. Lane, Gross Motor Function Measure (GMFM-66 & GMFM-88) User's Manual, M. K. Press, Ed. (Clinics in Developmental Medicine, Cambridge Univ. Press, London, ed. 1, 1997).