Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
Supplementary Materials for
Tumor-derived circulating endothelial cell clusters in colorectal cancer
Igor Cima, Say Li Kong, Debarka Sengupta, Iain B. Tan, Wai Min Phyo, Daniel Lee,
Min Hu, Ciprian Iliescu, Irina Alexander, Wei Lin Goh, Mehran Rahmani, Nur-Afidah
Mohamed Suhaimi, Jess H. Vo, Joyce A. Tai, Joanna H. Tan, Clarinda Chua, Rachel Ten,
Wan Jun Lim, Min Hoe Chew, Charlotte A.E. Hauser, Rob M. van Dam, Wei-Yen Lim,
Shyam Prabhakar, Bing Lim, Poh Koon Koh, Paul Robson, Jackie Y. Ying,
Axel M. Hillmer, Min-Han Tan*
*Corresponding author. Email: [email protected]
Published 29 June 2016, Sci. Transl. Med. 8, 345ra89 (2016)
DOI: 10.1126/scitranslmed.aad7369
The PDF file includes:
Materials and Methods
Fig. S1. Device setup, retrieval efficiency, and sample output purity.
Fig. S2. scrmPCR methodology and proof of principle in cell lines.
Fig. S3. CD45− clusters express EMT markers, do not share common mutations
with the primary tumor, and have normal chromosomal structures.
Fig. S4. Cytomorphology of CD45− clusters used for RNA-seq.
Fig. S5. Cell-type inference workflow from RNA-seq data.
Fig. S6. Validation of the cell-type inference algorithm.
Fig. S7. Comparison between endothelial clusters (this study) and CTC clusters in
Aceto et al. (9).
Fig. S8. Circulating endothelial clusters are tumor-derived.
Fig. S9. tEC clusters in a mouse model of CRC.
Fig. S10. tEC clusters do not correlate with selected CRC tumor and patient
characteristics or comorbidities.
Legends for tables S1 to S14
References (38–44)
Other Supplementary Material for this manuscript includes the following:
(available at www.sciencetranslationalmedicine.org/cgi/content/full/8/345/345ra89/DC1)
Table S1 (Microsoft Excel format). scrmPCR primers used in this study.
www.sciencetranslationalmedicine.org/cgi/content/full/8/345/345ra89/DC1
Table S2 (Microsoft Excel format). Whole genome application (WGA) false-
positives are detected only at VAF < 10%.
Table S3 (Microsoft Excel format). Circulating CD45− clusters do not mirror
matching primary tumor mutations.
Table S4 (Microsoft Excel format). Sporadic mutations in circulating CD45−
clusters are not detected in matching primary tumor tissues.
Table S5 (Microsoft Excel format). Germline variants in tumor tissues and
circulating CD45− cell clusters (coverage and zygosity).
Table S6 (Microsoft Excel format). RNA-seq data: number of uniquely mapped
reads to hg19 exons.
Table S7 (Microsoft Excel format). RNA-seq data: processed data (FPKM) of all
samples described in the study.
Table S8 (Microsoft Excel format). Definition of the cell types used for the
inference algorithm.
Table S9 (Microsoft Excel format). Top 80 genes ordered from highest to lowest
specificity index (S) for each cell type selected for the inference algorithm.
Table S10 (Microsoft Excel format). Positive published RNA-seq control samples
used to validate the inference algorithm.
Table S11 (Microsoft Excel format). Characteristics of tEC clusters isolated in
this study.
Table S12 (Microsoft Excel format). Circulating endothelial cell cluster count
before and after surgery.
Table S13 (Microsoft Excel format). Baseline patients’ and healthy donors’
characteristics.
Table S14 (Microsoft Excel format). CD45− cluster count for each baseline
sample type and number of single clusters analyzed for each technique in the
corresponding samples.
Materials and Methods
Clinical samples processing and data collection
All samples were collected in EDTA Vacutainer tubes (Becton-Dickinson), and processed within
6 hr at IBN. Two CRC cases were excluded from the analysis because of a technical issue with
the microfiltration. Wherever available, matched tumor and metastatic samples, along with
normal mucosa sampled at least 5 cm away from the tumor, were immediately frozen after
resection, and stored at −80C until use. Clinicopathologic data for participating subjects are
described in table S13, and were collected retrospectively after completion of circulating clusters
counts. Furthermore, table S14 includes circulating clusters counts for each baseline sample and
includes number of single clusters analyzed for each technique and patient throughout the study.
Clinical data was collected without prior knowledge of clusters counts. Similarly, clinical data
for CRC patients were not known when determining clusters counts, except for the diagnosis and
preoperative status of all FSH samples.
Device assembly
The retrieval device was assembled as follows. Firstly, the rubber plug of a 3-ml syringe plunger
was removed, and a 5.5 mm-diameter hole was created using a punch cutter. The perforated
rubber plug was placed in the 3-ml syringe. Next, an O-ring was placed in the slot of the sleeve
insert, followed by the microsieve and another O-ring. Finally, the sleeve insert, together with
the microsieve and O-rings were placed in the 3-ml syringe above the perforated rubber plug.
This arrangement enabled the microfiltration of cells by size from the whole blood, and the
subsequent retrieval of captured cells from the upper surface of the microsieve.
Microfiltration optimization
To optimize blood microfiltration with microsieve, cells labeled with 5 µM CellTracker (Life
Technologies) were added to donor blood at 10–50 cells per ml of whole blood. Blood was
microfiltered at various flow rates by means of a peristaltic pump (Ismatec). After 6 washes with
PBS, 0.5% BSA and 2 mM EDTA, cells were resuspended in culture medium. Cell nuclei were
then stained using Hoechst 33342 (Life Technologies), and cells were retrieved to determine
retrieval efficiency and fold-depletion of contaminating WBCs. In some experiments, we also
counted CellTracker-positive cells remaining on the microsieve. Percent retrieval efficiency was
calculated as follows:
% Retrieval Efficiency = (Retrieved cells) × 100 / (Spiked Cells)
Fold depletion from 1 ml whole blood was calculated as follows:
Fold Depletion = (WBCs in Whole Blood) / (WBCs in Microfiltrate)
WBC count in the microfiltrate was defined as the number of any Hoechst 33342
positive, CellTracker-negative event in the case of experimental enrichment, or as any CD45-
positive event in the case of clinical samples. All clinical samples were immediately processed in
downstream applications using the optimized parameters described in Figure 1C and D. To
estimate the ideal target WBC depletion, we performed single-cell micromanipulation on serial
dilution of PBMCs containing 50 CellTracker-positive HCT 116 cells. Micromanipulation of
pure HCT 116 cells without contaminating white blood cells was easily performed after a 5,000-
fold dilution of PBMCs. The ideal target retrieval efficiency was based on a literature search of
label-free CTC isolation devices (17). Microfiltration of clinical samples was performed using 2
ml of whole blood for each microsieve device and the optimized microfiltration conditions.
Single cell and single cluster micromanipulation
Cells and clusters were identified using bright field or phase contrast imaging, nuclear staining
and specific fluorescence signals. Target single cells or clusters were then manually
micropipetted in a 10-µl droplet of wash buffer using a mouth pipette operated through a 25-ml
syringe. Subsequently, single cells or clusters were deposited in 0.2-ml PCR tubes containing an
appropriate buffer: 5 µl of 2× Reaction buffer (CellsDirect One-Step qRT-PCR Kit, Life
Technologies) for scrmPCR; 2 µl of PBS for whole genome amplification; or 2 µl of SuperBlock
buffer (Thermo Scientific) for low-input RNA-seq. Cells were immediately stored at −80 C
until use. In some cases, the complete microfiltrate was spun down without micromanipulation
and stored at −80 C until further use.
Cell lines and culture
HCT 116, COLO 201, SW480, SW620, DLD-1, RKO, and CT26 colorectal cancer cell lines and
BJ-5ta immortalized human foreskin fibroblasts were purchased from ATCC and cultured in
DMEM (Life Technologies) supplemented with 10% FBS. HUVECs were purchased from
ATCC, used at passage 1 to 3, and cultured in EGM-2 medium (Lonza). HDMECs and HPrECs
were purchased from ScienCell, cultured in Endothelial Cell Medium (ScienCell) on fibronectin-
coated dishes, and used at passage 1 for RNA-seq analysis and at passage 5 for the FOLH1
analysis. iPSCs were donated by Y. Chouduri (Institute of Bioengineering and Nanotechnology).
All cells were maintained in a humidified incubator at 37 C in the presence of 5% CO2.
Reagents for on-sieve immunofluorescence
Following fluorescence-labeled antibodies were used: anti-CD45 1:200 (clone 2D1;
eBioscience), anti-EpCAM 1:20 (9C4, BioLegend), anti-CD31 1:20 (WM59, BioLegend), anti-
CD144 1:10 (55-7H1, BD Biosciences), anti-CD41 1:20 (HIP8, BioLegend), anti-CD42B 1:20
(HIP1, BioLegend), anti-CD133 1:20 (AC133, BD Biosciences). Live iPSCs in suspension were
used as positive controls for the CD133 staining. Fibrin was stained using anti-Fibrin 1:200
(Clone UC45, Pierce antibodies) and secondary goat anti-mouse IgG/IgM conjugated to Alexa
488 1:1000 (Pierce antibodies). Fibrin-positive controls included fibrin gels obtained by mixing
20 µl of fibrinogen droplets (20 mg/ml in PBS, pH = 7.2, Sigma-F3879) deposited on a well of a
96-well plate, mixed with 2 µl of thrombin (50 U/ml, Sigma-T4648) and incubated for 10 min at
room temperature. For intracellular antigens, the Inside Stain Kit (Miltenyi Biotec) comprising a
fixative and a permeabilization solution was used together with human FcR Blocking Reagent
with the following antibodies: anti-VWF 1:200 (rabbit polyclonal A 0082, DAKO, conjugated
in-house to Alexa 488 or Alexa 555 using Life Technologies APEX Antibody Labelling Kit),
anti-Vimentin (V9, Santa Cruz Biotechnology), and anti-pan Cytokeratin (C11, Cell Signaling
Technology). Nuclei were stained using Hoechst 33342 (Life Technologies). In some
experiments, Calcein AM (Life Technologies) was used to identify live cells.
Nucleic acid extraction
Total RNA from tissues was isolated using the RNeasy mini kit (Qiagen). DNA from tissues was
isolated using the DNeasy mini kit (Qiagen). Complete microfiltrates or laser-capture
microdissected (LCM) samples were subjected to RNA extraction using the RNAqueous-Micro
Total RNA Isolation Kit (Ambion) following the manufacturer’s instructions. Single cells and
single clusters did not undergo nucleic acid extraction, but were lysed and used directly in
downstream assays as described for the scrmPCR method, DNA sequencing and aCGH and
RNA-seq.
scrmPCR
This method can simultaneously detect and quantify RNA transcripts and sequence DNA
hotspots in the same cell. Briefly, single-cell RNA transcripts were reverse transcribed at 50 C
for 30 min using SuperScript III Reverse Transcriptase (Invitrogen) and a mix of 500 nM target
reverse primers. A preamplification round was then performed using Platinum Taq DNA
polymerase (Invitrogen) by adding a matching mix of forward primers to the transcript-specific
reverse primers and primer pairs for targeted genomic regions. Preamplification cycling was
conducted by alternating annealing and denaturation steps without extension as follows: 6 cycles
at 60 C, 4 min, 95 C, 1 min; 6 cycles at 55 C, 4 min, 95 C, 1 min; 6 cycles at 50 C, 4 min,
95 C, 1 min. In some cases, we used 18 cycles at 60 C, 4 min, 95 C, 1 min. Primer clean-up
was performed using the Axyprep PCR Clean-up Kit (Axygen) or using Exonuclease I
enzymatic digestion (Thermo Fisher). Samples were diluted 1/20 and stored at −20 C until
further use.
To quantify RNA transcripts, quantitative PCR was performed on a ViiA7 Instrument
(Applied Biosystems) using 2 µl of the preamplification mixture, semi-nested primer pairs
appropriate for the target transcript (table S1) and the SensiFAST SYBR Lo-ROX Kit (Bioline)
according to the manufacturer’s protocol. Relative gene expression was normalized using ACTB
as the reference gene. Single cells or single clusters that did not show ACTB expression
quantifiable in < 25 cycle threshold (Ct) values were discarded from the analysis. To analyze
selected DNA mutational hotspots, PCR was performed using 2 µl of the preamplification
mixture, nested PCR primer pairs (table S1) and a master mix containing a proof-reading
polymerase (KOD Hot Start Master Mix, EMD Millipore) in line with the manufacturer’s
instructions. For KRAS exon 2 sequencing of tumor and normal tissues shown in Fig. 2C, PCR
amplification was performed using a primer pair different from the ones indicated in table S1):
TTTGTATTAAAAGGTACTGGTGGAG as forward primer, and
CCTTTATCTGTATCAAAGAATGGTC as reverse primer. PCR products were separated on
agarose gel; specific bands were excised and sequenced using the Sanger method.
Droplet digital PCR (ddPCR)
ddPCR was performed using the BioRad QX200 system and manufacturer instructions. Briefly,
primers and probes were designed to target variants detected in three clusters from three different
patients (fig. S3, B and C). For each sample, the PCR reaction was performed in duplicate using
450 nM forward and reverse primers, 250 nM probes targeting the mutant and the wild type
alleles, 1x of ddPCR Supermix and 5ng of HindIII-digested DNA template using the following
thermal cycling protocol: 95°C for 10 min; 40 cycles at 94°C, 30 s and 63°C, 1 min; 98°C for 10
min. Results were analyzed on the BioRad QX200 Droplet Reader using the QuantaSoft
software. Following primers and probes were used: AKT c.G27A: Forward primer
TACTACCCCCATCTCTCCCT, reverse primer CTGAGAGGAGCGCGTGAG, probe 1
(mutant) FAM-ACCTCGTTTGTGCAGCCAACCTTCCT-BLACK HOLE, probe 2 (wildtype)
HEX-ACCTCGTTTGTGCAGCCAACCCTCCT-BLACK HOLE; KRAS c.G196A: Forward
primer ATACACAAAGAAAGCCCTCC, reverse primer CAGACTGTGTTTCTCCCTTC,
probe 1 (mutant) FAM-CCCTCATTGTACTGTACTCCTCTTGACCTG-BLACK HOLE, probe
2 (wild type) HEX-CCCTCATTGCACTGTACTCCTCTTGACCTG-BLACK HOLE; NRAS
c.G97A: Forward primer TGGGTAAAGATGATCCGACA, reverse primer GACTGAGTACA
AACTGGTGG, probe 1 (mutant) FAM-CTGGGCCTCACCTCTATGGTGGGATTATAT-
BLACK HOLE, probe 2 (wild type) HEX-CTGGGCCTCACCTCTATGGTGGGATCATAT-
BLACK HOLE.
EPC assay
Colony-forming EPC assays were performed as previously described (22). Live endothelial
clusters were counted in 2-ml microfiltrates after CD144 and Calcein AM fluorescence staining.
Unstained microfiltrates from 2 ml of blood from a second device were then placed in culture on
a 96-well plate coated with fibronectin (1 µg/cm2) (Sigma-Aldrich) in the presence of EGM-2
cell culture medium (Lonza). The presence of clusters was confirmed by bright field microscopy
before incubation. HUVECs were used as positive controls after spiking 10,000 HUVECs into 2
ml of donor blood and isolating them by microfiltration using two microsieve devices. In one
device, retrieved HUVECs were quantified by CD144 and Calcein AM staining. HUVECs
retrieved from the other device were seeded at 5, 10, 20, 40, 80, or 160 cells in octuplicate wells.
After 2 days, the medium was changed and cell cultures were maintained for a total of 30 days,
with half of the medium changed every other day. The presence and viability of colonies were
monitored every week under bright field microscopy. After 30 days, cells were stained using
CD144 antibodies, Calcein AM and Hoechst 33342, and quantified under an IX81 (Olympus)
inverted fluorescence microscope.
D-dimer ELISA
D-dimer ELISA was performed using a RayBio human D-dimer ELISA kit (Raybiotech) with
manufacturer instructions. Plasma samples were diluted 1:2,000,000 in the provided buffer and
incubated overnight at 4°C.
Isolation of endothelial cells from fresh tissues
Tissues were minced before digestion with collagenase, dispase and DNAseI for 60 min at 37C.
After Ficoll-Paque density centrifugation, a two-step magnetic selection was performed using
MACS reagents and materials (Miltenyi Biotec) in line with the manufacturer’s instructions.
First, CD45-expressing cells were depleted by retention on LD columns after labeling the cells
with anti-CD45 magnetic beads and human FcR Blocking Reagent. Next, the CD45-depleted
fraction was collected and re-labeled by adding anti-CD31 magnetic beads and human FcR
Blocking Reagent. A fraction enriched for CD31+CD45– cells was harvested on MS columns
before elution and storage at −80 C until further use.
Laser capture microdissection of endothelial cells from archival frozen tissues
Eight-µm tissue sections were cut from fresh frozen samples of tumor and matched normal
tissues, and placed on PEN-LCM membrane slides (Thermo Fisher). Slides were washed twice
in buffer (PBS/EDTA/1%BSA/RNAse inhibitor). After a 10-min peroxidase blocking step using
Peroxidase Blocking Reagent (DAKO) and three washes in buffer, slides were stained for 20 min
with anti-CD31 antibodies at 1 µg/ml in buffer (WM59, BioLegend). After three washing steps,
bound antibodies were visualized using the REAL EnVision Detection System (Dako Cat. 5007)
in line with the manufacturer’s instructions. Laser capture microdissection was immediately
performed on the dry slides using the Arcturus LCM system (Thermo Fisher) following
manufacturer’s instructions. Endothelia were captured on adhesive CapSure caps (Thermo
Fisher) and stored at −80°C until further use. RNA was isolated using RNAqueous-Micro Total
RNA Isolation Kit (Ambion) and stored at −80C until further use. Low-input RNA-seq libraries
and sequencing were performed as described using 10 pg of RNA from each sample.
Cell type inference method
First, we generated a map of specific genes for each cell type of interest. The primary cell atlas
data set (GSE49910) (38) was used for this purpose. From this data set we extracted 215
experiments of interest, corresponding to n = 36 different cell types or ‘lineages’ (table S8)
without reprocessing of the data. Technical replicates were averaged. For each gene, g, in each
lineage, l, a ‘specificity index’, 𝑆, was calculated based on Shannon entropy and the Q statistics
introduced by (39),
𝑆(𝑙|𝑔) = − ∑ 𝑝(𝑙|𝑔)
𝑁
𝑙=1
⋅ log2(𝑝(𝑙|𝑔)) − log2(𝑝(𝑙|𝑔))
where 𝑝(𝑙|𝑔) is the relative expression of gene g in lineage l. Gene specificity was confirmed by
visualizing gene expression data with high specificity indices using the primary cell line data
base from BioGPS (40) (this data base is also dervied from GSE49910) (fig. S6A). For each cell
type, the top 80 genes with the highest specificity index (‘specific genes’) were selected and
were reported in table S9, which represents the map of specific genes for each lineage of interest.
Next, for each RNA-seq query sample, we set a predefined threshold to define expressed genes.
In our dataset, we used FPKM > 0 as expressed genes. For the positive controls in figs. S6 and
S7, we used the cut-offs reported in the corresponding studies. Next, in the query RNA-seq list
of expressed genes, we counted the occurrences of the top 80 specific genes for each cell type
present in our map of specific genes (table S9). To determine if the number of enriched genes
was different from enrichment by chance, we generated 1,000 lists of 80 randomly selected
genes from the Affymetrix HG-U133_Plus_2 gene symbol list (‘random genes’) and counted the
average number of genes present by chance in each experimental RNA-seq profile for each
lineage. Lastly, for each cell type in each experimental sample, a Fisher’s exact test was applied
to determine whether the number of enriched specific genes was equal to the number of
randomly enriched genes. The odds ratios for each test were mean-centered, scaled and
visualized in a heat map comprising all tested cell types. The final results were used to generate
hypotheses of cell types based on the distribution of the normalized odds ratios. The algorithm
was validated using lists of expressed genes from published RNA-seq data from various cell
types and tissues (fig. S6C).
Principal component analysis on RNA-seq data
Principal component analysis (PCA) was performed in the R environment (version 3.1.0) (32) on
RNA-seq data set of single clusters and tissue samples (table S7). The prcomp function was used
with Log2(FPKM+1) data and with default parameters after excluding variables (genes) with 0
standard deviation and quantile normalization of the samples.
Differential gene expression (DGE) and linear discriminant analysis (LDA) of RNA-seq
data
DGE analysis and LDA of RNA-seq data from laser-dissected nECs and tECs was performed by
‘NOISeq’ (41) and the lda function of the MASS package respectively, in the R environment
(version 3.1.0). For DGE, input data were log2(FPKM+1) normalized before analysis. To obtain
differentially expressed genes, we applied the noiseqbio function with default parameters, except
for using r = 100 permutations and setting the smoothing parameter adj to 0.5. The list of genes
presented in fig. S8D was selected using NOISeq probability >0.8 (corresponding to a q-value
<0.2) and no filter on the log2 fold change (log2FC) data. For LDA, the data corresponding to the
list of differentially expressed genes in the LCM samples were extracted from the log2
normalized RNA-seq dataset. Variables with >80% missing values [log2(FPKM+1) = 0] in the
clusters data were excluded from the analysis (CDH2 and ACAT). A linear discriminant model
using the remaining 6 variables, ST6GALNAC2, WDR36, ARMC10, PSMB2, STOM, and
NUCKS1 was computed on the nECs and tECs data using the lda function of the ‘MASS’
package, after verifying the absence of multicollinearity (Pearson’s r < 0.8 for each variable pair)
and the assumptions of normality and homogeneity of the covariance matrices by the Box’s M
test (P = 0.55). The model accuracy was tested using leave-one-out cross-validation (LOOCV)
on the same data set and was reported in Fig. 4A. Finally, the model was applied to classify each
single cluster. To calculate the expected probability of cluster classification by chance, we
generated a series of 1000 random 6-gene signatures with Box’s M test P values > 0.1 and
identical missingness probabilities, and tested each signature using the workflow described for
the 6-gene signature. The exact binomial test was used to compare the observed probability (Po)
of classified clusters as tECs with the mean expected probability (Pe) of the randomly generated
signatures classified as tECs.
Microvessel analysis and lumen count
Briefly, after scanning of the tissue sections using an Ariol SL-50 scanning platform Leica
Biosystems, the tumor regions for all sections were marked and quantified. Within these regions,
all areas with the greatest number of distinctly highlighted micro-vessels (hotspots) were marked
and quantified. Next, nine images covering all hotspot areas were analyzed at 20× magnification
for microvessel density (MVD), lumen counts and lumen size. MVD was determined using an
automated ImageJ macro as follows. Ruifrok’s color deconvolution was applied to extract the
CD31-positive signal. After image binarization, a common threshold level was chosen to
represent correct vascular morphology and minimize background levels across all slides.
Morphological post-processing steps were applied to join nearby objects, split objects and close
gaps within objects similarly as described (36). Objects larger than a minimal size (default: 105
pixels, corresponding to 50 µm2) were counted as microvessels. Smaller size objects were
considered as noise and excluded from the analysis. The ImageJ macro was validated by
comparing it with manually counted images (Pearson’s r = 0.951, n = 30 images). Lumen count
and perimeters were determined manually for each image using ImageJ measurement and count
features. Mosaic and damaged vessels were counted in the tumor area and in available normal
tissues present on the same slide by screening each section using a 20× magnification. False-
positive stains (e.g. hematopoietic cells positive for CD31) were excluded from the analysis.
These cells were identified in each tissue by morphology and lower staining intensity compared
to microvessels. Mean values of 9 images per section were used for statistical assessments.
Patient’s IDs and clusters counts were blinded during the analysis to avoid subjective bias during
acquisition of the data.
Tail vein injection of tumor-derived endothelial cells in experimental mice. CT26 cells
(500,000 in 100 µl of sterile PBS) were subcutaneously injected into the left flank of 7-week-old
Balb/c mice. At 14 days after injection, mice were euthanized and the tumors (about 2 cm-
diameter) were isolated for downstream processing. Tumor samples were embedded in optimal
cutting temperature compound (OCT, Sakura Finetek). The remaining tumor tissues were
minced using scalpel blades and digested with a solution of collagenase and dispase (Liberase
DH, Roche) for 2 hr at 37 C in cell culture medium with occasional stirring. After two washing
steps, the tissues were incubated with DNase I (Thermo Fisher) for 15 min, and filtered through a
40-µm cell strainer to obtain single-cell suspensions. Cells were stained for 45 min at 4 C using
1 µg/ml of allophycocyanin (APC)-labeled anti-CD309 antibodies (Avas12a1, eBioscience),
diluted in PBS/EDTA/1%BSA or matching Rat IgG2a K Isotype control (eBioscience), in the
presence of anti-mouse FC Receptor Block (eBioscience). After two washing steps, cells were
stained with Propidium Iodide (PI) Readyprobes (Thermo Fisher). Cell suspensions were then
sorted using a MoFlo XDP Cell Sorter to obtain a total of 500,000 CD309+PI- single cells. Cells
were subsequently divided into two equal fractions. One fraction was labeled using Celltracker
Orange (Invitrogen), while the other fraction was labeled using Celltracker Green (Invitrogen).
Cell suspensions were mixed and 50,000 cells were injected into the tail vein of nine 7-week-old
Balb/c mice. Mice were euthanized at the indicated time points, and the presence of single
fluorescence events (single cells) or dual fluorescence events (clusters) were analyzed in blood
samples by a FACS Calibur instrument (BD Biosciences), after lysis of the erythrocytes fraction
using BD Pharm Lyse (BD Biosciences) and following manufacturer’s instructions. The acquired
data were analyzed using FlowJo version 10.0.7 (Tree Star).
Tumor-derived endothelial cell clusters in experimental mice. CT26 cells (500,000 in 100 µl
of sterile PBS) were subcutaneously injected into the left flank of two 6-week-old
L2G85.BALB/c mice (The Jackson Laboratory). This mouse strain carries the GFP transgene
under the ubiquitous CAG promoter. After 14 days, mice were euthanized and tumors were
subcutaneously transplanted into five wild-type 6-week-old Balb/c mice. Fourteen days later, the
mice were euthanized, and the blood samples were immediately collected in EDTA Vacutainer
tubes (Becton-Dickinson) by cardiac puncture. Tumor tissues were collected and dissected in
three segments. Each segment was tested for the presence of the GFP transgene by genotyping
its genomic DNA with the recommended protocols and the following primers: oIMR0872
AAGTTCATCTGCACCACCG as GFP forward primer, oIMR1416
TCCTTGAAGAAGATGGTGCG as GFP reverse primer, oIMR7338
CTAGGCCACAGAATTGAAAGATCT as internal positive control forward primer, and
oIMR7339 GTAGGTGGAAATTCTAGCATCATCC as internal positive control reverse primer.
One mouse was excluded from subsequent analysis because of the absence of GFP signal in any
one of the three segments. Blood samples were processed within 2 h of collection. Erythrocytes
were lysed using BD Pharm Lyse (BD Biosciences) and following manufacturer’s instructions.
Cells were washed in buffer (PBS/EDTA/1%BSA), and stained using 1 µg/ml of anti-CD309-
APC (Avas12a1, eBioscience) or matching rat IgG2a K Isotype control (eBioscience) and
Hoechst 33342, in the presence of anti-mouse FC Receptor Block (eBioscience). Stained cells
were analyzed using a LSR II instrument (BD Biosciences) with independent laser sources for
each fluorescent signal. Acquired data were analyzed using FlowJo version 10.0.7 (Tree Star).
Blood from three control Balb/c mice was used as negative control.
Supplementary Figures:
Fig. S1. Device setup, retrieval efficiency, and sample output purity. (A) Device setup.
Microfiltration devices enclosing a silicon microsieve (inset scale bar, 10 µm), connected to a
peristaltic pump (B) Capture efficiency of SW620 cells from whole blood, indicating percentage
of captured cells on the microsieve that could be retrieved for downstream assays, or that were
lost because the cells adhered to the microsieve. Four independent experiments are shown. (C)
Size distribution of SW620, HCT 116, SW480, and COLO201 cells (n = 50 cells each). Arrows
indicate median size of WBCs and CTCs isolated from colorectal, prostate and breast cancer
patients respectively, reported in (14). (D) Contaminating nucleated cells after microfiltration of
clinical samples using the optimized protocol shown in Fig. 1, B and C (n = 13 blood samples).
(E) Retrieval efficiency from additional cell lines using optimized protocol shown in Fig. 1, B
and C. Data are means ± SEM (n = 3 experiments for each cell line).
Fig. S2. scrmPCR methodology and proof of principle in cell lines. (A) scrmPCR workflow
for single cells or single clusters. (B) Concurrent DNA amplification (gel electrophoresis) and
RNA quantitation (heat map) for the indicated genes and representative single or bulk cells. (C)
Sequence chromatograms from DNA hotspots of known mutant alleles in RKO and DLD-1
single cells. (D) Melting curve plots for the indicated genes from qPCR reactions. sc, single cell;
NTC, no-template control; Rn, normalized relative to reporter signal.
TP53 (exon5)
TP53 (exon6)
TP53 (exon7)
TP53 (exon8)
BRAF (exon15)
DL
D1
B
DL
D1
SC
1
DL
D1
SC
2
DL
D1
SC
3
NT
CCDH1
SERPINE1
EPCAM
K
CDX2
DL
D1
SC
3
RK
RK
RK
RK
KRAS (exon2)
NT
C
DLD-1 RKO
Bu
lk
sc#
1
sc#
2
sc#
3
Bu
lk
sc#
1
sc#
3
sc#
2
CDH1
SERPINE1
EPCAM
KRT19
CDX2
SE
RP
INE
1
20
10
0 10
20
Relative gene expression
(normalized to ACTB)
-20 0 -10 10 20
bulk
sc#
1
RKO
BRAFmut DLD-1
KRASmut TP53mut
sc#
1
bu
lk
#"%!"#!"%$"#
KRT19 CDH1 ACTB
CDX2 EPCAM
Temperature (°C)
-Rn
’ -R
n’
AG AGA GGT GAG TTTCT
SERPINE1
A
Single-cell/cluster
Gene
expression
DNA
mutations
QPCR Proof-reading PCR
& Sanger seq.
On
e-t
ube
re
actio
n Lysis
Targeted reverse transcription
Multiplex preamplification
of both DNA and cDNA targets
B C
D
Fig. S3. CD45– clusters express EMT markers, do not share common mutations with the
primary tumor, and have normal chromosomal structures. (A) Representative four-color
immunofluorescence staining for CD45, vimentin (VIM), pan-cytokeratin (CK), and DAPI,
indicating heterogeneous mesenchymal and epithelial marker expression by 2 clusters from two
WG
Acon
trol
No
rma
l tissue
DN
A
WG
Acon
tro
l
Tum
or
tissu
e D
NA
Chromosomes (1-22)
pa
nC
KV
IMC
D45
Sin
gle
Clu
ste
r
Sin
gle
Clu
ste
r
Cl.5
Cl.1
Cl.2
Cl.3
Cl.4
Cl.5
Sin
gle
Clu
ste
r Cl.6
Cl.8
Cl.9
D Patient 7
Patient 13
Patient 15
* **
Tu
mo
r
tissue
Tum
or
tissue
*** *
Tu
mo
r
tissu
e
* * * * *** ** * *
Norm
al
tissu
e
No
rmal
tissue
Norm
al
tissu
e
Cluster#1 Cluster#2 B
02000400060008000
10000
02000400060008000
10000
Ch
an
nel 1
am
plit
ud
e
Channel 2 amplitude
0
1000
2000
3000
4000
5000
6000
P10-Cl.5
Tumor tissue
Normal tissue
Blank
Target Sample Replicate #1 Replicate #2
P08 tumor 0 0
P08 normal 0 0
P08-Cl.12 4.1 3.3
P25 normal 0 0
P86 normal 0 0
P87 normal 0 0
Blank 0 0
P10 tumor 0 0
P10 normal 0 0
P10-Cl.5 14.5 15.9
P25 normal 0 0
P86 normal 0 0
P87 normal 0 0
Blank 0 0
P13 tumor 0 0
P13 normal 0 0
P13-Cl.5 43.2 40.9
P25 normal 0.08 0
P86 normal 0 0
P87 normal 0 0.13
Blank 0 0
Fractional abundance (%)
KRAS
c.G196A
AKT
c.G27A
c.G97A
NRAS
02000400060008000
10000
02000400060008000
10000
C
Chromosomes (1-22)
Chromosomes (1-22)
Chromosomes (1-22)
E
A
different patients. (B) Representative 2-D plot of droplet digital PCR confirming the absence of
matching mutations between clusters and tumor tissues. Green dots indicate droplets containing
PCR products from wild-type alleles; red and blue dots indicate droplets containing PCR
products with mutant allele amplification; gray dots are no amplification. (C) Fractional
abundance of each replicate and tissue sample or clusters assayed by ddPCR on 5 ng DNA
template. Red values indicate positive signals; values <0.2% were below the assay maximum
sensitivity. (D) Control experiment to assess the impact of whole genome amplification (WGA)
for aCGH experiments. (E) Array comparative genomic hybridization (aCGH) of single clusters
and matched tissues for three patients, in addition to patient 10 in Fig. 2E.
Fig. S4. Cytomorphology of CD45– clusters used for RNA-seq. Phase contrast and Hoechst
33342 images of clusters used for RNA-seq for each patient. No images are available for patient
P1 because the cells were directly deposited in RNA-seq buffer without imaging. Scale bars, 10
µm.
Cl.5
Cl.4
Cl.2
Cl.6 Cl.11
Cl.2
Cl.10
Cl.14
Cl.15
Cl.16
Cl.7
Cl.2
Pa
tie
nt
18
Pa
tie
nt
10
Pa
tien
t 1
6P
atien
t 2
1
Pa
tie
nt 8
Pa
tie
nt
20
Pa
tien
t 1
9
Cl.10
Cl.11
Fig. S5. Cell-type inference workflow from RNA-seq data.
Fig. S6. Validation of the cell-type inference algorithm. (A) Selected genes with the highest
specificity index for representative cell types were verified for specificity using BioGPS. (B)
Gene expression levels for markers commonly used in CTC research to define epithelial cells.
Note KRT18 expression in the endothelial lineages, and EPCAM expression in embryonic stem
L9
4_
Oo
cyte
L9
5_
Sp
erm
ato
cyte
L1
_E
SC
L2
_E
SC
L3
_E
SC
L4
_E
SC
L5
_E
SC
L1
4_
iPS
CL
21
_M
eso
de
rma
l_S
tem
.Ce
llL
7_
Ne
uro
ep
ithe
lial
L1
0_
Ne
ura
l_P
recu
rso
rL
11
_N
eu
ral_
Pre
cu
rso
rL
8_
Astro
cyte
L9
_A
stro
cyte
L5
5_
MS
C_
Fe
tal.B
loo
dL
56
_M
SC
_F
eta
l.Bo
ne.M
arro
wL
57
_M
SC
_F
eta
l.Liv
er
L2
6_
Fib
robla
st_
Fo
reskin
L2
7_
Fib
robla
st_
Bre
ast.T
issu
eL
43
_S
mo
oth
.mu
scle
.ce
ll_B
ron
ch
ial
L3
6_
Ch
on
dro
cyte
L3
8_
Oste
obla
st
L3
9_
Ad
ipo
cyte
L3
0_
En
do
the
lial_
Hu
mb
ilica
l.Ve
inL
31
_E
nd
oth
elia
l_Lym
ph
.Ve
sse
lL
33
_E
nd
oth
elia
l_H
um
bilic
al.V
ein
L3
4_
En
do
the
lial_
Lym
ph
atic
L2
2_
Ep
ithe
lial_
Bla
dd
er
L2
3_
Ep
ithe
lial_
Bla
dd
er
L2
4_
Ep
ithe
lial_
Bro
nch
ial
L2
5_
Ep
ithe
lial_
Bro
nch
ial
L4
0_
He
pa
tocyte
L4
4_
HS
C_
Bo
ne.M
arro
wL
45
_C
MP
_B
on
e.M
arro
wL
46
_G
MP
_B
on
e.M
arro
wL
47
_M
EP
_B
on
e.M
arro
wL
50
_C
FU
_B
one.M
arro
wL
54
_E
ryth
robla
st_
Pro
L4
8_
Mye
locyte
_B
on
e.M
arro
wL
49
_M
ye
locyte
_B
on
e.M
arro
wL
52
_P
late
let
L5
8_
Mo
no
cyte
L5
9_
Mo
no
cyte
_C
D1
4L
61
_M
on
ocyte
_C
D1
6L
62
_M
on
ocyte
L6
3_
Ma
cro
ph
age
_M
on
ocyte
.de
rive
dL
64
_M
acro
ph
age
_M
on
ocyte
.de
rive
dL
65
_M
acro
ph
age
_M
on
ocyte
.de
rive
dL
66
_M
acro
ph
age
_M
on
ocyte
.de
rive
dL
67
_M
acro
ph
age
_A
lve
ola
rL
68
_D
en
dritic
.Ce
ll_M
on
ocyte
.de
rive
dL
69
_D
en
dritic
.Ce
ll_M
on
ocyte
.de
rive
dL
71
_D
en
dritic
.Ce
ll_P
lasm
acyto
idL
72
_D
en
dritic
.Ce
ll_P
lasm
acyto
idL
87
_N
eu
trop
hil
L8
8_
Ne
utro
ph
ilL
73
_T.C
ell_
CD
4.N
aiv
eL
74
_T.C
ell_
CD
4.C
en
tr..Me
mo
ryL
75
_T.C
ell_
CD
4.C
en
tr..Me
mo
ryL
76
_T.C
ell_
CD
4.E
ff..Me
mo
ryL
77
_T.C
ell_
CD
8.C
en
tr..Me
mo
ryL
78
_T.C
ell_
CD
8L
79
_T.C
ell_
CD
4L
80
_T.C
ell_
CD
8.E
ff..Me
mo
ryL
81
_T.C
ell_
CD
8.N
aiv
eL
82
_T.C
ell_
CD
8L
51
_B
.Ce
ll_B
one.M
arro
wL
89
_B
.Ce
llL
90
_B
.Ce
ll_N
aiv
eL
91
_B
.Ce
llL
92
_B
.Ce
ll_M
em
ory
Oocyte (GSM1160112)Spermatozoa (GSM1383918)HSCs (GSM1306651)Mesodermal stem cellsNeural stem cells (GSM953382)Astrocytes (GSM1901321)Mesenchymal stem cells (GSM921002)Fibroblasts (GSM1668455)Smooth muscle cells (GSM1275863)Osteblasts (GSM1333401)
Adipocytes (GSM1468495)Endothelial cells (GSM1009635)Epithelial cells (GSM1857416)
GMP (GSM1341669)HSC (GSM1341670)MEP (GSM1341671)Promyelocytic cell HL-60 (GSM1663005)PlateletsMacrophages (GSM1586130)pDC(GSM2090431)Neutrophils(GSM996197)T lymphocytes (GSM1631665)B lymphocytes (GSM1260318)
Em
bry
on
ic s
tem
cell
Me
sen
chym
al
ste
m c
ell
iPS
CM
eso
de
rmal
ste
m c
ell
Ne
uro
na
lp
recu
rsor
Astr
ocyte
Cho
ndro
cyte
Oste
obla
st
Ad
ipocyte
Fib
robla
st
Sm
oo
th m
uscle
cell
End
oth
elia
l
Ep
ith
elia
l
He
pa
tocyte
Pro
-Ery
thro
bla
st
Pla
tele
t
Neu
tro
phil
T lym
ph
ocyte
B lym
pho
cyte
ME
P
Mo
no
cyte
Macro
pha
ge
mD
C
Oocyte
Spe
rma
tocyte H
SC
CM
PG
MP
Mye
lob
last
CF
U
pD
C
Normalized odds ratio
Min Max
Cell typesC
D
MPO
(Hematopoietic)
CCL18
(Monocyte-derived)
CD3D
(T cell)
MS4A1
(B cell)
PLAC1L
(Oocyte)
Prim
ary
cells
Pri
ma
ry c
ells
Gene expression (RMA)
Cell types
Embryonic stem cell
iPSC
Epithelial (mesoderm-derived)
Endothelial
Hepatocyte (endoderm-derived epithelial)
Epithelial (ectoderm-derived)
Hematopoietic pluripotent
Macrophages
Dendritic cells
T cells
B cells
Oocyte
LEFTY
(ESC)
KRT13
(Epithelial)
CDH5
(Endothelial)
GC
(Hepatocyte)
KRTDAP
(Keratinocyte)
Prim
ary
cells
KRT18 EPCAM
Gene expression (RMA)
Gene
(Cell type)
Gene
(Cell type)
Gene
A B
Chondrocytes (GSM1695254)
Hepatocytes (GSM54066)
Positiv
e c
on
trols
cells and hematopoietic cells. Data are represented as horizontal barplots derived from the output
of the primary cell atlas dataset of BioGPS (C) Heat map comparing the number of genes
enriched for each published positive control sample (rows) and cell type (columns) over random
enrichment. Each colored box represents a normalized odds ratio of the respective Fisher’s exact
test ranging from 0 (black) to 1 (red). mDC, myeloid dendritic cell; pDC, plasmacytoid dendritic
cell. (D) IDs for each cell types reported in table S8. Order of cell types corresponds to the
heatmap columns shown in (C).
Fig. S7. Comparison between endothelial clusters (this study) and CTC clusters in Aceto et
al. (9). Selected breast cancer cell lines with epithelial and mesenchymal lineage profiles and
primary endothelial cells were used as positive controls for epithelial, mesenchymal stem cells
and endothelial lineages. Lineage inference of CTC clusters reported in (9) shows the presence
of epithelial-derived cell clusters. Platelet and red blood cell signals were removed from this
analysis to highlight cell types belonging to the above-mentioned lineages of interest.
Fig. S8. Circulating endothelial clusters are tumor-derived. (A) Principal component analysis
reveals an association of endothelial clusters with tumor tissues. Left panel, principal component
1 (PC1) vs PC2. PC1 correlates with the standard deviation of inputed variables for single
clusters (r = -0.455) and not does distinguish tumor and normal tissues (batch effect). Right
panel, PC2 vs PC3 indicating association of endothelial clusters with the tumor tissues but not
with the normal tissues. Colored areas represent k-means clustering. (B) Cell type inference of
whole normal and tumor tissues and matched laser-captured endothelial cells, showing
enrichment of the target cell population. (C) Unsupervised hierarchical clustering of endothelial
cell populations show significant association of circulating endothelial clusters with intestinal
endothelial cells. nECs, colon-derived normal endothelial cells; tECs, CRC-derived tumor
endothelial cells. (D) Genes differentially expressed between nECs and tECs. PNOI probability of
differential expression as computed by NOISeq. Log2FC, log2(fold change). (E) Heatmap
showing the expression of published tumor endothelial markers (23, 25, 42) in single circulating
endothelial clusters, medians of normal tissue, tumor tissue and matched normal and tumor laser-
capture dissected endothelial cells. Rows of log2(FPKM+1) values were normalized by scaling
between 0 and 1. Values are shown along with color-coding of the scale.
Fig. S9. tEC clusters in a mouse model of CRC. (A) Experimental approach. (B) GFP
genotyping in CT26 xenografts at passage 0 and passage 1. Mouse 2 was excluded from
subsequent analysis. (C) Gating strategy for the analysis of tumor-derived (GFP+) endothelial
cell clusters. Biex, Biexponential scale. (D) Immunofluorescence staining of tumor vasculature
using CD309+ antibodies at passage 0 and passage 1. Scale bar, 50 µm. (E) Boxplot represent
quantification of CD309+GFP+ single and clustered cells in the blood of P1 xenografts and
controls. Data are individual animals.
ACT26 s.c.
500,000
14 d 1) Blood sampling
2) Flow cytometry
14 d
Transplantation s.c.
5x Balb/c2x L2G85 Balb/c (GFP +)
B
FSC
SS
C
Hoechst 33342 - Width
Ho
ech
st 3
33
42
- A
rea
GFP - Alexa 488 (Biex)C
D3
09
- A
PC
(B
iex)
GFP - Alexa 488 (Biex)
CD
30
9 -
AP
C (
Bie
x)
Single Cells Clusters
Single
Cells
Clusters
GFP+
CD309+GFP+
CD309+
C
E Single cells
Passage 1Passage 0
#1 #2
Passage 1
xenograft tumor
wt
173 bp (GFP)324 bp (control)
a
b
c
L2G85
Passage 0
#2
#4
#5
D
Passa
ge 1
Pa
ssa
ge
0
* *
a b c
#1
* *
#3
* * *
*
**
G
FP
+ /
CD
30
9 +
Sin
gle
cells
(%
of sin
gle
ce
lls)
Con
trol (
n = 3
)
Xenog
raft
(n =
4)
0.0
0.5
1.0
1.5
2.0
0.0
0.5
1.0
1.5
2.0
Con
trol (
n = 3
)
Xenog
raft
(n =
4)
G
FP
+ /
CD
30
9 +
Clu
ste
rs (
% c
luste
rs)
Clusters
0.0
0.5
1.0
1.5
2.0
0.0
0.5
1.0
1.5
2.0 P=0.14
CD309 / Hoechst
*
Fig. S10. tEC clusters do not correlate with selected CRC tumor and patient characteristics
or comorbidities. (A) Association of circulating endothelial cell cluster counts with selected
tumor characteristics (tumor location or size, immune cells, blood markers). A/T/D, ascending,
transverse and descending colon; R/S, rectum and sigmoid colon. (B and C) Association of
circulating endothelial cell cluster counts with general population characteristics (age, BMI, sex)
(B) and with comorbitidies (hypertension, diabetes, heart disease) (C). Data refer to patients, for
which all available clinical data were used in the analysis. Correlations were tested using the
Kendall’s Tau method. Unpaired samples were tested using a two-tailed Wilcoxon-Mann-
Whitney U test.
SUPPLEMENTARY TABLES
Tables S1-S14 are provided in one Supplementary Excel spreadsheet, with captions below.
Table S1. scrmPCR primers used in this study.
Table S2: Whole genome application (WGA) false-positives are detected only at VAF <
10%. WGA and unamplified DNA derived from the same sample (Patient 15) were sequenced to
detect false positive mutations arising from the amplification procedure.
Table S3. Circulating CD45– clusters do not mirror matching primary tumor mutations. Targeted high-throughput sequencing of single CD45– clusters. Numbers in colored boxes
indicate variant allele frequency (VAF) (coverage). Cl, cluster; ND, not detected; NA, not
available.
Table S4. Sporadic mutations in circulating CD45– clusters are not detected in matching
primary tumor tissues. Blue rows indicate variants tested by digital PCR and reported in fig S3.
Numbers in colored boxes indicate variant allele frequency (VAF) and coverage. ND, not
detected.
Table S5. Germline variants in tumor tissues and circulating CD45– cell clusters (coverage
and zygosity). NA, not available; ND, not detected.
Table S6. RNA-seq data: number of uniquely mapped reads to hg19 exons.
Table S7. RNA-seq data: processed data (FPKM) of all samples described in the study.
Table S8. Definition of the cell types used for the inference algorithm.
Table S9. Top 80 genes ordered from highest to lowest specificity index (S) for each cell
type selected for the inference algorithm.
Table S10. Positive published RNA-seq control samples used to validate the inference
algorithm.
Table S11. Characteristics of tEC clusters isolated in this study. Staining intensity: –,
negative; +, low; ++, average; +++, strong.
Table S12. Circulating endothelial cell cluster count before and after surgery. Data are from
Fig. 4B. Pre: Clusters count in blood 0-24 hrs before surgery. Post: Clusters count in blood 24-72
hrs after surgery.
Table S13. Baseline patients’ and healthy donors’ characteristics.
Table S14. CD45– cluster count for each baseline sample type and number of single clusters
analyzed for each technique in the corresponding samples. Sources: FORTIS, Fortis Surgical
Hospital, Singapore; NCC, National Cancer Center, Singapore; IBN, Institute of Bioengineering
and Nanotechnology, Singapore; SPHS, Singapore Population Health Studies. CRC, colorectal
cancer.