
ORIGINAL ARTICLE

Identification of biomarkers for tuberculosis disease using a novel dual-color RT–MLPA assay

SA Joosten1, JJ Goeman2, JS Sutherland3, L Opmeer1, KG de Boer1, M Jacobsen4,5, SHE Kaufmann4, L Finos2, C Magis-Escurra1,6, MOC Ota3, THM Ottenhoff1 and MC Haks1

1Department of Infectious Diseases, Leiden University Medical Center, Leiden, The Netherlands; 2Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands; 3Bacterial Diseases Programme, Medical Research Council Laboratories, Banjul, The Gambia and 4Department of Immunology, Max Planck Institute for Infection Biology, Berlin, Germany

Owing to our lack of understanding of the factors that constitute protective immunity during natural infection with Mycobacterium tuberculosis (Mtb), there is an urgent need to identify host biomarkers that predict the long-term outcome of infection in the absence of therapy. Moreover, the identification of host biomarkers that predict (in)adequate response to tuberculosis (TB) treatment would similarly be a major step forward. To identify and monitor multi-component host biomarker signatures at the transcriptomic level in large human cohort studies, we have developed and validated a dual-color reverse-transcriptase multiplex ligation-dependent probe amplification (dcRT–MLPA) method, permitting rapid and accurate expression profiling of as many as 60–80 transcripts in a single reaction. dcRT–MLPA is sensitive, highly reproducible and high-throughput, has an extensive dynamic range and is as quantitative as QPCR. We have used dcRT–MLPA to characterize the human immune response to Mtb in several cohort studies in two genetically and geographically diverse populations. A biomarker signature was identified that is strongly associated with active TB disease and profoundly distinct from that associated with treated TB disease, latent infection or uninfected controls, demonstrating the discriminating power of our biomarker signature. Identified biomarkers included apoptosis-related genes and T-cell/B-cell markers, suggesting important contributions of adaptive immunity to TB pathogenesis.
Genes and Immunity (2012) 13, 71–82; doi:10.1038/gene.2011.64; published online 29 September 2011

Keywords: dcRT–MLPA; host biomarkers; tuberculosis

Introduction

Biomarkers are defined as ‘characteristics that are objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes or pharmacological responses to a therapeutic intervention’.1

Biomarkers, or rather, ‘surrogate endpoints’, offer the means to classify disease status, disease activity, disease prognosis and disease progression, as well as the effects of interventions (drugs, surgery, vaccines and so on). Biomarkers can be analyzed in any tissue or body fluid, but peripheral blood is the most commonly used source in clinical practice. Biomarkers can be determined at different levels, for example, at the cellular, protein or transcript level, but informational complexity increases from transcriptome to proteome to cellulome. Quantitative changes in RNA expression levels are currently analyzed using either genome-wide (microarray) or single-gene (real-time PCR) screening methods. As biomarker signatures encompass multiple gene transcripts, neither method is ideally suited for monitoring the expression of a restricted set of genes involved in defined biological processes. Furthermore, microarray analysis and real-time PCR are technically challenging and too costly to be applied on a routine basis in resource-poor settings. Moreover, gene expression profiling by microarray involves complex data analysis, and its rather limited dynamic range (2–3 logs)2 compromises the ability to accurately quantify RNA expression levels.

Here we describe a novel technique that is specifically designed to combine a set of markers at the transcriptomic level. This technique, dual-color reverse-transcriptase multiplex ligation-dependent probe amplification (dcRT–MLPA), is inexpensive, fast and robust, and permits rapid and accurate RNA expression profiling of as many as 80 transcripts in a single reaction. Genes of interest can be selected on a tailor-made basis, and a PCR amplification step within dcRT–MLPA ensures assay sensitivity, which is an essential prerequisite for the relative quantification of scarcely expressed genes. As this assay is high-throughput (96-well format) and requires low amounts of RNA (100 ng), it is exceptionally suitable for determining biomarker signatures in larger cohort studies.

Received 25 May 2011; revised 1 August 2011; accepted 12 August 2011; published online 29 September 2011

Correspondence: Dr MC Haks, Department of Infectious Diseases, Leiden University Medical Center, Group of Immunology and Immunogenetics of Mycobacterial Infectious Diseases, Albinusdreef 2, 2333 ZA Leiden, The Netherlands. E-mail: [email protected]
5Current address: Department of Immunology, Bernhard-Nocht-Institute for Tropical Medicine, Hamburg, Germany.
6Current address: Department of Pulmonary Diseases, Nijmegen University Medical Center Dekkerswald, Groesbeek, The Netherlands.

Genes and Immunity (2012) 13, 71–82 © 2012 Macmillan Publishers Limited. All rights reserved 1466-4879/12

www.nature.com/gene

A particularly useful application of dcRT–MLPA is the identification and monitoring of host–biomarker profiles to investigate the human immune response to infection on a population scale. In this report, we investigated the potential of dcRT–MLPA to identify and monitor host–biomarker profiles in tuberculosis (TB). It is estimated that one-third of the global population is infected with Mycobacterium tuberculosis (Mtb).3,4 However, immediate progression to active TB is rare; most frequently, the infection is initially contained by the host immune system, resulting in latent TB infection (LTBI). LTBI can later (re)activate, resulting in TB disease, which occurs in an estimated 3–10% of cases, 80% of them within 2 years after infection.5 A major roadblock in the development of effective new TB vaccines (both preventive and post-exposure) is our lack of understanding of the factors that constitute protective immunity during natural infection with Mtb or induced by vaccination.5

Currently, the success of a vaccine is measured by the prevention of active TB among participants in large (phase III efficacy) trials. Thus, the diagnosis of incipient TB disease serves as the clinical endpoint in necessarily large and long-lasting TB vaccine trials. Considering the relatively low disease incidence and the potentially long latency period, clinical efficacy trials would benefit greatly from biomarkers that could serve as early surrogate endpoints, predicting the outcome at an early stage and replacing clinical endpoints. Thus, there is an urgent need to identify host biomarkers of protective immunity, which would allow the early identification of those LTBIs at risk of progressing to active TB and in need of preventive treatment after exposure, as opposed to those whose immune system will effectively contain the infection. Besides these important applications of biomarkers predicting progression to TB disease and/or vaccine efficacy, the identification of biomarkers that predict (in)adequate response to TB treatment would similarly be a major step forward, as it would allow more effective treatment and thus reduce the occurrence of multidrug-resistant and extensively drug-resistant Mtb strains.

Here we characterized and evaluated the potential of dcRT–MLPA, and used this novel technique to evaluate gene expression profiles in several cohort studies in two genetically and geographically diverse populations, from the Gambia and Paraguay. A biomarker signature was identified that is associated with active TB disease and is profoundly distinct from that associated with treated TB disease, latent infection or uninfected healthy controls, demonstrating the discriminating power of our biomarker signature as determined by dcRT–MLPA technology.

Results

Principle of the dcRT–MLPA technique

The MLPA assay is based on the ligation of two half-probes hybridized adjacently to a target sequence, followed by quantitative PCR amplification of the ligated products, which are size-separated by capillary electrophoresis; it was originally described to determine the copy number of DNA sequences.6 More recently, to enable sensitive detection of RNA transcripts, an RT step with gene-specific primers before the probe-annealing stage was introduced.7 However, a major disadvantage of both assays is the labor-intensive and costly preparation of the M13-derived half-probes. To bypass this serious drawback, we replaced the M13 vector-based half-probes with chemically synthesized oligonucleotides. Furthermore, instead of using ‘spacer’ sequences, the discriminative length of each amplicon was ensured by varying the length of the target-specific sequence in each probe set. This approach is inherently limited by the maximum length of synthesized oligonucleotides and, consequently, by the number of probes that can be combined within a single assay. To circumvent these limitations and to maximize the number of target genes that can be analyzed within a single RT–MLPA assay, we designed a dcRT–MLPA assay combining two sets of half-probes, each amplified by primers labeled with a different fluorophore.8 The principle of this technique is outlined in Supplementary Figure S1.
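Why a second fluorophore raises capacity can be illustrated with a simple layout check. This is a hypothetical sketch, not the authors' probe-design procedure: the dye names (FAM, HEX), the amplicon lengths and the minimum resolvable length difference of 4 nt within one dye channel are all assumptions. The point is that amplicons need distinct lengths only within a dye channel, so the same length window can be reused in the second color.

```python
from collections import defaultdict

# Hypothetical panel: (gene, fluorophore, amplicon length in nt).
# Dye names and lengths are illustrative only, not the published probe set.
PANEL = [
    ("GAPDH", "FAM", 100),
    ("IFNG", "FAM", 104),
    ("TNF", "FAM", 108),
    ("GUSB", "HEX", 100),  # same length as GAPDH, separated by dye channel
    ("IL2", "HEX", 104),
    ("CD8A", "HEX", 108),
]


def panel_conflicts(panel, min_gap=4):
    """Return (dye, gene_a, gene_b) for amplicons that cannot be resolved.

    Two probes collide only if they share a fluorophore and their amplicon
    lengths differ by less than min_gap nucleotides (an assumed resolution).
    """
    by_dye = defaultdict(list)
    for gene, dye, length in panel:
        by_dye[dye].append((length, gene))
    conflicts = []
    for dye, probes in sorted(by_dye.items()):
        probes.sort()  # order by amplicon length within one dye channel
        for (len_a, gene_a), (len_b, gene_b) in zip(probes, probes[1:]):
            if len_b - len_a < min_gap:
                conflicts.append((dye, gene_a, gene_b))
    return conflicts


if __name__ == "__main__":
    print(panel_conflicts(PANEL))  # []
```

Moving GUSB into the FAM channel at 100 nt would produce a conflict with GAPDH, which is exactly the constraint that the second color relaxes.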

Validation of the dcRT–MLPA technique

To reliably monitor changes in gene transcription, a novel technique such as dcRT–MLPA must have an extensive dynamic range, high sensitivity and a low assay cutoff. To validate the dcRT–MLPA assay for these parameters and to tightly control target gene copy numbers, cDNA was replaced as the hybridization template by a mixture of chemically synthesized oligonucleotides. To determine the dynamic range and sensitivity of the assay, serial dilutions of the synthetic template oligonucleotides were prepared and subjected to dcRT–MLPA analysis using a validation probe set. The validation probe set comprised genes known to be induced (IFNG, TNF and IL2) or downregulated (CD8A, RAB33A and CD14) upon mitogenic stimulation of peripheral blood mononuclear cells (PBMCs), as well as genes whose transcriptional activity is not affected by mitogenic stimulation of PBMCs, including the reference genes GUSB and GAPDH. As shown in Figure 1a, dcRT–MLPA had an extensive dynamic range of 4–7 logs (average 5 logs) and a detection threshold as low as 20 oligonucleotide copies of any given gene. These observations indicate that, compared with a single-gene expression assay such as real-time PCR, a multiplex gene expression assay such as dcRT–MLPA requires only minor compromises in sensitivity and dynamic range. To establish the dcRT–MLPA assay cutoff, the average percentage s.d. of decreasing log2-transformed peak area intervals was calculated (Figure 1b). Because the GeneMapper software (Applied Biosystems, Warrington, UK) could call peaks with an area ≤7.64 only if they were perfectly shaped, the percentage s.d. increased sharply below 7.64. Therefore, the assay cutoff was set to correspond to the threshold value for noise cutoff in the GeneMapper software. In conclusion, the ability of this assay to quantify gene expression levels sensitively and accurately over a broad range of synthetic template copy numbers highlights its prospective use as a new tool for biomarker identification and monitoring.
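The cutoff analysis can be sketched as follows, under stated assumptions: replicate peak areas of each probe are summarized by their percentage s.d. (s.d./mean × 100), assigned to an interval according to the log2 of their mean peak area, and averaged per interval. The helper `choose_cutoff` and its `max_pct_sd` tolerance are hypothetical; the study itself tied the cutoff to the GeneMapper noise threshold (7.64) rather than to a fixed tolerance.

```python
import math
from statistics import mean, stdev


def average_percent_sd(replicate_sets, edges):
    """Average percentage s.d. of replicate peak areas per log2 interval.

    replicate_sets: one list of replicate peak areas per probe/template dilution
    (at least two replicates each). edges: ascending log2 interval boundaries.
    Returns {(low, high): average % s.d.} for intervals that contain data.
    """
    per_interval = {}
    for reps in replicate_sets:
        m = mean(reps)
        pct_sd = 100.0 * stdev(reps) / m  # percentage s.d. (coefficient of variation)
        level = math.log2(m)              # interval is chosen on the log2 mean area
        for low, high in zip(edges, edges[1:]):
            if low <= level < high:
                per_interval.setdefault((low, high), []).append(pct_sd)
                break
    return {k: mean(v) for k, v in sorted(per_interval.items())}


def choose_cutoff(avg_by_interval, max_pct_sd):
    """Lowest boundary above which every interval stays within max_pct_sd (hypothetical rule)."""
    cutoff = None
    for (low, _high), value in sorted(avg_by_interval.items(), reverse=True):
        if value > max_pct_sd:
            break
        cutoff = low
    return cutoff


if __name__ == "__main__":
    reps = [[100, 150, 50], [1000, 1100, 900]]  # hypothetical peak areas
    avg = average_percent_sd(reps, edges=[6, 8, 10])
    print(avg)                     # {(6, 8): 50.0, (8, 10): 10.0}
    print(choose_cutoff(avg, 20))  # 8
```

The low-intensity interval is rejected because its replicate scatter is large relative to its mean, mirroring the sharp rise in percentage s.d. seen below the cutoff in Figure 1b.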

Comparison between dcRT–MLPA and Taqman real-time PCR

As both dcRT–MLPA and real-time PCR depend heavily on PCR amplification of target products, the two methods were expected to display similar assay characteristics. To compare gene expression profiles obtained with dcRT–MLPA and real-time PCR directly, PBMCs from five Dutch healthy donors were stimulated for 6 h with phytohemagglutinin or anti-CD3/CD28 beads (Figure 2a), or with phytohemagglutinin and subsequently mixed with unstimulated PBMCs at the indicated ratios (Figure 2b). RNA was then isolated and analyzed by dcRT–MLPA (black bars) and real-time PCR (white bars) on the same samples using the validation probe set. The data in Figure 2 clearly show that the results obtained with the two RNA expression-profiling techniques are highly comparable. Moreover, both weak and strong variations in gene expression levels are faithfully detected by dcRT–MLPA, demonstrating the accuracy and robustness of this method for quantifying specific changes in RNA expression levels.
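To make the two normalizations concrete, the sketch below computes a GAPDH-normalized fold change from untransformed dcRT–MLPA peak areas and, for comparison, a real-time PCR fold change with the common 2^−ΔΔCt method. The ΔΔCt formula assumes roughly 100% amplification efficiency and is not stated in this section; the function names and example values are hypothetical.

```python
def mlpa_fold_change(stim, unstim, gene, ref="GAPDH"):
    """Fold change from dcRT–MLPA peak areas after normalization to a reference gene.

    stim / unstim map gene name -> peak area in the stimulated / unstimulated sample.
    """
    return (stim[gene] / stim[ref]) / (unstim[gene] / unstim[ref])


def qpcr_fold_change(stim_ct, unstim_ct, gene, ref="GAPDH"):
    """Fold change from real-time PCR Ct values by the 2^-ddCt method.

    Assumes ~100% amplification efficiency for both target and reference.
    """
    d_ct_stim = stim_ct[gene] - stim_ct[ref]  # normalize to the reference gene
    d_ct_unstim = unstim_ct[gene] - unstim_ct[ref]
    return 2.0 ** -(d_ct_stim - d_ct_unstim)  # relative to the unstimulated control


if __name__ == "__main__":
    # Hypothetical values for an induced gene such as IFNG
    print(mlpa_fold_change({"GAPDH": 1000.0, "IFNG": 6400.0},
                           {"GAPDH": 1000.0, "IFNG": 100.0}, "IFNG"))
    print(qpcr_fold_change({"GAPDH": 18.0, "IFNG": 22.0},
                           {"GAPDH": 18.0, "IFNG": 28.0}, "IFNG"))  # 64.0
```

In this toy example both routes report a 64-fold induction: a 64-fold rise in the normalized peak area corresponds to a 6-cycle drop in normalized Ct.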

Reproducibility of the dcRT–MLPA assay

To evaluate the reproducibility of the dcRT–MLPA assay, whole blood was collected from 12 Dutch healthy donors at two different time points (~2 months apart). At both time points, whole blood from each donor was collected in five separate PAXgene tubes (BD Biosciences, Breda, The Netherlands). Gene expression profiles of all collected samples were analyzed in two independent dcRT–MLPA assays using the complete probe set (Supplementary Table S1). To evaluate the inter-assay variation of dcRT–MLPA, the relative gene expression profiles of several independent duplicate samples were compared. Inter-assay correlation between representative data sets of independent samples was excellent (R² = 0.998) for both scarcely and abundantly expressed genes (Figure 3a), highlighting the reliability of this technique for monitoring (subtle) changes in gene expression profiles. Subsequently, variance components were estimated from the resulting data set. First, the total observed variation in gene expression profiles, regardless of its size, was set at 100%, and the proportional contribution of each of the four components (donor, collection, tube and assay) to this total variation was determined. Figure 3b displays examples of gene expression profiles that were profoundly affected by variation in one of these components. Direct ex vivo RNA expression levels of CD4 were highly comparable between donors and also appeared steady over time. In sharp contrast, CD8A gene expression levels differed considerably between donors (but were constant over time), whereas B2M expression levels changed significantly over time, excluding CD8A as a

[Figure 1 graphics. (a) Left: relative gene expression levels (log2) versus template input (5 × 10⁻² to 5 × 10⁻¹¹ pmol), with the 20-molecule detection limit and the cutoff indicated; right: dynamic range (log10) for CD14, CD8A, GAPDH, GUSB, IFNG, IL2, RAB33A and TNF. (b) Average % s.d. versus intervals of peak areas (log2), from <6.64 to >12.29, with the cutoff indicated.]

Figure 1 Dual-color RT–MLPA assay validation. To determine the dynamic range and sensitivity of dcRT–MLPA, cDNA was replaced as a hybridization template by a mixture of chemically synthesized oligonucleotides that were complementary to the RNA sequence and encompassed the combined target-specific sequences of the left- and right-hand half-probes. Serial dilutions of chemically synthesized template oligonucleotides were analyzed in four independent experiments by dcRT–MLPA using the validation probe set. (a) Shown is the median of the relative expression (log2-transformed peak areas) of GAPDH (solid symbols) and CD8A (open symbols) ±s.d. of triplicate reactions plotted against input concentrations of the synthetic template oligonucleotides (left panel), and the detected dynamic range of the different target genes present in the validation probe set (right panel). Calculation of the detection threshold was corrected for the facts that only a small proportion of the ligation products are PCR amplified and only a fraction of the PCR products are analyzed on a capillary sequencer. The dotted line represents the assay cutoff. (b) The cutoff of the dcRT–MLPA assay was determined by calculating the average percentage standard deviation of decreasing log2-transformed peak area intervals.


potentially useful biomarker in Dutch adults of European descent and excluding B2M as a suitable reference gene in European cohort studies. Because these contributions are relative, a component with a given share can have either a negligible or a major impact on the overall variation, depending on the range of that overall variation. We therefore recalculated the variance components and displayed the data as the fold change that each gene contributed to the variation of each component (Figure 3c). The large majority of the genes within the complete probe set contributed only to a minor extent to the variation introduced by each of the four components. Furthermore, as expected, the predominant source of variation in the data was the donor component, whereas the collection, tube and assay components introduced comparable but significantly less variation, underscoring the reproducibility of dcRT–MLPA.
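A minimal sketch of the variance component estimation, assuming a balanced nested design (donor, then collection within donor, tube within collection, assay run within tube) and the classical method-of-moments (nested ANOVA) estimator. The estimator actually used in the study is not specified in this section, and treating assay runs as nested within tubes is a simplification, since the same samples were run in both assays. Negative estimates are truncated to zero, and the result is expressed as the percentage of total variation, as in Figure 3b.

```python
from statistics import mean


def nested_variance_components(y):
    """Percentage of variance due to donor, collection, tube and assay.

    y[d][c][t][a] holds a log2 expression value for donor d, collection c,
    tube t and assay run a; the design must be balanced with >= 2 levels each.
    Uses the method-of-moments (nested ANOVA) estimator, negatives set to 0.
    """
    D, C, T, A = len(y), len(y[0]), len(y[0][0]), len(y[0][0][0])
    idx = [(d, c, t) for d in range(D) for c in range(C) for t in range(T)]

    tube_m = {(d, c, t): mean(y[d][c][t]) for d, c, t in idx}
    coll_m = {(d, c): mean(tube_m[d, c, t] for t in range(T))
              for d in range(D) for c in range(C)}
    donor_m = [mean(coll_m[d, c] for c in range(C)) for d in range(D)]
    grand = mean(donor_m)

    # Sums of squares for each nesting level
    ss_donor = C * T * A * sum((m - grand) ** 2 for m in donor_m)
    ss_coll = T * A * sum((coll_m[d, c] - donor_m[d]) ** 2 for d, c in coll_m)
    ss_tube = A * sum((tube_m[k] - coll_m[k[:2]]) ** 2 for k in idx)
    ss_assay = sum((y[d][c][t][a] - tube_m[d, c, t]) ** 2
                   for d, c, t in idx for a in range(A))

    ms_donor = ss_donor / (D - 1)
    ms_coll = ss_coll / (D * (C - 1))
    ms_tube = ss_tube / (D * C * (T - 1))
    ms_assay = ss_assay / (D * C * T * (A - 1))

    # Solve the expected mean squares of the nested model from the bottom up
    var = {
        "donor": max((ms_donor - ms_coll) / (C * T * A), 0.0),
        "collection": max((ms_coll - ms_tube) / (T * A), 0.0),
        "tube": max((ms_tube - ms_assay) / A, 0.0),
        "assay": ms_assay,
    }
    total = sum(var.values())
    if total == 0:
        return {k: 0.0 for k in var}
    return {k: 100.0 * v / total for k, v in var.items()}
```

For example, data that vary only between donors return 100% for the donor component, whereas data that vary only between the two assay runs of each tube return 100% for the assay component.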

Identification and monitoring of biomarker signatures

To investigate the potential of dcRT–MLPA for identifying host–biomarker profiles in direct ex vivo whole-blood samples in a clinically relevant setting, we set out to characterize the human immune response to infection with Mtb, with particular emphasis on the expression of genes associated with TB disease. Gene expression

[Figure 2 graphics. (a) Fold change in expression of validation probe set genes (including IFNG, IL2 and CD8A) in unstimulated (–), PHA-stimulated and anti-CD3/CD28-stimulated PBMCs, measured by dcRT–MLPA and Taqman real-time PCR. (b) Fold change in expression (including IL2, RAB33A and TNF) across PBMC dilutions (unstimulated:PHA, 0:1 to 1:0.002), measured by both methods.]

Figure 2 Comparison between dcRT–MLPA and Taqman real-time PCR. PBMCs from healthy Dutch donors were stimulated for 6 h with (a) phytohemagglutinin or anti-CD3/CD28 beads, or (b) phytohemagglutinin and subsequently mixed with unstimulated PBMCs at the indicated ratios. RNA was then isolated and analyzed by dcRT–MLPA (black bars) and Taqman real-time PCR (white bars) on the same samples. Expression profiles were determined for the genes in the validation probe set. Mean gene expression levels were normalized to GAPDH, and fold induction was calculated relative to the unstimulated control samples ±s.d. of triplicate reactions. Data shown correspond to one representative experiment out of four performed and one representative healthy donor out of five healthy donors analyzed.


profiles of 79 TB patients at recruitment (TB cases), 83 Mtb-infected healthy controls (LTBIs) and 74 uninfected healthy controls from The Gambia were analyzed by dcRT–MLPA using the complete probe set (Supplementary Table S1). The complete probe set contained genes that have been associated with active TB disease or protection against disease in the literature (IL4/IL4δ2, CCL22, CD163, TGFBR2),9–12 genes identified by microarray that were differentially expressed between PBMCs from active TB patients and healthy household contacts (FPR1, BPI, MARCO, SEC14L1, RAB24, RAB13, RAB33A, FCGR1A, LTF),13 genes identified by microarray that were differentially expressed between tissue located near and distant from tuberculomas (CCR7, CCL13, IL22RA1,

[Figure 3 graphics. (a) Relative expression in experiment 2 versus experiment 1 (log2) for two representative pairs of samples (R² = 0.998 for both). (b) Relative gene expression levels (log2) of CD4, CD8A and B2M for donors 1–12 at month 0 and month 2, with variance components CD4: donor 27%, collection 22%, tube 30%, dcRT–MLPA 21%; CD8A: donor 59%, collection 2%, tube 13%, dcRT–MLPA 26%; B2M: donor 27%, collection 47%, tube 15%, dcRT–MLPA 11%. (c) Counts (number of genes) per fold-change interval (1–1.1 to 1.9–2.0) for the donor, collection, tube and dcRT–MLPA components.]

Figure 3 Reproducibility of the dcRT–MLPA assay. Two independent dcRT–MLPA assays were performed on whole-blood RNA extracted from five PAXgene tubes collected from 12 Dutch healthy donors at two different time points. Variance component estimation was performed to calculate the contribution of the following four components to the variation in the gene expression profiles: (1) donor variation, the variation within a cohort of healthy individuals; (2) collection variation, the variation within the same healthy individual over time; (3) tube variation, the variation between the five separate PAXgene tubes collected from the same healthy donor; and (4) assay variation, the variation between temporally different RT–MLPA assays. (a) Plotted are the relative expression levels of genes (log2 transformed) in two representative independent pairs of samples, showing highly concordant data for both weak and strong signals. (b) Direct ex vivo RNA expression levels of the indicated genes (bars represent median peak areas normalized to GAPDH and log2 transformed ±s.d. of five separate PAXgene tubes) are displayed for all donors at both time points (top panel). Variance component estimation, calculating the proportional contribution of each of the four components to the overall variation (set at 100%), is shown below each corresponding bar diagram. (c) Calculation of the variance component estimation taking into account the range of the overall variation. Shown is the fold change that each gene contributes to the variation of each component.


SPP1, BLR1, CCL19, MMP9, TIMP2), apoptosis-related genes differentially regulated by Mtb (TNFRSF1A, TNFRSF1B, BCL2, CASP8, TNF, FASLG),14 genes that identify different lymphocyte subsets (CD3E, CD4, CD8A, CD14, CD19, NCAM1), regulatory T-cell-associated markers (FOXP3, IL7R, TGFB1, CTLA4, LAG3, IL10, CCL4, TNFRSF18, IL2RA), effector T-cell markers (IFNG, CXCL10) and reference genes (GAPDH, ABR, GUSB, B2M). Analysis of variance testing for global differences in gene expression profiles indicated significant differences between the study groups (P = 5.4 × 10⁻²⁸), whereas pairwise testing of the obtained

[Figure 4 graphics. (a) Relative gene expression levels (log2) of CD3E, IL7R, CD8A, BLR1, CD19, FCGR1A, MMP9 and CD163 in TB cases, LTBIs and uninfected controls, with significance asterisks. (b) Lasso regression coefficients (with intercepts) of the signatures for TB cases versus LTBIs (LTF, FOXP3, FCGR1A, CD14, IL4δ2, IL2R, BPI, CD4, CD19, IL10, BLR1, CD8A, IL7R), TB cases versus uninfected controls (IL2R, CD14, FCGR1A, TGFB1, CTLA4, CCR7, FOXP3, CD8A, CXCL10, CD4, CD19, NCAM1, BLR1, IL7R, CD3E) and LTBIs versus uninfected controls (TNF, IL2R, FOXP3). (c) ROC curves (sensitivity versus 1 − specificity) with AUC = 90.8%, 86.0% and 53.1%, respectively. (d) Predicted probabilities for TB cases, LTBIs and uninfected controls in each comparison.]


data set revealed profound differences in the overall gene expression profiles between TB cases and LTBIs and between TB cases and uninfected healthy controls (P ≤ 10⁻¹⁶ and P ≤ 10⁻¹⁴, respectively), whereas the difference between LTBIs and uninfected healthy controls was statistically significant but much less pronounced (P = 6.7 × 10⁻⁵). Subsequently, individual biomarkers that were differentially expressed between the study groups were identified (Supplementary Figure S2, Supplementary Table S2a), and examples of gene expression profiles displaying strong association with TB disease (CD3E, IL7R, CD8A, BLR1, CD19, FCGR1A and MMP9) or, as a comparison, no association at all (CD163) are shown in Figure 4a.
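The test used for individual genes is not specified in this section. As an illustration only, a per-gene one-way ANOVA F statistic comparing TB cases, LTBIs and uninfected controls can be computed as below; this is not the global test applied to whole expression profiles. A p-value would then follow from the F distribution with (k − 1, n − k) degrees of freedom, for example via `scipy.stats.f.sf`.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for one gene across k groups (lists of log2 expression values)."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    means = [mean(g) for g in groups]
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    # Hypothetical log2 levels for TB cases, LTBIs and uninfected controls
    print(one_way_anova_f([[10, 11, 12], [13, 14, 15], [16, 17, 18]]))  # 27.0
```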

To determine the biomarker signatures with the highest discriminatory power, the data set was analyzed using the Lasso regression model, which is a shrinkage and selection method for linear regression.15 It minimizes the usual sum of squared errors, with a bound on the sum of the absolute values of the coefficients. The biomarker signatures that classified TB cases versus LTBIs, TB cases versus uninfected healthy controls and LTBIs versus uninfected healthy controls encompassed 13, 15 and 3 biomarkers, respectively (Figure 4b). The classifying capability of these biomarker signatures is displayed either as a receiver operating characteristic curve, in which the true positive rate (sensitivity) is plotted against the false positive rate (1 − specificity, or 1 − true negative rate; Figure 4c), or as a box-and-whisker plot indicating the predicted probability obtained using the identified biomarker signatures as classifiers (Figure 4d). Although the biomarkers that are differentially expressed between LTBIs and uninfected healthy controls have limited classifying value when combined into a biomarker signature (area under the curve of 53.1%), the biomarker signatures used to classify TB cases versus LTBIs or TB cases versus uninfected healthy controls have excellent classifying values (areas under the curve of 90.8% and 86.0%, respectively). Adding more biomarkers did not further improve the classifying capability of these signatures: a non-selective ridge regression including all genes of the complete dcRT–MLPA probe set yielded areas under the curve of 91.8% and 84.5% for TB cases versus LTBIs and TB cases versus uninfected healthy controls, respectively (data not shown).
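The Lasso fit and the area under the curve can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it solves the Lagrangian form of the Lasso (equivalent to bounding the sum of absolute coefficients) by cyclic coordinate descent on standardized genes with a 0/1 group label, and computes the AUC as the Mann–Whitney probability that a randomly chosen case scores higher than a randomly chosen control. The penalty `lam` would normally be chosen by cross-validation, which is not shown, and the example data are hypothetical.

```python
import numpy as np


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Lasso in Lagrangian form on standardized genes.

    Minimizes (1/2n) * ||y - mean(y) - Z b||^2 + lam * sum(|b_j|), where Z is
    X standardized column-wise (population s.d.; constant genes not allowed).
    Returns coefficients on the standardized scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    n, p = Z.shape
    beta = np.zeros(p)
    resid = y - y.mean()  # residual for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            rho = Z[:, j] @ resid / n + beta[j]            # partial-residual correlation
            new = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft-thresholding
            resid -= Z[:, j] * (new - beta[j])
            beta[j] = new
    return beta


def auc(scores, labels):
    """Area under the ROC curve as the Mann-Whitney win probability (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical data: gene 0 separates the groups, gene 1 is noise
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [0, 0, 1, 1]
    beta = lasso_coordinate_descent(X, y, lam=0.1)
    Z = (np.asarray(X, float) - np.mean(X, axis=0)) / np.std(X, axis=0)
    print(beta)
    print(auc(list(Z @ beta), y))  # 1.0
```

The uninformative gene is shrunk to exactly zero, which is the selection property that lets the Lasso return compact 13-, 15- and 3-gene signatures rather than using every probe.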

As these data indicate that the biomarker signature identified to classify TB cases versus LTBIs is composed of biomarkers strongly associated with TB disease, we next evaluated the ability of this biomarker signature to classify TB patients in the Gambia during treatment. As shown in Figure 5a and Supplementary Table S2b, the discriminative power of this biomarker signature is already apparent in TB patients treated for only 2 months, whereas after 4 months of therapy the biomarker profiles of treated TB patients have become indistinguishable from those of LTBIs. Examples of gene expression profiles (CD3E, IL7R, CD8A, BLR1, CD19, FCGR1A and MMP9) during the treatment of TB patients are shown in Figure 5b.

Comparison of biomarkers associated with TB disease between an African and a South American population

To investigate whether the biomarkers associated with TB disease are specific to the Gambian (or African) population, gene expression profiles were determined on direct ex vivo whole-blood RNA samples from a small Paraguayan (South American) cohort consisting of TB cases, TB patients treated for 4 weeks and healthcare workers (latently infected, with long-term exposure to TB). Despite the limited cohort size, a comparison of gene expression patterns between TB cases and healthcare workers already confirmed 16 of the 25 genes (for example, CD3E, IL7R, BLR1, CD19, FCGR1A, CXCL10, CD4, TNF, BCL2, CASP8 and CCL4) that were differentially expressed in TB cases in the Gambia (Figure 6), suggesting that the discriminatory power of most of the identified biomarkers is not restricted to the Gambia or even to Africa.

In conclusion, dcRT–MLPA is a sensitive, robust and reproducible assay for biomarker detection and biomarker signature profiling. These first data demonstrate the power of dcRT–MLPA to identify biomarker signatures with sufficient discriminating power in direct ex vivo whole-blood samples, even when using small cohorts.

Discussion

Using the novel dcRT–MLPA assay, we identified multiple genes that were differentially expressed, with high statistical significance, between TB cases, LTBIs and uninfected healthy controls (Supplementary Table S2). Of the 25 genes (out of 45 tested) that were differentially regulated between TB cases and LTBIs, 20 also displayed distinct expression patterns between TB cases and uninfected healthy controls. In contrast, differences in direct ex vivo RNA expression levels between LTBIs and uninfected controls were less pronounced: only eight genes were differentially expressed between these two study groups, including three regulatory T-cell markers (FOXP3, IL2RA and TGFB1). Consistent with these findings, Lasso regression analysis readily identified biomarker signatures with excellent classifying value for TB cases versus LTBIs and TB cases versus uninfected controls (areas under the curve of 90.8% and 86.0%, respectively), but was unable to identify a biomarker signature with sufficient discriminative power between LTBIs and uninfected controls.

The classifying value of the identified biomarker signatures may be further enhanced in future studies by selecting other markers for dcRT–MLPA. Interesting candidate markers may be derived from recent microarray studies, which highlight a potential role for type I interferon-αβ signaling pathways.16 However, this approach alone may not sufficiently improve the signature that classifies infection status in healthy individuals, as the differences in gene expression levels between LTBIs and uninfected controls may be too subtle to detect in ex vivo whole blood. To circumvent this issue, antigen-specific stimulation could be applied to enhance these differences and thereby allow such discrimination. Antigen-specific stimulation would also help to address the heterogeneity of the LTBI group: LTBIs may recently have been exposed to other pathogens endemic to the Gambia, so their direct ex vivo profiles do not necessarily relate to TB disease outcome. Moreover, antigen-specific stimulation studies combined with direct ex vivo RNA profiling of patients (co)infected with other pathogens will help to determine the specificity of the identified biomarker signature for TB disease.

Figure 4 Identification of biomarker signatures. Dual-color RT–MLPA was performed on direct ex vivo RNA isolated from PAXgene tubes derived from TB cases, LTBIs and uninfected healthy controls from the Gambia. (a) Median gene expression levels (peak areas normalized to GAPDH and log2 transformed) of the indicated genes are shown as box-and-whisker plots (5–95 percentiles; dotted line represents assay cutoff). Significant differences between study groups were determined using Kruskal–Wallis H and Dunn's multiple comparison tests. *0.01 < P < 0.05, **0.001 < P < 0.01 and ***P < 0.001. (b) Composition of biomarker signatures identified by Lasso regression to classify TB cases versus LTBIs, TB cases versus uninfected healthy controls and LTBIs versus uninfected healthy controls. Numbers represent coefficients of logistic regression (log-odds ratios). (c) Receiver operating characteristic curves or (d) box-and-whisker plots (5–95 percentiles) showing the accuracy of the identified biomarker signatures in discriminating between TB cases, LTBIs and uninfected healthy controls. AUC, area under the curve.

Biomarker identification using a novel dcRT–MLPA assay SA Joosten et al

77

Genes and Immunity

Genes comprising the complete dcRT–MLPA set were derived primarily from the literature and previous microarray studies.9–14,17,18 Several biomarkers identified earlier as discriminating between TB cases and LTBIs in German and South African cohorts (for example, FCGR1A (CD64), LTF and RAB33A)13,19 were confirmed in the current study cohort from the Gambia. Intriguingly, FCGR1A is the only biomarker from our signature that was also identified by a recent large-scale microarray analysis in UK and South African TB patients16 as discriminating active TB disease from LTBI, although it was not specific for TB when compared with other inflammatory diseases. Identification of FCGR1A in all these studies emphasizes its potential significance as a TB biomarker and warrants further analysis of the role of FCGR1A and antibodies in TB pathogenesis. The suggestion that humoral immunity could have a more important role in TB pathogenesis than is currently anticipated20,21 is also supported by the identification of multiple B-cell-related genes that are differentially expressed between TB patients and LTBIs (in both our Gambian and Paraguayan cohorts), including, besides FCGR1A, CD19 and BLR1 (CXCR5).

[Figure 5: (a) predicted probability per study group; (b) relative gene expression levels (log2) of CD3E, IL7R, CD8A, BLR1, CD19, FCGR1A, MMP9 and CD163]

Figure 5 Monitoring of biomarker signatures. Dual-color RT–MLPA was performed on direct ex vivo RNA isolated from PAXgene tubes derived from TB cases, LTBIs and TB patients treated for 2, 4 or 6 months from the Gambia. (a) Box-and-whisker plots (5–95 percentiles) showing the accuracy of the biomarker signature that discriminates between TB cases and LTBIs in classifying treated TB patients. (b) Median gene expression levels (peak areas normalized to GAPDH and log2 transformed) of the indicated genes are shown as box-and-whisker plots (5–95 percentiles; dotted line represents assay cutoff). Significant differences between study groups were determined using Kruskal–Wallis H and Dunn's multiple comparison tests. *0.01 < P < 0.05, **0.001 < P < 0.01 and ***P < 0.001.

In addition to a strong correlation between B-cell-related genes and TB disease, a clear association between T-cell lineage markers and TB disease was observed. Expression levels of CD3E, CD4, CD8A, IL7R and FOXP3 differed significantly between TB cases and LTBIs and, with the exception of FOXP3, these genes were more abundantly expressed in LTBIs than in TB cases. The lower FOXP3 gene expression levels in recently infected individuals, previously described by Burl et al.22 and now confirmed in the present study, may represent a shifted balance between effector and regulatory T-cell populations in early stages after infection, or may mirror increased regulatory T-cell migration to the tissue, or to local lymph nodes, in early phases of disease,23 as has also been suggested during acute viral infections.24 Decreased expression of the T-cell subset markers CD3E, CD4 and CD8A during acute TB disease has not been reported before, nor have significant decreases in T-cell numbers. Nevertheless, although the effect was most prominent in our Gambian cohort, TB patients from Paraguay also displayed significantly reduced expression levels of CD3E and CD4. The value of these markers as novel TB biomarkers will have to be confirmed in future large-scale cohorts.

Recently, a set of apoptosis-related genes whose expression was associated with TB disease was identified in an Ethiopian cohort.14 Indeed, four of the six such genes included in the complete dcRT–MLPA set (TNFRSF1A, CASP8, BCL2 and TNF) were confirmed to be associated with TB disease in our Gambian cohort, supporting their value as TB biomarkers. Although the biomarkers identified in this study are quite diverse in nature, they all appear promising and contribute to the biomarker signature discriminating between TB cases and LTBIs.

During treatment of TB cases, gene expression profiles changed over time and, at the end of treatment (6 months), were similar to the profiles observed in LTBIs. This indicates that TB treatment does affect gene expression profiles in peripheral blood, but that time is needed to reach the levels observed in healthy infected contacts. Interestingly, the expression levels of different genes changed with distinct kinetics during treatment. Several genes returned to the expression levels observed in LTBIs within 2 months (for example, CD3E and BLR1), whereas others reached control levels only after the full 6 months of treatment (for example, FCGR1A). The time needed for particular markers to normalize to the levels detected in LTBIs may indicate response to treatment, but more detailed analyses are required to support this hypothesis.

As demonstrated by the identification of powerful biomarker signatures discussed above, dcRT–MLPA proved to be a valuable tool for monitoring gene expression profiles in large cohorts. The assay was adapted from the assay described by Eldering et al.7 into a dual-color assay8 to increase the number of transcripts that can be analyzed, and it uses synthetic oligonucleotides to greatly accelerate probe manufacture. We validated the assay extensively and showed that it is sensitive, reproducible and robust, with results comparable to those of TaqMan real-time PCR. Variation resulted primarily from donor-to-donor differences and was not attributable to inherent assay variation. dcRT–MLPA is positioned in between microarray and real-time PCR, as it gives quantitative expression data for multiple genes. dcRT–MLPA is cheap compared with microarray and real-time PCR, requires less handling time and only low amounts of RNA, and can be performed in a high-throughput (96-well) format. Importantly, technology transfer to resource-poor settings is possible, making dcRT–MLPA an attractive method for the identification and monitoring of biomarkers in large cohorts in TB-endemic regions.

[Figure 6: relative gene expression levels (log2) of CD3E, IL7R, CD8A, BLR1, CD19, FCGR1A, MMP9 and CD163 in TB cases, treated TB cases and healthcare workers]

Figure 6 Identification of biomarkers associated with TB disease in a small Paraguayan cohort. dcRT–MLPA was performed on direct ex vivo RNA isolated from PAXgene tubes derived from TB patients, TB patients treated for 4 weeks and latently infected healthcare workers. Median gene expression levels (peak areas normalized to GAPDH and log2 transformed) of the indicated genes are shown as box-and-whisker plots (5–95 percentiles; dotted line represents assay cutoff). Significant differences between study groups were determined using Kruskal–Wallis H and Dunn's multiple comparison tests. *0.01 < P < 0.05, **0.001 < P < 0.01 and ***P < 0.001.

Application of dcRT–MLPA is not limited to vaccination studies or infectious diseases: any (disease) stage associated with specific changes in marker expression can be assessed using dcRT–MLPA. In the context of TB infection, dcRT–MLPA may be useful in the follow-up of LTBIs, to identify those at risk of progressing to active TB and in need of preventive treatment after exposure, and in vaccination studies of novel candidate vaccines, potentially shortening the follow-up times required to demonstrate efficacy. Moreover, the technique may be adapted to measure biomarker profiles in other biological fluids, for example, sputum. The longevity of biomarkers after infection or vaccination is unknown and will have to be determined to estimate the predictive value of these biomarkers, especially in case–contact settings and in long-term follow-up vaccination studies. However, similar profiles may be re-expressed in PBMCs after short-term recall antigen triggering, especially for genes expressed by memory T cells; this is currently being explored.

In conclusion, we have demonstrated that dcRT–MLPA is highly suitable for biomarker evaluation in large cohorts of contacts or vaccinees, as the technique is robust, reproducible, sensitive and quantitative. The cost per gene and the 96-well format of the assay make it extremely suitable for screening large groups of individuals, even in less developed settings. Initial analysis using a 45-gene dcRT–MLPA set allowed excellent discrimination between TB cases versus LTBIs and TB cases versus uninfected controls, illustrating the power of biomarker monitoring using dcRT–MLPA. Biomarkers that allowed discrimination of infection and disease included CD3E, CD4, CD8, BLR1, FCGR1A and CD19, suggesting important contributions of adaptive immunity to TB pathogenesis.

Materials and methods

Ethics statement

The research was approved by the Institutional Review Boards of the Leiden University Medical Center (the Netherlands), MRC (The Gambia) and the Ministry of Public Health and Social Welfare (Paraguay). Informed consent was obtained from all participants, and the clinical investigation was conducted according to the principles expressed in the ‘Declaration of Helsinki’.

Donors and reagents

PBMCs purified from anonymous buffy coats were collected from healthy blood bank donors (Dutch, adults), all of whom had signed informed consent. PBMCs (2 × 10⁶ per well of a 24-well plate) were stimulated for 6 h with phytohemagglutinin (2 μg ml⁻¹; Remel, Oxoid, Haarlem, the Netherlands) or anti-CD3/CD28 beads (Dynal Biotech, Hamburg, Germany) in Iscove's modified Dulbecco's medium (Invitrogen, Breda, the Netherlands) containing 10% pooled human serum. Subsequently, cells were pelleted (5 min, 1200 r.p.m.), resuspended in 1 ml Trizol reagent (Invitrogen) and total RNA was purified according to the manufacturer's protocol.

Whole-blood samples were collected in PAXgene tubes (BD Biosciences) from a Dutch cohort (12 healthy subjects), a Gambian cohort (79 TB patients at recruitment (TB cases), 83 Mtb-infected healthy controls (LTBIs), 74 uninfected healthy controls, 23 TB patients treated for 2 months, 12 TB patients treated for 4 months and 29 TB patients treated for 6 months) and a Paraguayan cohort (8 TB patients at recruitment, 12 TB patients treated for 4 weeks and 21 latently infected healthcare workers).

PAXgene whole-blood RNA isolation

Total RNA from venipuncture PAXgene blood collection tubes (stored at −80 °C; Supplementary Figure S3) was extracted and purified using the PAXgene Blood RNA kit (BD Biosciences), including on-column DNase digestion, according to the manufacturer's protocol. The RNA yield from 2.5 ml of whole blood was determined with a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA) and ranged from 4.2 to 8.5 μg of total RNA (average 6.02 ± 1.5 μg), with an average OD260/280 ratio of 2.0 ± 0.04. To assess the quality and integrity of the RNA, samples were run on an Agilent 2100 BioAnalyzer (Agilent Technologies, Amstelveen, The Netherlands) using the RNA 6000 Nano Chip kit. The average RNA integrity number of the total RNA samples obtained from PAXgene tubes was 9.5 ± 0.08.

dcRT–MLPA assay, data analysis and quality control

Half-probes consisted of chemically synthesized oligonucleotides (Sigma-Aldrich Chemie B.V., Zwijndrecht, the Netherlands), and right-hand half-probes were 5′ phosphorylated to facilitate ligation. For each target-specific sequence, a specific RT primer was designed that is complementary to the RNA sequence and located immediately downstream of the probe target sequence. As a positive control, chemically synthesized oligonucleotides were used that were complementary to the RNA sequence and encompassed the combined target-specific sequences of the left- and right-hand half-probes. To avoid detection of contaminating DNA fragments, all target sequences have an exon boundary near the probe ligation site. Moreover, splice variants and single-nucleotide polymorphisms present in the mRNA were taken into account. All half-probes and RT primers used in this study are described in detail in Supplementary Table S1.

Development of the dcRT–MLPA required multipleadaptations to the protocol described by Eldering et al.7

RNA samples (2.5 μl of a 50 ng μl⁻¹ solution) were mixed with 1× MMLV reverse transcriptase buffer, dNTPs (0.4 mM of each nucleotide) and 80 nM of the target-specific RT primers in a final volume of 4.5 μl. After heating for 1 min to 80 °C and incubation for 5 min at 45 °C, 30 U (1.5 μl) of MMLV reverse transcriptase (Promega, Leiden, the Netherlands) was added and incubated for 15 min at 37 °C before heat inactivation of the enzyme for 2 min at 98 °C. Subsequently, 1.5 μl of half-probe mix (4 nM) and 1.5 μl of SALSA MLPA buffer (MRC-Holland, Amsterdam, the Netherlands) were added to the reaction, and the mixture was heat denatured for 1 min at 95 °C, followed by hybridization for 16 h at 60 °C. Ligation of the annealed half-probes was performed for 15 min at 54 °C by adding 3 μl ligase-65 buffer A, 3 μl ligase-65 buffer B, 25 μl H2O and 1 μl of ligase-65 (MRC-Holland). Heat inactivation of the ligase enzyme was subsequently performed for 5 min at 98 °C. Ligation products were amplified in a 20-μl PCR reaction containing 5 μl ligation product, 2 μl SALSA PCR buffer, 1 μl SALSA enzyme dilution buffer, 1 μl SALSA FAM-labeled MLPA primers, 1 μl HEX-labeled MAPH primers (2 μM each; forward primer 5′-GGCCGCGGGAATTCGATT-3′, reverse primer 5′-GCCGCGAATTCACTAGTG-3′), 14.75 μl H2O and 0.25 μl SALSA polymerase (MRC-Holland). Thermal cycling conditions were as follows: 33 cycles of 30 s at 95 °C, 30 s at 58 °C and 60 s at 72 °C, followed by 1 cycle of 20 min at 72 °C. PCR amplification products were diluted 1:10 in HiDi formamide containing 400HD ROX size standard and analyzed on an Applied Biosystems 3730 capillary sequencer in GeneScan mode (Applied Biosystems).

Trace data were analyzed using the GeneMapper software package (Applied Biosystems). The area of each assigned peak (in arbitrary units) was exported for further analysis in Microsoft Excel spreadsheet software. Signals below the threshold value for noise cutoff in GeneMapper (log2-transformed peak area ≤ 7.64) were assigned the threshold value for noise cutoff. Results from target genes were calculated relative to the average signal of one of four reference genes present within the gene set (ABR, GUSB, GAPDH or B2M). Subsequently, the percentage s.d. (coefficient of variation) of each reference gene was calculated to determine which reference gene was most stably expressed across the evaluated samples. Following normalization of the data, signals below the threshold value for noise cutoff (log2-transformed peak area ≤ 7.64) were again assigned the threshold value for noise cutoff.
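The processing steps above can be sketched as follows, working in log2 space. This is a hedged illustration rather than the authors' spreadsheet: we assume that "relative to the average signal of a reference gene" means scaling each sample by the ratio of the reference gene's plate-wide average to its signal in that sample, which keeps normalized values on the same log2 peak-area scale as the 7.64 noise cutoff. All helper names and numbers are hypothetical.

```python
import math
import statistics

NOISE_CUTOFF_LOG2 = 7.64  # log2 peak-area noise cutoff quoted in the text


def floor_log2(area: float) -> float:
    """log2 of a peak area, raised to the noise cutoff when below it (or absent)."""
    if area <= 0:
        return NOISE_CUTOFF_LOG2
    return max(math.log2(area), NOISE_CUTOFF_LOG2)


def percent_sd(values: list[float]) -> float:
    """Percentage standard deviation (coefficient of variation, in %)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def most_stable_reference(ref_areas: dict[str, list[float]]) -> str:
    """Pick the reference gene with the lowest % s.d. across samples."""
    return min(ref_areas, key=lambda gene: percent_sd(ref_areas[gene]))


def normalise(target_areas: list[float], ref_areas: list[float]) -> list[float]:
    """Normalise one target gene to one reference gene, sample by sample.

    Assumption: value_s = target_s * mean(ref) / ref_s, computed in log2 space,
    with the noise cutoff applied before and again after normalisation.
    """
    log_ref_mean = math.log2(statistics.mean(ref_areas))
    out = []
    for target, ref in zip(target_areas, ref_areas):
        scaled = floor_log2(target) - math.log2(ref) + log_ref_mean
        out.append(max(scaled, NOISE_CUTOFF_LOG2))  # second noise floor
    return out


# Hypothetical peak areas for three samples
refs = {"GAPDH": [2000.0, 2200.0, 1800.0], "B2M": [2000.0, 4000.0, 1000.0]}
best = most_stable_reference(refs)  # GAPDH: 10% s.d. versus about 65% for B2M
print(best, normalise([1000.0, 1000.0, 100.0], refs[best]))
```

Because the correction is a ratio to the plate-wide average, a sample with an average reference signal is left unchanged, and the second noise floor catches values pushed below 7.64 by the scaling.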

To monitor assay performance, a negative control (without RNA), a positive control (using synthetic template oligonucleotides as hybridization templates) and a commercial Human Universal Reference RNA (Clontech, Palo Alto, CA, USA) were included on each 96-well plate. To determine which data points should be excluded from the analysis, the total signal intensity of all four reference genes (GAPDH, B2M, ABR and GUSB; HEX- and FAM-labeled genes were analyzed separately) was calculated for each sample, followed by the average ± 3 s.d. of these totals over all x samples on the 96-well plate. Writing $I_s = I_{\mathrm{B2M},s} + I_{\mathrm{ABR},s} + I_{\mathrm{GUSB},s} + I_{\mathrm{GAPDH},s}$ for the total reference-gene intensity of sample $s$, a sample was excluded when it fell outside this range:

$$\left| I_s - \bar{I} \right| > 3\,\mathrm{s.d.}(I_1, \ldots, I_x), \qquad \bar{I} = \frac{1}{x} \sum_{j=1}^{x} I_j .$$
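A minimal sketch of this plate-level exclusion rule follows. It assumes peak areas are held per sample in a dictionary for one dye channel (HEX or FAM, analyzed separately) and that a sample is flagged when its summed reference-gene intensity lies more than 3 s.d. from the plate mean. The data layout, the function name and the toy plate are hypothetical.

```python
import statistics


def flag_failed_samples(plate: dict[str, dict[str, float]],
                        reference_genes: tuple[str, ...],
                        k: float = 3.0) -> list[str]:
    """Return sample IDs whose summed reference-gene signal lies outside mean +/- k s.d.

    `plate` maps sample ID -> {gene: peak area} for ONE dye channel, and
    `reference_genes` lists the reference genes measured in that channel.
    """
    totals = {sample: sum(areas[g] for g in reference_genes)
              for sample, areas in plate.items()}
    values = list(totals.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample s.d. over the whole plate
    return sorted(s for s, t in totals.items() if abs(t - mean) > k * sd)


# Toy plate: 19 good samples plus one with a failed reaction
genes = ("GAPDH", "B2M", "ABR", "GUSB")
plate = {f"s{i:02d}": {"GAPDH": 1000.0 + i, "B2M": 1000.0, "ABR": 1000.0, "GUSB": 1000.0}
         for i in range(19)}
plate["failed"] = {g: 100.0 for g in genes}
print(flag_failed_samples(plate, genes))  # ['failed']
```

Note that a single extreme sample also inflates the plate s.d.; on a full 96-well plate this effect is small, but on very small plates a 3 s.d. rule may fail to flag an outlier.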

Statistical analysis

To evaluate the reproducibility of the dcRT–MLPA assay, an orthogonal experimental design was created. The statistical model used is a random effects model with nested factors, as described by Searle et al.25 In this model, ‘donor’ represents the highest factor (that is, the source of variability within a population); ‘collection’ (that is, variability between time points within the same subject) is modeled as a factor nested within donor; ‘tube’ (that is, variability between samples of a subject at a given time point) is nested within collection; and ‘dcRT–MLPA’ (that is, variability between assays, or assay reproducibility) is the last nested factor. This latter factor is also representative of the reliability of the dcRT–MLPA measurement. The nested model was subsequently fitted on the trace data collected for each target gene. Before the statistical analysis, trace data were analyzed and processed as described above (dcRT–MLPA data analysis), followed by log2 transformation of the data values.
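The idea behind the nested random-effects model can be illustrated with its simplest balanced case. The sketch below is a deliberate simplification, not the four-level model (donor, collection, tube, dcRT–MLPA run) fitted in this study. It has only two levels, donors and repeated measurements within each donor, and estimates the two variance components by the classical ANOVA method of moments. Deeper nesting follows the same expected-mean-square logic, with one additional term per level. The function name and data are hypothetical.

```python
import statistics


def one_way_variance_components(groups: list[list[float]]) -> tuple[float, float]:
    """Balanced one-way random-effects ANOVA (method of moments).

    groups: one list of log2 expression values per donor, each holding the
    same number n of replicate measurements.
    Returns (between-donor variance, within-donor variance); a negative
    between-donor estimate is truncated to zero.
    """
    a = len(groups)
    n = len(groups[0])
    if a < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need >= 2 donors with the same number (>= 2) of replicates")
    means = [statistics.mean(g) for g in groups]
    grand = statistics.mean(means)  # equals the overall mean in a balanced design
    # Mean squares between and within donors
    ms_between = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    ms_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (a * (n - 1))
    # E[MS_between] = sigma2_within + n * sigma2_donor
    var_donor = max((ms_between - ms_within) / n, 0.0)
    return var_donor, ms_within


print(one_way_variance_components([[1.0, 3.0], [5.0, 7.0]]))  # (7.0, 2.0)
```

A between-donor component that is much larger than the within-donor component corresponds to the observation, made in the Discussion, that variation arose mainly from donor-to-donor differences rather than from the assay itself.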

To test for differences between the expression profiles of TB cases, LTBIs and uninfected healthy controls, we used the Global Test (PMID 14693814), testing for pairwise differences between the groups. Next, for each of the three pairwise comparisons, the genes were grouped by hierarchical clustering based on average linkage and absolute correlation distance. All single genes, as well as all sets of genes implied by the clustering, were tested for differential expression using the Global Test. Correction for multiple testing was performed separately for each of the three clustering graphs using the method of Meinshausen.26,27 These tests and multiple-testing procedures were performed using the package globaltest (version 5.3.3) in R (version 2.10.1; Supplementary Figure S2). To test for significant differences in single genes between study groups, Kruskal–Wallis H and Dunn's multiple comparison tests were applied (Figures 4–6).
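For readers unfamiliar with the single-gene test, the Kruskal–Wallis H statistic compares mean ranks across groups. The following sketch computes H with the standard correction for tied values. The P value would then be taken from the χ² distribution with k − 1 degrees of freedom (k groups). Dunn's post hoc comparisons are not shown. The function name is hypothetical, and in practice a library implementation would normally be used.

```python
def kruskal_wallis_h(*groups: list[float]) -> float:
    """Kruskal-Wallis H statistic with correction for tied values."""
    pooled = sorted((value, gi) for gi, g in enumerate(groups) for value in g)
    n_total = len(pooled)
    ranks = [0.0] * n_total
    tie_term = 0.0
    i = 0
    while i < n_total:
        # Find the run of tied values starting at position i
        j = i
        while j + 1 < n_total and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based; ties share the mean rank
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sums = [0.0] * len(groups)
    for (_, gi), r in zip(pooled, ranks):
        rank_sums[gi] += r
    h = 12.0 / (n_total * (n_total + 1)) * sum(
        r_sum ** 2 / len(g) for r_sum, g in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)
    correction = 1.0 - tie_term / (n_total ** 3 - n_total)
    if correction == 0.0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


# Three completely separated groups, e.g. log2 expression of one gene
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # approx 7.2
```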

For the Lasso regression analysis, data were randomly split into a training set of 157 subjects and an independent test set of 79 subjects. On the training set, a separate Lasso logistic regression model was fitted for each pairwise comparison between the three groups of TB cases, LTBIs and uninfected healthy controls. The tuning parameter λ, which determines the amount of shrinkage in the Lasso model, was chosen in each of the fitted models by optimizing the cross-validated log likelihood in the training set. Predicted values for the test set were obtained by applying the fitted model to the test data, and these predicted values (test set only) were used to generate receiver operating characteristic curves. For comparison, a logistic ridge regression model was fitted on the same training and test sets, using the same criterion for selecting the tuning parameter λ. All calculations for Lasso and ridge regression were performed using the package penalized (PMID: 19937997, version 0.9–32) in R (version 2.10.1). The R script used to perform the Lasso regression analysis is depicted in Supplementary Figure S4.
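The evaluation scheme can be sketched as below. It performs a random split of the 236 Gambian subjects (79 + 83 + 74) into 157 training and 79 test subjects, and computes the area under the ROC curve from test-set predictions only, using its rank (Mann–Whitney) interpretation: the probability that a randomly chosen positive subject scores higher than a randomly chosen negative one, with ties counting one half. Whether the original split was stratified by group is not stated, so this sketch uses a plain random shuffle. The seed, subject names and scores are hypothetical, and the Lasso fit itself (performed in R with penalized) is not reproduced.

```python
import random


def train_test_split(subject_ids: list[str], n_test: int,
                     seed: int = 0) -> tuple[list[str], list[str]]:
    """Randomly split subject IDs into (training, test) lists."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def roc_auc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    labels: 1 for the positive class (e.g. TB case), 0 for the negative class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative subject")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


subjects = [f"subject{i:03d}" for i in range(236)]
train, test = train_test_split(subjects, n_test=79)
print(len(train), len(test))  # 157 79

# Hypothetical predicted probabilities for four test subjects
print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

For each pairwise comparison, only the test subjects belonging to the two groups being compared would be scored, so the AUC reflects performance on data not used to choose λ.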

TaqMan real-time PCR

First-strand cDNA was synthesized using 400 ng of RNA and oligo(dT) primers in a 10-μl reverse transcriptase reaction mixture using the SuperScript III First-Strand Synthesis System (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's protocol. Quantitative real-time PCR was performed on a 7900HT Fast Real-Time PCR System (PE Applied Biosystems, Norwalk, CT, USA) in a total volume of 25 μl containing 900 nM forward and reverse primer, 250 nM FAM dye-labeled TaqMan MGB probe and 10 ng of cDNA in 1× TaqMan Universal PCR Master Mix, No AmpErase UNG, including a passive reference (ROX) fluorochrome. Optimized thermal cycling conditions were as follows: 1 cycle of 10 min at 95 °C, followed by 50 cycles of 15 s at 95 °C and 1 min at 60 °C. Relative gene expression was calculated using the ΔCt method and normalized to the expression of GAPDH. All assays were performed in triplicate, and the data are presented as the mean ± s.d. The inventoried TaqMan Gene Expression Assays were purchased from Applied Biosystems and included: IFNG (Hs00989291_m1), TNF (Hs00174128_m1), IL2 (Hs00174114_m1), CD8A (Hs00233520_m1), RAB33A (Hs00191243_m1), CD14 (Hs00169122_g1), GUSB (Hs00939627_m1) and GAPDH (Hs03929097_g1).
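As a worked example of the ΔCt calculation, the sketch below assumes the common form in which the expression of a target gene relative to GAPDH is 2^−ΔCt, with ΔCt = mean Ct(target) − mean Ct(GAPDH). This form assumes an amplification efficiency close to 100% (the product doubles every cycle). Averaging the triplicate Ct values before subtracting is our assumption, as the text does not specify how replicates were combined; the function name is hypothetical.

```python
import statistics


def delta_ct_expression(ct_target: list[float], ct_gapdh: list[float]) -> float:
    """Relative expression of a target gene versus GAPDH by the delta-Ct method.

    Uses the mean Ct of the replicate wells for each gene and assumes ~100%
    PCR efficiency, so one cycle difference equals a two-fold difference.
    """
    delta_ct = statistics.mean(ct_target) - statistics.mean(ct_gapdh)
    return 2.0 ** (-delta_ct)


# A target detected 5 cycles after GAPDH is expressed at 1/32 of GAPDH
print(delta_ct_expression([25.0, 25.1, 24.9], [20.0, 20.0, 20.0]))  # approx 0.03125
```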

Conflict of interest

The authors declare no conflict of interest.

Acknowledgements

We thank Dr SJ White for his technical support in designing a dual-color RT–MLPA, Dr EMS Leyten and Dr M van Westreenen for their contribution to the Paraguay cohort design and sample collection, and Dr A Geluk and Dr T van Hall for critically reviewing the manuscript. We gratefully acknowledge all the funding that made the work possible. We especially acknowledge the Bill and Melinda Gates Foundation (Grand Challenges in Global Health GC6#74), 6th framework programme TBVAC contract no. LSHP-CT-2003-503367, 7th framework programme NEWTBVAC contract no. HEALTH-F3-2009-241745 (the text represents the authors' views and does not necessarily represent a position of the Commission that will not be liable for the use made of such information), The Netherlands Organization for Scientific Research (VENI grant 916.86.115) and the Gisela Thier foundation of the Leiden University Medical Center. The funders had no role in study design, data collection and analysis, decision to publish or in preparation of the manuscript.

References

1 Atkinson AJ, Colburn WA, DeGruttola VG, DeMets DL, Downing GJ, Hoth DF et al. Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin Pharmacol Ther 2001; 69: 89–95.

2 Bowtell DD. Options available—from start to finish—forobtaining expression data by microarray. Nat Genet 1999; 21:25–32.

3 Lonnroth K, Castro KG, Chakaya JM, Chauhan LS, Floyd K, Glaziou P et al. Tuberculosis control and elimination 2010–50: cure, care, and social development. Lancet 2010; 375: 1814–1829.

4 World Health Organization. Global tuberculosis control: epidemiology, strategy, financing. WHO Report, WHO, Geneva 2009.

5 Kaufmann SH. Future vaccination strategies against tuberculosis: thinking outside the box. Immunity 2010; 33: 567–577.

6 Schouten JP, McElgunn CJ, Waaijer R, Zwijnenburg D, Diepvens F, Pals G. Relative quantification of 40 nucleic acid sequences by multiplex ligation-dependent probe amplification. Nucleic Acids Res 2002; 30: e57.

7 Eldering E, Spek CA, Aberson HL, Grummels A, Derks IA, de Vos AF et al. Expression profiling via novel multiplex assay allows rapid assessment of gene regulation in defined signalling pathways. Nucleic Acids Res 2003; 31: e153.

8 White SJ, Vink GR, Kriek M, Wuyts W, Schouten J, Bakker B et al. Two-color multiplex ligation-dependent probe amplification: detecting genomic rearrangements in hereditary multiple exostoses. Hum Mutat 2004; 24: 86–92.

9 Bonecini-Almeida MG, Ho JL, Boechat N, Huard RC, Chitale S, Doo H et al. Down-modulation of lung immune responses by interleukin-10 and transforming growth factor beta (TGF-beta) and analysis of TGF-beta receptors I and II in active tuberculosis. Infect Immun 2004; 72: 2628–2634.

10 Demissie A, Abebe M, Aseffa A, Rook G, Fletcher H, Zumla A et al. Healthy individuals that control a latent infection with Mycobacterium tuberculosis express high levels of Th1 cytokines and the IL-4 antagonist IL-4delta2. J Immunol 2004; 172: 6938–6943.

11 Knudsen TB, Gustafson P, Kronborg G, Kristiansen TB, Moestrup SK, Nielsen JO et al. Predictive value of soluble haemoglobin scavenger receptor CD163 serum levels for survival in verified tuberculosis patients. Clin Microbiol Infect 2005; 11: 730–735.

12 Lienhardt C, Azzurri A, Amedei A, Fielding K, Sillah J, Sow OY et al. Active tuberculosis in Africa is associated with reduced Th1 and increased Th2 activity in vivo. Eur J Immunol 2002; 32: 1605–1613.

13 Jacobsen M, Repsilber D, Gutschmidt A, Neher A, Feldmann K, Mollenkopf HJ et al. Candidate biomarkers for discrimination between infection and disease caused by Mycobacterium tuberculosis. J Mol Med 2007; 85: 613–621.

14 Abebe M, Doherty TM, Wassie L, Aseffa A, Bobosha K, Demissie A et al. Expression of apoptosis-related genes in an Ethiopian cohort study correlates with tuberculosis clinical status. Eur J Immunol 2010; 40: 291–301.

15 Tibshirani R. Regression shrinkage and selection via the lasso. J Royal Statist Soc B 1996; 58: 267–288.

16 Berry MP, Graham CM, McNab FW, Xu Z, Bloch SA, Oni T et al. An interferon-inducible neutrophil-driven blood transcriptional signature in human tuberculosis. Nature 2010; 466: 973–977.

17 Jenner RG, Young RA. Insights into host responses against pathogens from transcriptional profiling. Nat Rev Microbiol 2005; 3: 281–294.

18 Mistry R, Cliff JM, Clayton CL, Beyers N, Mohamed YS, Wilson PA et al. Gene-expression patterns in whole blood identify subjects at risk for recurrent tuberculosis. J Infect Dis 2007; 195: 357–365.

19 Maertzdorf J, Repsilber D, Parida SK, Stanley K, Roberts T, Black G et al. Human gene expression profiles of susceptibility and resistance in tuberculosis. Genes Immun 2011; 12: 15–22.

20 Abebe F, Bjune G. The protective role of antibody responses during Mycobacterium tuberculosis infection. Clin Exp Immunol 2009; 157: 235–243.

21 Cooper AM. Cell-mediated immune responses in tuberculosis. Annu Rev Immunol 2009; 27: 393–422.

22 Burl S, Hill PC, Jeffries DJ, Holland MJ, Fox A, Lugos MD et al. FOXP3 gene expression in a tuberculosis case contact study. Clin Exp Immunol 2007; 149: 117–122.

23 Shafiani S, Tucker-Heard G, Kariyone A, Takatsu K, Urdahl KB. Pathogen-specific regulatory T cells delay the arrival of effector T cells in the lung during early tuberculosis. J Exp Med 2010; 207: 1409–1420.

24 Lund JM, Hsing L, Pham TT, Rudensky AY. Coordination of early protective immunity to viral infection by regulatory T cells. Science 2008; 320: 1220–1224.

25 Searle SR, Casella GM, McCulloch CE. Variance components. Wiley-Interscience 1992.

26 Goeman JJ, Solari A. The sequential rejection principle of familywise error control. Ann Statist 2010; 38: 3782–3810.

27 Meinshausen N. Hierarchical testing of variable importance. Biometrika 2008; 95: 265–278.

Supplementary Information accompanies the paper on Genes and Immunity website (http://www.nature.com/gene)
