

Genome plasticity of BCG and impact on vaccine efficacy

Roland Brosch*, Stephen V. Gordon†, Thierry Garnier*, Karin Eiglmeier*‡, Wafa Frigui*, Philippe Valenti*§, Sandrine Dos Santos*, Stéphanie Duthoy*, Céline Lacroix*, Carmen Garcia-Pelayo†, Jacqueline K. Inwald†, Paul Golby†, Javier Nuñez Garcia†, R. Glyn Hewinson†, Marcel A. Behr¶, Michael A. Quail‖, Carol Churcher‖, Bart G. Barrell‖, Julian Parkhill‖, and Stewart T. Cole*,**

*Unité de Génétique Moléculaire Bactérienne, Institut Pasteur, 28 Rue du Docteur Roux, 75724 Paris Cedex 15, France; †Veterinary Laboratories Agency, Woodham Lane, New Haw, Addlestone, Surrey KT15 3NB, United Kingdom; ¶McGill University Health Centre, Montreal, QC, Canada H3G 1A4; and ‖The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom

Communicated by G. Balakrish Nair, International Centre for Diarrhoeal Disease Research Bangladesh, Dhaka, Bangladesh, January 31, 2007 (received for review December 14, 2006)

To understand the evolution, attenuation, and variable protective efficacy of bacillus Calmette–Guérin (BCG) vaccines, Mycobacterium bovis BCG Pasteur 1173P2 has been subjected to comparative genome and transcriptome analysis. The 4,374,522-bp genome contains 3,954 protein-coding genes, 58 of which are present in two copies as a result of two independent tandem duplications, DU1 and DU2. DU1 is restricted to BCG Pasteur, although four forms of DU2 exist; DU2-I is confined to early BCG vaccines, like BCG Japan, whereas DU2-III and DU2-IV occur in the late vaccines. The glycerol-3-phosphate dehydrogenase gene, glpD2, is one of only three genes common to all four DU2 variants, implying that BCG requires higher levels of this enzyme to grow on glycerol. Further amplification of the DU2 region is ongoing, even within vaccine preparations used to immunize humans. An evolutionary scheme for BCG vaccines was established by analyzing DU2 and other markers. Lesions in genes encoding σ-factors and pleiotropic transcriptional regulators, like PhoR and Crp, were also uncovered in various BCG strains; together with gene amplification, these affect gene expression levels, immunogenicity, and, possibly, protection against tuberculosis. Furthermore, the combined findings suggest that early BCG vaccines may even be superior to the later ones that are more widely used.

glycerol metabolism | live vaccines | tandem duplications | tuberculosis

More than 3 billion individuals have been immunized with bacillus Calmette–Guérin (BCG), the “Bacille de Calmette et Guérin,” an attenuated derivative of Mycobacterium bovis (1). BCG is part of the WHO’s Expanded Program on Immunization because of its proven efficacy at preventing extrapulmonary tuberculosis in children (2). However, in adults, its efficacy against pulmonary disease is variable (3, 4), possibly as a result of environmental, operational, demographic, and genetic factors (5). For instance, prior exposure to environmental mycobacteria severely compromises protection afforded by BCG (6), and this is influenced by the extent of cross-recognition of antigens shared with the vaccine (7).

Another possible explanation for variable efficacy lies in the use of different daughter strains, and a brief reminder of their history is required (8–10). For 13 years, Calmette and Guérin serially passaged their strain on potato slices imbibed with glycerol and monitored loss of virulence (1). Once safety had been confirmed, BCG was disseminated, and different laboratories maintained their own daughter strains by passaging, until the introduction of archival seed lots in the 1960s. Since then, it has been recommended that vaccine preparations undergo no more than 12 passages from each seed lot (2). Thus, M. bovis BCG Pasteur 1173P2 corresponds to the archive established after 1,173 passages.

Recently, the various daughter strains have been studied by comparative genomics (11–14), and this uncovered regions of difference (RD) such as deletions and insertions, plus some SNPs. BCG vaccines were thus divided into the early strains, represented by BCGs Japan, Birkhaug, Sweden, and Russia, and the late strains, including BCGs Pasteur, Danish, Glaxo, and Prague (8).

The most obvious reason for the attenuation of BCG was the loss of the protein secretion system ESX-1, absent from all strains, due to deletion of RD1 (15–20). However, because reintroduction of ESX-1 to BCG Pasteur or Russia does not restore full virulence (17), there are likely to be other lesions. Here, in an attempt to refine the genealogy of BCG, elucidate the basis of attenuation, and understand variable vaccine efficacy, we present the complete genome sequence of M. bovis BCG Pasteur 1173P2, details of its bioinformatic and functional-genomic analysis, and evidence for tandem duplications, DU1 and DU2.

Results

The Genome Sequence. By using gene prediction and genome comparison approaches (21, 22), a total of 3,954 genes coding for proteins (CDS) were identified in the 4,374,522-bp circular chromosome of BCG Pasteur, together with 34 pseudogenes (Fig. 1). Although the BCG genome has incurred several deletions since diverging from its parent M. bovis (11), it is nonetheless almost 30 kb larger than that of M. bovis AF2122/97, which contains 4,345,492 bp (22), as a result of two independent tandem duplications, DU1 and DU2 (23). Consequently, BCG Pasteur is diploid for 58 CDS and two tRNA genes. There are 48 repetitive elements corresponding to insertion sequences and 13E12 repeats but none of the known prophages associated with M. tuberculosis (21, 24).

Author contributions: R.B., S.V.G., and T.G. contributed equally to this work; R.B., S.V.G., K.E., R.G.H., C.C., B.G.B., J.P., and S.T.C. designed research; R.B., T.G., K.E., W.F., P.V., S.D.S., S.D., C.L., C.G.-P., J.K.I., P.G., J.N.G., M.A.B., and M.A.Q. performed research; R.B., S.V.G., T.G., K.E., J.N.G., R.G.H., M.A.B., M.A.Q., C.C., B.G.B., J.P., and S.T.C. analyzed data; and R.B., S.V.G., M.A.B., J.P., and S.T.C. wrote the paper.

The authors declare no conflict of interest.

Abbreviations: BCG, bacillus Calmette–Guérin; RD, regions of difference.

Data deposition: The genome sequence reported in this paper has been deposited at EMBL (accession no. AM408590). Fully annotated microarray data have been deposited in BμG@Sbase, http://bugs.sgul.ac.uk/E-BUGS-43 (accession no. E-BUGS-43) and ArrayExpress (accession no. E-BUGS-43).

‡Present address: Unité de Biochimie et de Biologie Moléculaire des Insectes, Institut Pasteur, 75724 Paris Cedex, France.

§Present address: Centre de Biologie du Développement, Unité Mixte de Recherche 5547, 31062 Toulouse, France.

**To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/cgi/content/full/0700869104/DC1.

© 2007 by The National Academy of Sciences of the USA

5596–5601 | PNAS | March 27, 2007 | vol. 104 | no. 13 | www.pnas.org/cgi/doi/10.1073/pnas.0700869104

Comparative Genomics. Considerable insight into the evolution of tubercle bacilli has been obtained from studying polymorphisms like RD (25–28). On comparison of the genome sequences of M. tuberculosis strains H37Rv and CDC1551 (29) with those of M. bovis AF2122/97 and BCG Pasteur, 42 RD were uncovered, 28 of which had been detected previously [Fig. 1; and see supporting information (SI) Table 2]. These affect ≈170 genes, of which BCG Pasteur has lost 133. Of the 14 new RD, 1 is intergenic, 11 affect PE_PGRS or PPE genes, and another corresponds to amplification of a 57-bp tandem repeat in leuA. The finding that BCG Pasteur contains RD17 and RDpan, whereas M. bovis AF2122/97 does not, is consistent with the scheme in which the parental M. bovis strain preceded M. bovis AF2122/97, as borne out by spoligotyping (27).

On inspection of the complete SNP catalog (SI Table 3), it was found that there were only 736 SNP between the two M. bovis strains, indicating a close relationship, and ≈2,400 SNP between BCG and the M. tuberculosis strains, which also appear more divergent than the bovine strains. The majority of the SNP between M. bovis and BCG Pasteur occur within genes (83%), and 68% of these are nonsynonymous, consistent with the delayed effects of purifying selection on recent mutations (30).

Tandem Duplications in BCG Pasteur. The most prominent genomic polymorphisms are the tandem duplications, DU1 and DU2. Both copies of DU1 are identical, whereas there is a single nonsynonymous base difference in DU2, within gene BCG3258. DU1 is 29,667 bp in length and spans the chromosomal origin of replication, oriC. As can be seen from Fig. 2, this duplication is restricted to BCG Pasteur, unlike DU2, which occurs in one of four different forms, DU2-I–DU2-IV, in all daughter BCG strains examined. In BCG Pasteur, DU2-IV, comprising 36,163 bp, arose as a result of tandem duplication of a 141-kb stretch and two subsequent internal deletions, one of which (Δint) is common to BCG groups DU2-II–DU2-IV. Several regulatory genes, including sigH and whiB1, occur in DU2-IV that might affect gene expression levels when diploid.
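As a rough cross-check of the sizes quoted so far, the short Python sketch below compares the net size difference between the two chromosomes with the combined length of DU1 and DU2-IV. The variable names are ours, and reading the remainder as sequence lost through deletions is an assumption: the calculation ignores other insertions and treats M. bovis AF2122/97 as a stand-in for the BCG progenitor.

```python
# Sizes quoted in the text, in base pairs.
BCG_PASTEUR_BP = 4_374_522
M_BOVIS_AF2122_BP = 4_345_492
DU1_BP = 29_667      # length of DU1
DU2_IV_BP = 36_163   # length of DU2-IV

# Net difference between the two chromosomes ("almost 30 kb").
net_gain = BCG_PASTEUR_BP - M_BOVIS_AF2122_BP

# Sequence added by the two tandem duplications.
duplicated = DU1_BP + DU2_IV_BP

# Assumption: whatever the duplications add beyond the net gain
# has been offset by deletions elsewhere in the BCG chromosome.
implied_loss = duplicated - net_gain

print(f"net gain:     {net_gain:,} bp")
print(f"duplicated:   {duplicated:,} bp")
print(f"implied loss: {implied_loss:,} bp")
```

Under these assumptions the duplications contribute 65,830 bp, whereas the net difference is only 29,030 bp, which is compatible with the deletions that BCG has incurred since diverging from M. bovis.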

DU2 in Other BCG Strains. Detailed mapping of representative early and late BCG daughter strains was performed by pulsed-field gel electrophoresis and locus-specific Southern blotting. DU2-IV of BCG Pasteur is contained entirely within an AsnI fragment that spans region 3,529–3,753 kb (M. tuberculosis H37Rv coordinates); consequently, a change in its size reflects internal amplification events. As shown in Fig. 3, the corresponding AsnI fragments of the BCG strains are all larger than those of M. tuberculosis and M. bovis by ≈80 kb in BCGs Prague, Mérieux, and Danish, by ≈75 kb in BCGs Sweden and Birkhaug, and by ≈20 or 40 kb in BCG Moreau and BCGs Japan and Russia, respectively.

The endpoints of the DU2 duplications in the various BCG strains were pinpointed by mapping, cloning, and sequencing by using BAC clones. In the early strains BCGs Moreau, Russia, and Japan, the ≈20-kb duplicated segment corresponds to positions 3,684,229–3,704,932 (M. tuberculosis H37Rv coordinates), and this duplication was named DU2-I. There was extensive evidence of amplification of DU2-I because BCGs Russia and Japan have predominantly three copies, whereas BCG Moreau has two (Fig. 3 and SI Fig. 6), as summarized in Fig. 4A. PCR assays for the junction, JDU2-I, demonstrated the presence of DU2-I in all three strains but not in the remaining 11 BCG vaccines.

BCGs Birkhaug and Sweden displayed identical hybridization patterns but their DU2 endpoints differed from those of other groups. Furthermore, the duplicated segment harboring Δint had switched position relative to groups III and IV. This configuration was termed DU2-II (Fig. 4A). In the case of BCGs Mérieux, Prague, and Danish, the duplicated fragment bore 78.5 kb of extra DNA, corresponding to regions 3,567,459–3,608,472 and 3,671,536–3,709,097 (M. tuberculosis H37Rv coordinates). This duplication, termed DU2-III, is a precursor to DU2-IV of BCG Pasteur, which has Δint but shows a different duplication junction because of the second deletion event (Fig. 4A and SI Fig. 6).

Fig. 1. Circular representation of the M. bovis BCG Pasteur chromosome. The scale is shown in megabases in the outer black circle. Moving inward, the next two circles show forward and reverse strand CDS, respectively, with colors representing the functional classification (red, replication; light blue, regulation; dark blue, virulence; light green, hypothetical protein; dark green, cell wall and cell processes; orange, conserved hypothetical protein; cyan, IS elements; yellow, intermediate metabolism; gray, lipid metabolism; purple, PE/PPE). The following two circles show forward and reverse strand pseudogenes (colors represent the functional classification), the next circle shows RD (black) and DU (red), followed by the G+C content, and finally the GC skew (G−C)/(G+C) plotted by using a 10-kb window. For more details see SI Table 2.

Fig. 2. Mapping duplications in BCG strains. DU1 is confined to M. bovis BCG Pasteur as shown by Southern blotting of HindIII restriction digests of various BCG vaccines and hybridization with a probe for the oriC region. Note that the difference in size of the fragment hybridizing in M. tuberculosis (M. tub.) H37Ra is due to an IS6110 insertion in the fragment.
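The GC skew track described in the Fig. 1 legend, (G−C)/(G+C) over a 10-kb window, can be illustrated with the minimal Python sketch below. The function name is ours, and the use of consecutive non-overlapping windows is an assumption, because the legend does not state whether the window slides or how it is stepped.

```python
def gc_skew(seq: str, window: int = 10_000) -> list[float]:
    """Return (G - C) / (G + C) for consecutive, non-overlapping windows.

    Windows containing neither G nor C are reported as 0.0.
    """
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


# Toy sequence: a G-rich stretch followed by a C-rich stretch.
toy = "GGGC" * 3 + "GCCC" * 3
print(gc_skew(toy, window=12))  # [0.5, -0.5]
```

A change of sign in such a profile is what is usually inspected when locating the origin and terminus of replication on a circular chromosome.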

Systematic PCR screening with appropriate JDU2 primers (SI Table 4) classed each of the 14 BCG vaccines into one of four groups (Fig. 5). In this scheme, DU2-II–DU2-IV are closely related, whereas DU2-I is quite distinct. Early BCG strains with DU2-I have two or three copies of genes BCG3365–BCG3383, whereas late vaccines harboring DU2-III are diploid or triploid for genes BCG3221c–BCG3260c and BCG3356c–BCG3388c.
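The segment sizes quoted for DU2-I and DU2-III can be recomputed from the H37Rv coordinates given above. The sketch below does this in Python; the helper name is ours, and treating the coordinates as 1-based and inclusive is an assumption that changes the totals by only a base pair or two.

```python
def span_bp(start: int, end: int) -> int:
    """Length of a segment, assuming 1-based inclusive coordinates."""
    return end - start + 1


# DU2-I: the ~20-kb duplicated segment of the early strains.
du2_i = span_bp(3_684_229, 3_704_932)

# DU2-III: the two regions making up the extra DNA in the late strains.
du2_iii_regions = [(3_567_459, 3_608_472), (3_671_536, 3_709_097)]
du2_iii_extra = sum(span_bp(s, e) for s, e in du2_iii_regions)

print(f"DU2-I:   {du2_i:,} bp")          # about 20 kb
print(f"DU2-III: {du2_iii_extra:,} bp")  # about 78.5 kb
```

The totals, 20,704 bp and 78,576 bp, agree with the ≈20-kb and 78.5-kb figures in the text.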

Amplification Is Ongoing. Because gene amplification might alter immunogenicity and vaccine efficacy, we monitored current vaccine lots for triplications. A most striking example was found in the widely used vaccine BCG Danish 1331. A sample grown directly from a vial used in immunization programs was found to be positive for JDU2-III (Fig. 4B) and to contain AsnI fragments that were 80 or 160 kb larger than their counterparts in M. tuberculosis and M. bovis (Fig. 3), indicating that cells with either duplication or triplication of DU2-III coexisted in the population. In addition, we also investigated a BCG Danish strain that had been grown continuously for 1,513 passages and found only the triplicated form of DU2-III (Fig. 3 and SI Fig. 6). These results indicate that copy number may increase with additional passaging.
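The reasoning that links AsnI fragment size to copy number can be written out as a small calculation. In the Python sketch below, the function name and the rounding rule are ours; the unit sizes are the approximate lengths of the DU2-III and DU2-I segments given in the text, and the approach assumes that every size increase is a whole number of extra copies of one unit.

```python
def estimated_copies(shift_kb: float, unit_kb: float) -> int:
    """Copies of the amplified unit: the original plus the extra copies
    implied by the increase in AsnI fragment size."""
    return 1 + round(shift_kb / unit_kb)


DU2_III_KB = 78.5  # extra DNA per DU2-III copy
DU2_I_KB = 20.7    # approximate length of the DU2-I segment

# BCG Danish 1331: fragments 80 or 160 kb larger than the reference.
print(estimated_copies(80, DU2_III_KB))   # 2 (duplication)
print(estimated_copies(160, DU2_III_KB))  # 3 (triplication)

# Early strains: about 20 kb (Moreau) or 40 kb (Japan, Russia) larger.
print(estimated_copies(20, DU2_I_KB))     # 2
print(estimated_copies(40, DU2_I_KB))     # 3
```

These estimates match the copy numbers reported for the respective strains.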

What Drives Amplification? To find clues to the selective pressure that led to tandem duplications, the region of overlap between DU2-I and DU2-IV was compared and found to comprise a mere 5,899 bp and contain only three intact genes: Rv3300c, encoding a member of the RNA pseudouridylate synthase family; phoY1, coding for a phosphate transport system regulator; and glpD2, encoding glycerol-3-phosphate dehydrogenase (Fig. 4B). Higher levels of the latter enzyme likely afforded an advantage to strains with duplications for growth in Calmette’s glycerol-containing medium, and this is supported by the 2.7-fold increase in glpD2 expression levels in BCG strains compared with M. bovis (Table 1). Furthermore, because glycerol is still used in the medium for vaccine production, it is possible that amplification of the glpD2 region enhances the growth rate.

Fig. 3. Genomic variations occur in vaccine preparations intended for human use. Variation in the DU2 region revealed by Southern blotting of AsnI restriction digests and hybridization with a probe for the 3,686-kb region. Note that the AsnI sites in the DU2 region are outside the duplicated region and that additional hybridizing bands are due to ongoing amplification events, such as triplications.

Fig. 4. Scheme showing the appearance of DU2-I through DU2-IV. (A) Duplicated regions use a color scheme, and each duplication is boxed. Genomic coordinates based on M. tuberculosis H37Rv are indicated together with the positions of junctions (JDU2-I–JDU2-IV). (B) Identification of genes present in the region common to DU2-I through DU2-IV. Note the color scheme of A also applies to B.

Fig. 5. Refined genealogy of BCG vaccines. The scheme shows the position of genetic markers identified in this work, RD markers, some strain-specific deletions, and the distribution of vaccines into the four groups. Details of primers used for differentiation are listed in SI Table 4.


Gene Regulation and Phenotypic Differences. Initial analysis of diversity in BCG revealed overrepresentation of regulatory genes in the RD (11). This trend was extended on analysis of the genome sequence because 3 of the 10 extracytoplasmic function (ECF) σ-factor genes (31) have been lost or inactivated. As a result of the N-RD18 deletion, the 5′ end of sigI (Rv1189) has been fused in-frame to the 3′ end of the Rv1191 ortholog. The resultant fusion protein is unlikely to function because the promoter recognition domain is located in the C-terminal part of ECF σ-factors. The sigI gene is intact in the other three BCG groups (Fig. 5).

The sigK gene of BCG Pasteur has incurred a missense mutation in its start codon that replaces the ATG by ATA. This mutation is present in all BCG strains except the early ones (Fig. 5). One of the consequences of this mutation is loss of expression of the major antigens MPB70 and MPB83 (32). Another interesting regulatory locus is that of the two-component system, PhoP-PhoR, as lesions in phoP in M. tuberculosis result in marked attenuation and a profoundly altered cell envelope (33, 34). Here, DU2-I strains differ from the others as they have an IS6110 element located upstream of phoP and this may influence expression levels as more phoP mRNA was detected in BCG Tokyo than BCG Pasteur (SI Table 5). This element was subsequently lost (Fig. 5).

Importantly, a 10-bp deletion was found in codon 91 of phoR from the DU2-III strains BCG Glaxo, Mérieux, and Danish that would disrupt expression (Fig. 5). Inactivation of the PhoP-PhoR system leads to loss of diacyltrehaloses, polyacyltrehaloses, and sulfolipids (33, 35), all of which are missing from BCG, although the genes encoding the corresponding biosynthetic machinery are identical to those of M. bovis. Although the lesion in phoR might account for their absence from the DU2-III strains, it does not explain their loss from the other BCG vaccines, thus raising the possibility that other regulatory genes intervene. However, it is noteworthy that the DU2-III strains are considered to be the most attenuated.
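One reason a 10-bp deletion within a coding sequence is disruptive is that its length is not a multiple of three, so every codon downstream of the lesion is read in a different frame. The Python sketch below makes that check explicit; the function name is ours, and the contrast with the 57-bp repeat unit in leuA mentioned earlier is our illustration rather than an analysis reported in the text.

```python
def shifts_reading_frame(indel_bp: int) -> bool:
    """True if an insertion or deletion of this length changes the
    downstream reading frame (length not a multiple of three)."""
    return indel_bp % 3 != 0


# 10-bp deletion in phoR: the frame is shifted.
print(shifts_reading_frame(10))  # True

# 57-bp tandem repeat unit in leuA: 19 codons, frame preserved.
print(shifts_reading_frame(57))  # False
```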

In most bacteria, but possibly not M. tuberculosis (35), the PhoP-PhoR system generally regulates genes involved in phosphate metabolism, and genome analysis predicts that the high-affinity phosphate-uptake system may be inactive in BCG, because both the pstB and phoT genes, encoding key components, are frameshifted (36). BCG also lacks the alternative system for capturing phosphate, uptake of sn-glycerol-3-phosphate via the ugpABC system, because ugpB has a frameshift mutation. Intriguingly, although M. bovis AF2122/97 has an intact ugpB, it is mutated in ugpA, which is functional in BCG. Overall, BCG may be challenged for growth under conditions where phosphate concentrations are limiting, and in vivo this would constitute a distinct growth disadvantage.

Another regulatory locus that has accrued down-regulating mutations is BCG3734, encoding the cAMP-receptor protein, Crp. Once again, there are variations between different BCG daughter strains (Fig. 5) because, although they all have an E178K substitution affecting the DNA-binding domain, the “late” strains also have the L47P replacement in the cAMP-binding site (37). Because Rv3676, the M. tuberculosis ortholog, mediates responses to low glucose concentrations and nutrient starvation, loss of BCG3734 function might perturb intermediary metabolism and growth under microaerophilic conditions.

Comparative Transcriptomics: Impact on Virulence and Metabolism. To explore global gene expression differences, we performed comparative in vitro transcriptome analysis across the early and late BCGs, Japan and Pasteur, respectively, versus two M. bovis strains and highlighted a subset of genes showing significant differences in both comparisons (Table 1, SI Table 5, and SI Fig. 7).

A key selective pressure during in vitro adaptation of the M. bovis progenitor was the switch to a glycerol carbon source, evidenced by the presence of a functional pyruvate kinase enzyme in BCG (38) and significantly higher levels of transcription of glpD2 compared with M. bovis (Table 1). Further confirmation of metabolic remodeling is seen in the divergent regulation of genes associated with fatty acid degradation (SI Table 5). Hence, fadD2, fadE35, and the β-oxidation complex genes fadAB are all down-regulated in BCG. Likewise, the desA1 and desA3 genes, encoding two desaturases involved in fatty acid modification (39), were down-regulated in BCG compared with M. bovis. Expression of desA3 is at least 12-fold higher in vitro in M. bovis compared with both BCG strains (Table 1). Inactivation of desA3 attenuates M. tuberculosis (40), and increased expression of desA1 and desA3 occurs in patients with active tuberculosis (41). These observations suggest a key role for these desaturases in virulence, so their decreased expression in BCG may be relevant to attenuation. Reduced expression of desA3 is surprising because it is located in DU2-IV (23); other DU2-IV genes showed ≈2-fold-increased relative expression, as expected for diploidy, and the same trend was observed with DU2-I genes (Table 1).

Table 1. Selected gene expression data, including DU2

Systematic Systematic       Common      BCG Pasteur*      BCG Japan         M. bovis          M. bovis
name Mb    name BCG         name                                            1121/01           AF2122/97
Mb0674     BCG0704          mceG (mkl)  2.2 (1.4–3.3)     2.9 (1.8–4.3)     23.8 (19.5–29.6)  16.7 (9.3–31.3)
Mb0847c    BCG0877c         desA1       5.5 (1.9–13.1)    6.1 (3.8–9.3)     24.4 (15.9–39.3)  14.67 (2.0–32.9)
Mb3245     BCG3246/BCG3339  whiB1       32.3 (16.7–58.1)  22.9 (12.4–40.3)  7.2 (3.5–11.1)    8.8 (5.3–15.8)
Mb3250c    BCG3251/BCG3344  sigH        7.2 (2.7–11.6)    3.0 (2.2–3.9)     1.8 (1.2–6.1)     1.8 (0.7–2.8)
Mb3251     BCG3252/BCG3345  Rv3224      7.9 (3.3–11.5)    4.2 (2.7–6.7)     4.0 (1.5–5.3)     5.1 (81.8–7.9)
Mb3258c    BCG3259/BCG3352  desA3       1.1 (0.7–1.9)     1.4 (1.1–1.8)     18.9 (3.9–35.1)   18.2 (5.2–42.9)
Mb3259c    BCG3260/BCG3353  Rv3230c     0.7 (0.4–1.0)     0.6 (0.4–0.8)     2.9 (1.9–4.1)     2.5 (1.2–5.3)
Mb3327c    BCG3328/BCG3364  atsB        1.4 (0.8–1.9)     3.4 (2.4–4.4)     0.8 (0.6–1.8)     0.7 (0.3–1.1)
Mb3328c    BCG3329/BCG3365  Rv3300c     1.0 (0.5–2.8)     1.6 (1.1–2.5)     0.8 (0.4–1.1)     0.8 (0.2–2.9)
Mb3329c    BCG3330/BCG3366  phoY1       2.1 (1.3–2.6)     2.0 (1.0–2.6)     0.8 (0.5–1.3)     1.0 (0.5–1.8)
Mb3330c    BCG3331/BCG3367  glpD2       4.5 (2.8–5.8)     2.7 (1.9–3.6)     1.1 (0.8–1.5)     1.2 (0.7–1.9)
Mb3331c    BCG3332/BCG3368  lpdA        1.8 (1.3–2.3)     1.9 (1.3–2.6)     0.3 (0.2–0.5)     0.4 (0.2–0.6)
Mb3332     BCG3333/BCG3369  Rv3304      0.7 (0.4–1.2)     2.4 (1.4–3.3)     0.4 (0.2–0.5)     0.35 (0.01–0.6)
Mb3336     BCG3373          pmmB        0.4 (0.01–0.7)    1.9 (1.5–2.5)     0.4 (0.3–0.5)     0.5 (0.2–0.8)
Mb3345     BCG3382          sdhC        1.1 (0.7–1.9)     2.8 (1.9–3.8)     1.5 (0.8–14.0)    1.1 (0.6–2.3)
Mb3346     BCG3383          sdhD        1.2 (0.5–2.1)     4.4 (2.6–7.5)     1.6 (1.1–2.0)     1.2 (0.5–2.32)
Mb3450     BCG3486          whiB3       8.8 (3.6–14.7)    1.0 (0.5–1.4)     0.6 (0.4–1.3)     0.6 (0.2–3.2)

*Values shown in boldface are more than twice those of the comparator group.
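The fold differences cited from Table 1 can be recomputed from its central values. The Python sketch below transcribes four rows of the table (ranges omitted) and takes the most conservative ratio between two groups of strains; the dictionary layout and the function name are ours, and comparing only central values without regard to the quoted ranges is a simplifying assumption.

```python
# Central values transcribed from Table 1 (ranges omitted).
TABLE1 = {
    "mceG":  {"BCG Pasteur": 2.2, "BCG Japan": 2.9,
              "M. bovis 1121/01": 23.8, "M. bovis AF2122/97": 16.7},
    "desA3": {"BCG Pasteur": 1.1, "BCG Japan": 1.4,
              "M. bovis 1121/01": 18.9, "M. bovis AF2122/97": 18.2},
    "glpD2": {"BCG Pasteur": 4.5, "BCG Japan": 2.7,
              "M. bovis 1121/01": 1.1, "M. bovis AF2122/97": 1.2},
    "whiB3": {"BCG Pasteur": 8.8, "BCG Japan": 1.0,
              "M. bovis 1121/01": 0.6, "M. bovis AF2122/97": 0.6},
}
BCG = ("BCG Pasteur", "BCG Japan")
BOVIS = ("M. bovis 1121/01", "M. bovis AF2122/97")


def conservative_fold(gene: str, high: tuple, low: tuple) -> float:
    """Smallest ratio between any strain in `high` and any strain in `low`."""
    row = TABLE1[gene]
    return min(row[s] for s in high) / max(row[s] for s in low)


desa3_fold = conservative_fold("desA3", BOVIS, BCG)  # M. bovis over BCG
mceg_fold = conservative_fold("mceG", BOVIS, BCG)    # M. bovis over BCG
glpd2_fold = conservative_fold("glpD2", BCG, BOVIS)  # BCG over M. bovis
whib3_fold = TABLE1["whiB3"]["BCG Pasteur"] / TABLE1["whiB3"]["BCG Japan"]

print(f"desA3: {desa3_fold:.1f}-fold higher in M. bovis")
print(f"mceG:  {mceg_fold:.1f}-fold higher in M. bovis")
print(f"glpD2: {glpd2_fold:.2f}-fold higher in BCG")
print(f"whiB3: {whib3_fold:.1f}-fold higher in BCG Pasteur than BCG Japan")
```

With this conservative choice, desA3 comes out at about 13-fold, in line with the "at least 12-fold" stated in the text, and whiB3 at roughly 9-fold in BCG Pasteur relative to BCG Japan.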

In line with the mutation of transcriptional activators, expression of multiple regulators was divergent between M. bovis and BCG (SI Table 5). Genes Mb0484, Mb0846c, Mb1433c, Mb2354, lexA, Mb3122, and Mb3614c were all at least 3-fold down-regulated in both BCG strains, whereas Mb3277 showed increased expression. Interestingly, three WhiB family transcription factors, which regulate virulence, cell division, and stress responses (41–44), were differentially expressed. In both BCGs, whiB4 was down-regulated, whereas whiB1 was overexpressed because of increased gene dosage; whiB3 showed 9-fold increased expression in BCG Pasteur over BCG Japan and M. bovis (Table 1).

One of the genes showing the greatest difference in expression between BCG and M. bovis is mceG, which is essential for the function of multiple mce loci involved in cell entry (45). Mutation of mceG had the same effect on in vivo growth as the simultaneous inactivation of both mce1 and mce4 loci, resulting in severe attenuation of M. tuberculosis (45). Our results show that mceG is highly repressed in BCG compared with M. bovis (Table 1), providing another possible explanation for attenuation of the vaccine.

Variable expression of genes encoding immunogenic surface and secreted proteins was revealed. Seven PE and PPE family members (pe-pgrs3, pe-pgrs5, pe-pgrs15, pe13, ppe18, ppe40, and pe-pgrs54) had reduced expression in BCG; ppe50 was up-regulated in both BCG strains, with ppe27 and pe19 up-regulated in BCG Pasteur versus BCG Japan. Expression of the serodominant Mpb83 and Mpb70 antigens (32) was repressed in BCG Pasteur, whereas five ESAT-6 genes (esxI, esxJ, esxK, esxM, and esxN) showed increased expression compared with BCG Japan (SI Table 5). Indeed, the cumulative effect of variation in secreted and cell wall antigens across BCG strains may contribute to variable vaccine efficacy.

DiscussionA large body of evidence is presented for diversity among BCGstrains, and this stems from both genomic modifications andalterations to pleiotropic regulators of gene expression. We postu-late that the resultant phenotypic differences contribute to variablevaccine efficacy. When Calmette and Guerin grew the progenitorM. bovis strain on glycerinated potato slices, they unknowinglyimposed selective pressure for genetic alterations to this naturalmutant for glycerol metabolism, so that glycerol could be used asa carbon and energy source. Some of these changes were simplepoint mutations, such as selection for an active pyruvate kinase (22,38), whereas others, like tandem duplications, were more complex,with potentially far-reaching effects because of their size andinstability. As part of the metabolic improvement, one directconsequence of DU2 amplifications was higher levels of glycerol-3-phosphate dehydrogenase, another key enzyme required forgrowth on glycerol, as confirmed by transcriptomics.

A study of the transcriptome also revealed extensive variation in gene expression both between early and late BCG daughter strains and with respect to virulent tubercle bacilli. This results from increased gene dosage or from altered activity of pleiotropic regulators leading to over- or underproduction of certain proteins, including virulence factors and enzymes. Consequently, the corresponding vaccine strain may have an imbalanced antigenic repertoire that does not accurately reflect that of bovine or human tubercle bacilli. As can be seen from SI Table 5, there are extensive differences in the level of expression of known surface proteins and immunodominant protein antigens between BCGs Pasteur and Japan that may induce protective responses.

Tandem gene duplications provide increased activity in response to strong selective pressure. From the known BCG chronology (9), it appears that DU2-I arose in prototypical BCG and was then lost before the emergence of the DU2-II precursor, which later incurred Δint. This, in turn, served as a substrate for further duplication events in vaccines of groups II, III, and IV (Fig. 5). If our model is correct, DU2 duplication arises as a function of growth rate on glycerol. Gene amplification, and the subsequent diversity in cellular populations and immunogenicity, could be avoided by producing the vaccine in a more controlled way, by using tools developed here for quality control and assurance, because the manufacturing process has scarcely changed since 1921. These innovations will also be useful for monitoring recombinant BCG vaccines.

From Fig. 5 it seems that, after 1925, Calmette’s team may have replaced the strain initially distributed (now represented by BCG Japan) by another derivative, possibly less virulent or reactogenic. The later strains fall into three groups of which DU2-II is intermediate and followed by the major vaccine-production strains BCG Danish, Glaxo, and Pasteur. These three accounted for 66% of the 335 million doses administered in 1996 (2) and all of the meta-analyses of BCG vaccine efficacy use data sets from trials performed with them (3). We note that, in a comparative study of immune responses in babies, BCG Japan induced significantly higher levels of Th1-cytokines (IFN-γ, TNF-α, IL-2) and lower secretion of the Th2-cytokine, IL-4, accompanied by greater CD4 and CD8 T cell proliferation, than did BCG Danish (46). Taken together with our findings, this suggests that early BCG vaccines may confer better protection against tuberculosis, a possibility that would benefit from formal evaluation in clinical trials.

Materials and Methods

Bacteria, Genome Mapping, and Sequencing. BCG vaccines were from the collections at the Institut Pasteur or McGill University, and grown in Middlebrook 7H9 medium (Difco) with 0.05% Tween 80 and Albumin Dextrose Catalase (ADC) supplement. An initial shotgun of BCG Pasteur was generated from ≈21,700 paired-end sequences from three pUC19 libraries with insert sizes ranging from 2.8 to 5.5 kb, 7,500 paired-end sequences from two pMAQ1b libraries with insert sizes of 5.5–10 kb, and 3,500 single-end sequences from an M13 library. This produced an initial 4-fold coverage of the genome. This was supplemented with 30,000 reads obtained from a whole-genome shotgun library prepared in the vector pCDNA2.1 in Paris. Plasmids were sequenced by using ABI 3700 DNA sequencers (PerkinElmer, Foster City, CA), and sequences assembled by using PHRAP and GAP4 as described (21). Annotation and database presentation were by means of Artemis (47) and BCGList (http://genolist.pasteur.fr/BCGList/). Genomic DNA analysis by pulsed-field gel electrophoresis and BAC library construction were as described (23, 48, 49).

Transcriptome Analysis. M. bovis AF2122/97 (GB spoligotype 9, international code SB0140 as defined at www.mbovis.org), M. bovis 1121/01 (GB spoligotype 17, the second most frequent isolate in the U.K.; intl. code SB0263), M. bovis BCG Pasteur and M. bovis BCG Japan were grown as above. Total RNA from each strain was extracted, purified, reverse-transcribed, and labeled with Cy5-dCTP as described [(50) http://www.bugs.sgul.ac.uk/bugsbase/]. Cy3-labeled DNA (AF2122/97) was used for control purposes. Probes were hybridized to whole-genome M. bovis/M. tuberculosis composite microarrays (TbV2; St. George’s Hospital, U.K.), the arrays were scanned with an Affymetrix 428 scanner, images were analyzed with Imagene 5.0, and median spot intensities were calculated by using Genespring 7.0 software (Silicon Genetics, Redwood City, CA). Significance values were calculated by using one-way ANOVA, followed by a Student–Newman–Keuls post hoc test. Final P values were obtained by Benjamini and Hochberg correction for multiple testing with a false discovery rate of 5%. Results for selected genes were confirmed by real-time RT-PCR analysis, and expression levels were normalized by using the sigA gene as an internal reference.
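The Benjamini and Hochberg correction mentioned above can be illustrated with a minimal Python sketch. This is not the analysis pipeline used in this study, which relied on Genespring 7.0; the function names benjamini_hochberg and significant_at_fdr and the example P values are hypothetical and serve only to show how raw P values are adjusted and then compared with a 5% false discovery rate.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that the adjusted
    # values can be forced to be non-decreasing with rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


def significant_at_fdr(p_values: Sequence[float], fdr: float = 0.05) -> List[bool]:
    """Flag tests whose adjusted P value is at or below the chosen FDR."""
    return [q <= fdr for q in benjamini_hochberg(p_values)]


if __name__ == "__main__":
    # Made-up raw P values for four hypothetical genes (illustration only).
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q, keep in zip(raw, benjamini_hochberg(raw), significant_at_fdr(raw)):
        print(f"raw P = {p:.3f}  adjusted P = {q:.3f}  significant = {keep}")
```

Each raw P value is multiplied by the number of tests and divided by its rank, and the running minimum taken from the largest rank downward keeps the adjusted values monotonic. Genes are then retained when the adjusted value does not exceed the chosen false discovery rate.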

5600 | www.pnas.org/cgi/doi/10.1073/pnas.0700869104 Brosch et al.

Accession Numbers. The genome sequence has been deposited at EMBL under accession no. AM408590; fully annotated microarray data have been deposited in BμG@Sbase (accession no. E-BUGS-43; http://bugs.sgul.ac.uk/E-BUGS-43) and ArrayExpress (accession no. E-BUGS-43).

We thank Drs. K. Haslov and I. Kromann (both at Statens Serum Institutet, Copenhagen, Denmark), G. Marchal (Institut Pasteur), and S. Yamamoto [National Institute of Health (NIH), Tokyo, Japan] for providing strains and the shotgun sequencing team from the Sanger Institute for generating reads. This work was supported by the Department for Environment, Food, and Rural Affairs (DEFRA, GB), The Wellcome Trust, the Association Française Raoul Follereau, and the Institut Pasteur. We acknowledge BμG@S (the Bacterial Microarray Group at St. George’s, University of London) for supply of the microarray and advice and The Wellcome Trust for funding the multicollaborative microbial pathogen microarray facility under its Functional Genomics Resources Initiative.

1. Calmette A (1927) La Vaccination Preventive Contre la Tuberculose (Masson, Paris).

2. Fine PEM, Carneiro IAM, Milstien JB, Clemens CJ (1999) Issues Relating to the Use of BCG in Immunization Programmes. A Discussion Document (World Health Organization, Geneva).

3. Colditz GA, Berkey CS, Mosteller F, Brewer TF, Wilson ME, Burdick E, Fineberg HV (1995) Pediatrics 96:29–35.

4. Fine PEM (1995) Lancet 346:1339–1345.

5. Bloom BR, Fine PEM (1994) in Tuberculosis: Pathogenesis, Protection, and Control, ed Bloom BR (Am Soc Microbiol, Washington, DC), pp 531–557.

6. Brandt L, Cunha JF, Olsen AW, Chilima B, Hirsch P, Appelberg R, Andersen P (2002) Infect Immun 70:672–678.

7. Demangel C, Garnier T, Rosenkrands I, Cole ST (2005) Infect Immun 73:2190–2196.

8. Brosch R, Behr MA (2005) in Tuberculosis and the Tubercle Bacillus, eds Cole ST, Eisenach KD, McMurray DN, Jacobs WR, Jr (Am Soc Microbiol, Washington, DC), pp 155–164.

9. Behr MA, Small PM (1999) Vaccine 17:915–922.

10. Oettinger T, Joergensen M, Ladefoged A, Hasloev K, Andersen P (1999) Tubercle Lung Dis 79:243–250.

11. Behr MA, Wilson MA, Gill WP, Salamon H, Schoolnik GK, Rane S, Small PM (1999) Science 284:1520–1523.

12. Gordon SV, Brosch R, Billault A, Garnier T, Eiglmeier K, Cole ST (1999) Mol Microbiol 32:643–656.

13. Belley A, Alexander D, Di Pietrantonio T, Girard M, Jones J, Schurr E, Liu J, Sherman DR, Behr MA (2004) Infect Immun 72:2803–2809.

14. Mostowy S, Tsolaki AG, Small PM, Behr MA (2003) Vaccine 21:4270–4274.

15. Mahairas GG, Sabo PJ, Hickey MJ, Singh DC, Stover CK (1996) J Bacteriol 178:1274–1282.

16. Guinn KM, Hickey MJ, Mathur SK, Zakel KL, Grotzke JE, Lewinsohn DM, Smith S, Sherman DR (2004) Mol Microbiol 51:359–370.

17. Pym AS, Brodin P, Brosch R, Huerre M, Cole ST (2002) Mol Microbiol 46:709–717.

18. Hsu T, Hingley-Wilson SM, Chen B, Chen M, Dai AZ, Morin PM, Marks CB, Padiyar J, Goulding C, Gingery M, et al. (2003) Proc Natl Acad Sci USA 100:12420–12425.

19. Lewis KN, Liao R, Guinn KM, Hickey MJ, Smith S, Behr MA, Sherman DR (2003) J Infect Dis 187:117–123.

20. Stanley SA, Raghavan S, Hwang WW, Cox JS (2003) Proc Natl Acad Sci USA 100:13001–13006.

21. Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE, III, et al. (1998) Nature 393:537–544.

22. Garnier T, Eiglmeier K, Camus J-C, Medina M, Mansoor H, Pryor M, Duthoy S, Grondin S, Lacroix C, Monsempe C, et al. (2003) Proc Natl Acad Sci USA 100:7877–7882.

23. Brosch R, Gordon SV, Buchrieser C, Pym A, Garnier T, Cole ST (2000) Comp Funct Genomics (Yeast) 17:111–123.

24. Gordon SV, Heym B, Parkhill J, Barrell B, Cole ST (1999) Microbiology 145:881–892.

25. Brosch R, Gordon SV, Marmiesse M, Brodin P, Buchrieser C, Eiglmeier K, Garnier T, Gutierrez C, Hewinson G, Kremer K, et al. (2002) Proc Natl Acad Sci USA 99:3684–3689.

26. Mostowy S, Inwald J, Gordon S, Martin C, Warren R, Kremer K, Cousins D, Behr MA (2005) J Bacteriol 187:6386–6395.

27. Smith NH, Dale J, Inwald J, Palmer S, Gordon SV, Hewinson RG, Smith JM (2003) Proc Natl Acad Sci USA 100:15271–15275.

28. Smith NH, Kremer K, Inwald J, Dale J, Driscoll JR, Gordon SV, van Soolingen D, Hewinson RG, Smith JM (2006) J Theor Biol 239:220–225.

29. Fleischmann RD, Alland D, Eisen JA, Carpenter L, White O, Peterson J, DeBoy R, Dodson R, Gwinn M, Haft D, et al. (2002) J Bacteriol 184:5479–5490.

30. Rocha EP, Smith JM, Hurst LD, Holden MT, Cooper JE, Smith NH, Feil EJ (2006) J Theor Biol 239:226–235.

31. Helmann JD (2002) Adv Microb Physiol 46:47–110.

32. Charlet D, Mostowy S, Alexander D, Sit L, Wiker HG, Behr MA (2005) Mol Microbiol 56:1302–1313.

33. Gonzalo Asensio J, Maia C, Ferrer NL, Barilone N, Laval F, Soto CY, Winter N, Daffe M, Gicquel B, Martin C, Jackson M (2006) J Biol Chem 281:1313–1316.

34. Perez E, Samper S, Bordas Y, Guilhot C, Gicquel B, Martin C (2001) Mol Microbiol 41:179–187.

35. Walters SB, Dubnau E, Kolesnikova I, Laval F, Daffe M, Smith I (2006) Mol Microbiol 60:312–330.

36. Collins DM, Kawakami RP, Buddle BM, Wards BJ, de Lisle GW (2003) Microbiology 149:3203–3212.

37. Spreadbury CL, Pallen MJ, Overton T, Behr MA, Mostowy S, Spiro S, Busby SJ, Cole JA (2005) Microbiology 151:547–556.

38. Keating LA, Wheeler PR, Mansoor H, Inwald JK, Dale J, Hewinson RG, Gordon SV (2005) Mol Microbiol 56:163–174.

39. Makinoshima H, Glickman MS (2005) Nature 436:406–409.

40. Sassetti CM, Rubin EJ (2003) Proc Natl Acad Sci USA 100:12989–12994.

41. Rachman H, Strong M, Ulrichs T, Grode L, Schuchhardt J, Mollenkopf H, Kosmiadi GA, Eisenberg D, Kaufmann SH (2006) Infect Immun 74:1233–1242.

42. Gomez JE, Bishai WR (2000) Proc Natl Acad Sci USA 97:8554–8559.

43. Morris RP, Nguyen L, Gatfield J, Visconti K, Nguyen K, Schnappinger D, Ehrt S, Liu Y, Heifets L, Pieters J, et al. (2005) Proc Natl Acad Sci USA 102:12200–12205.

44. Steyn AJ, Collins DM, Hondalus MK, Jacobs WR, Jr, Kawakami RP, Bloom BR (2002) Proc Natl Acad Sci USA 99:3147–3152.

45. Joshi SM, Pandey AK, Capite N, Fortune SM, Rubin EJ, Sassetti CM (2006) Proc Natl Acad Sci USA 103:11760–11765.

46. Davids V, Hanekom WA, Mansoor N, Gamieldien H, Gelderbloem SJ, Hawkridge A, Hussey GD, Hughes EJ, Soler J, Murray RA, et al. (2006) J Infect Dis 193:531–536.

47. Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream M-A, Barrell B (2000) Bioinformatics 16:944–945.

48. Brosch R, Gordon SV, Billault A, Garnier T, Eiglmeier K, Soravito C, Barrell BG, Cole ST (1998) Infect Immun 66:2221–2229.

49. Philipp WJ, Nair S, Guglielmi G, Lagranderie M, Gicquel B, Cole ST (1996) Microbiology 142:3135–3145.

50. Stewart GR, Wernisch L, Stabler R, Mangan JA, Hinds J, Laing KG, Young DB, Butcher PD (2002) Microbiology 148:3129–3138.

Brosch et al. PNAS | March 27, 2007 | vol. 104 | no. 13 | 5601
