Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Nature © Macmillan Publishers Ltd 1998
8
letters to nature
190 NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com
Received 7 July; accepted 21 September 1998.
1. Dalbey, R. E., Lively, M. O., Bron, S. & van Dijl, J. M. The chemistry and enzymology of the type 1
signal peptidases. Protein Sci. 6, 1129±1138 (1997).
2. Kuo, D. W. et al. Escherichia coli leader peptidase: production of an active form lacking a requirement
for detergent and development of peptide substrates. Arch. Biochem. Biophys. 303, 274±280 (1993).3. Tschantz, W. R. et al. Characterization of a soluble, catalytically active form of Escherichia coli leader
peptidase: requirement of detergent or phospholipid for optimal activity. Biochemistry 34, 3935±3941
(1995).
4. Allsop, A. E. et al. in Anti-Infectives, Recent Advances in Chemistry and Structure-Activity Relationships
(eds Bently, P. H. & O'Hanlon, P. J.) 61±72 (R. Soc. Chem., Cambridge, 1997).5. Black, M. T. & Bruton, G. Inhibitors of bacterial signal peptidases. Curr. Pharm. Des. 4, 133±154
(1998).
6. Date, T. Demonstration by a novel genetic technique that leader peptidase is an essential enzyme in
Escherichia coli. J. Bacteriol. 154, 76±83 (1983).
7. Whitely, P. & von Heijne, G. The DsbA-DsbB system affects the formation of disul®de bonds inperiplasmic but not in intramembraneous protein domains. FEBS Lett. 332, 49±51 (1993).
8. Peat, T. S. et al. Structure of the UmuD9 protein and its regulation in response to DNA damage. Nature
380, 727±730 (1996).
9. Paetzel, M. et al. Crystallization of a soluble, catalytically active form of Escherichia coli leader
peptidase. Proteins Struct. Funct. Genet. 23, 122±125 (1995).10. van Klompenburg, W. et al. Phosphatidylethanolamine mediated insertion of the catalytic domain of
leader peptidase in membranes. FEBS Lett. 431, 75±79 (1998).
11. Kim, Y. T., Muramatsu, T. & Takahashi, K. Identi®cation of Trp 300 as an important residue for
Escherichia coli leader peptidase activity. Eur. J. Biochem. 234, 358±362 (1995).
12. Landolt-Marticorena, C., Williams, K. A., Deber, C. M. & Reithmeirer, R. A. Non-random distribu-tion of amino acids in the transmembrane segments of human type I single span membrane proteins.
J. Mol. Biol. 229, 602±608 (1993).
13. James, M. N. G. in Proteolysis and Protein Turnover (eds Bond, J. S. & Barrett, A. J.) 1±8 (Portland,
Brook®eld, VT, 1994).
14. Strynadka, N. C. J. et al. Molecular structure of the acyl-enzyme intermediate in b-lactamase at 1.7 AÊ
resolution. Nature 359, 393±400 (1992).
15. Manard, R. & Storer, A. C. Oxyanion hole interactions in serine and cysteine proteases. Biol. Chem.
Hoppe-Seyler 373, 393±400 (1992).
16. Nicolas, A. et al. Contribution of cutinase Ser 42 side chain to the stabilization of the oxyanion
transition state. Biochemistry 35, 398±410 (1996).17. Paetzel, M. et al. Use of site-directed chemical modi®cation to study an essential lysine in Escherichia
coli leader peptidase. J. Biol. Chem. 272, 9994±10003 (1997).
18. Paetzel, M. & Dalbey, R. E. Catalytic hydroxyl/amine dyads with serine proteases. Trends Biochem. Sci.
22, 28±31 (1997).19. von Heijne, G. Signal sequences. The limits of variation. J. Mol. Biol. 184, 99±105 (1985).
20. Izard, J. W. & Kendall, D. A. Signal peptides: exquisitely designed transport promoters. Mol. Microbiol.
13, 765±773 (1994).
21. Matthews, B. W. Solvent content of protein crystals. J. Mol. Biol. 33, 491±497 (1968).
22. Otwinowski, Z. in DENZO (eds Sawyer, L., Isaacs, N. & Baily, S.) 56±62 (SERC Daresbury Laboratory,Warrington, UK, 1993).
23. Collaborative Computational Project No. 4 The CCP4 suite: programs for protein crystallography.
Acta Crystallogr. D 50, 760±763 (1994).
24. Jones, T. A., Zou, J. Y., Cowan, S. W. & Kieldgaard, M. Improved methods for building protein models
in electron density maps and the location of errors in these models. Acta Crystallogr. A 47, 110±119 (1991).25. Brunger, A. T. X-PLOR: A System for X-ray Crystallography and NMR (Version 3.1) (Yale Univ. Press,
New Haven, 1987).
26. Tronrud, D. E. Conjugate-direction minimization: an improved method for the re®nement of
macromolecules. Acta Crystallogr. A 48, 912±916 (1992).27. Wolfe, P. B., Wickner, W. & Goodman, J. M. Sequence of the leader peptidase gene of Escherichia coli
and the orientation of leader peptidase in the bacterial envelope. J. Biol. Chem. 258, 12073±12080
(1983).
28. Kraulis, P. G. Molscript: a program to produce both detailed and schematic plots of protein structures.
J. Appl. Crystallogr. 24, 946±950 (1991).29. Nicholls, A., Sharp, K. A. & Honig, B. Protein folding and association: insights from the interfacial and
the thermodynamic properties of hydrocarbons. Proteins Struct. Funct. Genet. 11, 281±296 (1991).
30. Meritt, E. A. & Bacon, D. J. Raster3D: photorealistic molecular graphics. Methods Enzymol. 277, 505±
524 (1997).
Acknowledgements. We thank SmithKlineBeecham Pharmaceuticals for penem inhibitor; R. M. Sweetfor use of beamline X12C (NSLS, Brookhaven National Laboratory); G. Petsko for the ethylmercuryphosphate; M. N. G. James for access to equipment for characterization of earlier crystal forms of SPase;and S. Mosimann and S. Ness for discussions. This work was supported by the Medical Research Councilof Canada, the Canadian Bacterial Diseases Network of Excellence, and British Columbia MedicalResearch Foundation grants to N.C.J.S. M.P. is funded by an MRC of Canada post-doctoral fellowship,N.C.J.S. by an MRC of Canada scholarship, and R.E.D. by the NIH and the American Heart Association.
Correspondence and requests for materials should be addressed to N.C.J.S. (e-mail: [email protected]).
errata
Reconciling thespectrumofSagittariusA*withatwo-temperatureplasmamodelRohan Mahadevan
Nature 394, 651±653 (1998)..................................................................................................................................A misleading typographical error was introduced into the secondsentence of the bold introductory paragraph of this Letter: the word` infrared'' should be ` inferred''. M
Deciphering thebiologyof Mycobacterium tuberculosis fromthecompletegenomesequence
S. T. Cole, R. Brosch, J. Parkhill, T. Garnier, C. Churcher, D. Harris, S. V. Gordon, K. Eiglmeier, S. Gas, C. E. Barry III,F. Tekaia, K. Badcock, D. Basham, D. Brown, T. Chillingworth, R. Connor, R. Davies, K. Devlin, T. Feltwell, S. Gentles,N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, A. Krogh, J. McLean, S. Moule, L. Murphy, K. Oliver, J. Osborne, M. A. Quail,M.-A. Rajandream, J. Rogers, S. Rutter, K. Seeger, J. Skelton, R. Squares, S. Squares, J. E. Sulston, K. Taylor, S. Whitehead& B. G. Barrell
Nature 393, 537±544 (1998)..........................................................................................................................................................................................................................................................................As a result of an error during ®lm output, Table 1 was published with some symbols missing. The correct version can be found athttp://www.sanger.ac.uk and is reproduced again here (following pages).
Also, in Fig. 2, we incorrectly labelled Rv0649 as fadD37 instead of fabD2. Two of the genes for mycolyl transferases were inverted:Rv0129c encodes antigen 85C and not 85C9 as stated, whereas Rv3803c codes for the secreted protein MPT51 and not antigen 85C (Infect.Immun. 59, 372±382; 1991); Rv3803c is now designated fbpD. We thank Morten Harboe and Harald Wiker for drawing this to ourattention.
The sequence of Rv0746 from M. bovis BCG-Pasteur presented in Fig. 5b was incorrect and should have shown a 16-codon deletioninstead of 29, as indicated here:H37Rv.....GSGAPGGAGGAAGLWGTGGAGGAGGSSAGGGGAGGAGGAGGWLLGDGGAGGIGGAST...
..........:::::::::::::::::::: :::::::::::::::::::::
BCG.......GSGAPGGAGGAAGLWGTGGA----------------GGAGGWLLGDGGAGGIGGAST... M
Nature © Macmillan Publishers Ltd 1998
8
letters to nature
NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com 191
Nature © Macmillan Publishers Ltd 1998
8
letters to nature
192 NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com
Nature © Macmillan Publishers Ltd 1998
8
letters to nature
NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com 193
Nature © Macmillan Publishers Ltd 1998
8
letters to nature
194 NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com
Nature © Macmillan Publishers Ltd 1998
8
letters to nature
NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com 195
Nature © Macmillan Publishers Ltd 1998
8
letters to nature
196 NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com
Nature © Macmillan Publishers Ltd 1998
8
letters to nature
NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com 197
Nature © Macmillan Publishers Ltd 1998
8
letters to nature
198 NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com
Nature © Macmillan Publishers Ltd 1998
8
NATURE | VOL 393 | 11 JUNE 1998 537
article
Deciphering the biology ofMycobacterium tuberculosis fromthe complete genome sequenceS. T. Cole*, R. Brosch*, J. Parkhill, T. Garnier*, C. Churcher, D. Harris, S. V. Gordon*, K. Eiglmeier*, S. Gas*,C. E. Barry III†, F. Tekaia‡, K. Badcock, D. Basham, D. Brown, T. Chillingworth, R. Connor, R. Davies,K. Devlin, T. Feltwell, S. Gentles, N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, A. Krogh§, J. McLean,S. Moule, L. Murphy, K. Oliver, J. Osborne, M. A. Quail, M.-A. Rajandream, J. Rogers, S. Rutter,K. Seeger, J. Skelton, R. Squares, S. Squares, J. E. Sulston, K. Taylor, S. Whitehead & B. G. Barrell
Sanger Centre, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK* Unite de Genetique Moleculaire Bacterienne, and ‡ Unite de Genetique Moleculaire des Levures, Institut Pasteur, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France† Tuberculosis Research Unit, Laboratory of Intracellular Parasites, Rocky Mountain Laboratories, National Institute of Allergy and Infectious Diseases, NationalInstitutes of Health, Hamilton, Montana 59840, USA§ Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Countlessmillions of people have died from tuberculosis, a chronic infectious disease caused by the tubercle bacillus.The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has beendeterminedandanalysed inorder to improveourunderstandingof thebiologyof thisslow-growingpathogenand tohelpthe conception of new prophylactic and therapeutic interventions. The genome comprises 4,411,529 base pairs,contains around4,000genes, andhasavery high guanine+ cytosine content that is reflected in the biasedamino-acidcontent of the proteins. M. tuberculosis differs radically from other bacteria in that a very large portion of its codingcapacity is devoted to the production of enzymes involved in lipogenesis and lipolysis, and to two new families ofglycine-rich proteins with a repetitive structure that may represent a source of antigenic variation.
Despite the availability of effective short-course chemotherapy(DOTS) and the Bacille Calmette-Guerin (BCG) vaccine, thetubercle bacillus continues to claim more lives than any othersingle infectious agent1. Recent years have seen increased incidenceof tuberculosis in both developing and industrialized countries, thewidespread emergence of drug-resistant strains and a deadlysynergy with the human immunodeficiency virus (HIV). In 1993,the gravity of the situation led the World Health Organisation (WHO)to declare tuberculosis a global emergency in an attempt to heightenpublic and political awareness. Radical measures are needed now toprevent the grim predictions of the WHO becoming reality. Thecombination of genomics and bioinformatics has the potential togenerate the information and knowledge that will enable theconception and development of new therapies and interventionsneeded to treat this airborne disease and to elucidate the unusualbiology of its aetiological agent, Mycobacterium tuberculosis.
The characteristic features of the tubercle bacillus include its slowgrowth, dormancy, complex cell envelope, intracellular pathogen-esis and genetic homogeneity2. The generation time of M. tubercu-losis, in synthetic medium or infected animals, is typically ,24hours. This contributes to the chronic nature of the disease, imposeslengthy treatment regimens and represents a formidable obstacle forresearchers. The state of dormancy in which the bacillus remainsquiescent within infected tissue may reflect metabolic shutdownresulting from the action of a cell-mediated immune response thatcan contain but not eradicate the infection. As immunity wanes,through ageing or immune suppression, the dormant bacteriareactivate, causing an outbreak of disease often many decadesafter the initial infection3. The molecular basis of dormancy andreactivation remains obscure but is expected to be geneticallyprogrammed and to involve intracellular signalling pathways.
The cell envelope of M. tuberculosis, a Gram-positive bacteriumwith a G + C-rich genome, contains an additional layer beyond thepeptidoglycan that is exceptionally rich in unusual lipids, glycoli-
pids and polysaccharides4,5. Novel biosynthetic pathways generatecell-wall components such as mycolic acids, mycocerosic acid,phenolthiocerol, lipoarabinomannan and arabinogalactan, andseveral of these may contribute to mycobacterial longevity, triggerinflammatory host reactions and act in pathogenesis. Little isknown about the mechanisms involved in life within the macro-phage, or the extent and nature of the virulence factors produced bythe bacillus and their contribution to disease.
It is thought that the progenitor of the M. tuberculosis complex,comprising M. tuberculosis, M. bovis, M. bovis BCG, M. africanumand M. microti, arose from a soil bacterium and that the humanbacillus may have been derived from the bovine form following thedomestication of cattle. The complex lacks interstrain geneticdiversity, and nucleotide changes are very rare6. This is importantin terms of immunity and vaccine development as most of theproteins will be identical in all strains and therefore antigenic driftwill be restricted. On the basis of the systematic sequence analysis of26 loci in a large number of independent isolates6, it was concludedthat the genome of M. tuberculosis is either unusually inert or thatthe organism is relatively young in evolutionary terms.
Since its isolation in 1905, the H37Rv strain of M. tuberculosis hasfound extensive, worldwide application in biomedical researchbecause it has retained full virulence in animal models of tubercu-losis, unlike some clinical isolates; it is also susceptible to drugs andamenable to genetic manipulation. An integrated map of the 4.4megabase (Mb) circular chromosome of this slow-growing patho-gen had been established previously and ordered libraries ofcosmids and bacterial artificial chromosomes (BACs) wereavailable7,8.
Organization and sequence of the genomeSequence analysis. To obtain the contiguous genome sequence, acombined approach was used that involved the systematic sequenceanalysis of selected large-insert clones (cosmids and BACs) as well as
Nature © Macmillan Publishers Ltd 1998
8
random small-insert clones from a whole-genome shotgun library.This culminated in a composite sequence of 4,411,529 base pairs(bp) (Figs 1, 2), with a G + C content of 65.6%. This represents thesecond-largest bacterial genome sequence currently available (afterthat of Escherichia coli)9. The initiation codon for the dnaA gene, ahallmark for the origin of replication, oriC, was chosen as the startpoint for numbering. The genome is rich in repetitive DNA,particularly insertion sequences, and in new multigene familiesand duplicated housekeeping genes. The G + C content is relativelyconstant throughout the genome (Fig. 1) indicating that horizon-tally transferred pathogenicity islands of atypical base compositionare probably absent. Several regions showing higher than average G+ C content (Fig. 1) were detected; these correspond to sequencesbelonging to a large gene family that includes the polymorphic G +C-rich sequences (PGRSs).Genes for stable RNA. Fifty genes coding for functional RNAmolecules were found. These molecules were the three speciesproduced by the unique ribosomal RNA operon, the 10Sa RNAinvolved in degradation of proteins encoded by abnormal messen-ger RNA, the RNA component of RNase P, and 45 transfer RNAs.No 4.5S RNA could be detected. The rrn operon is situatedunusually as it occurs about 1,500 kilobases (kb) from the putativeoriC; most eubacteria have one or more rrn operons near to oriC toexploit the gene-dosage effect obtained during replication10. Thisarrangement may be related to the slow growth of M. tuberculosis.The genes encoding tRNAs that recognize 43 of the 61 possible sensecodons were distributed throughout the genome and, with one
exception, none of these uses A in the first position of the anticodon,indicating that extensive wobble occurs during translation. This isconsistent with the high G + C content of the genome and theconsequent bias in codon usage. Three genes encoding tRNAs formethionine were found; one of these genes (metV) is situated in aregion that may correspond to the terminus of replication (Figs 1,2). As metV is linked to defective genes for integrase and excisionase,perhaps it was once part of a phage or similar mobile geneticelement.Insertion sequences and prophages. Sixteen copies of the promis-cuous insertion sequence IS6110 and six copies of the more stableelement IS1081 reside within the genome of H37Rv8. One copy ofIS1081 is truncated. Scrutiny of the genomic sequence led to theidentification of a further 32 different insertion sequence elements,most of which have not been described previously, and of the 13E12family of repetitive sequences which exhibit some of the character-istics of mobile genetic elements (Fig. 1). The newly discoveredinsertion sequences belong mainly to the IS3 and IS256 families,although six of them define a new group. There is extensivesimilarity between IS1561 and IS1552 with insertion sequenceelements found in Nocardia and Rhodococcus spp., suggesting thatthey may be widely disseminated among the actinomycetes.
Most of the insertion sequences in M. tuberculosis H37Rv appearto have inserted in intergenic or non-coding regions, often neartRNA genes (Fig. 1). Many are clustered, suggesting the existence ofinsertional hot-spots that prevent genes from being inactivated, ashas been described for Rhizobium11. The chromosomal distributionof the insertion sequences is informative as there appears to havebeen a selection against insertions in the quadrant encompassingoriC and an overrepresentation in the direct repeat region thatcontains the prototype IS6110. This bias was also observed experi-mentally in a transposon mutagenesis study12.
At least two prophages have been detected in the genomesequence and their presence may explain why M. tuberculosisshows persistent low-level lysis in culture. Prophages phiRv1 andphiRv2 are both ,10 kb in length and are similarly organized, andsome of their gene products show marked similarity to thoseencoded by certain bacteriophages from Streptomyces and sapro-phytic mycobacteria. The site of insertion of phiRv1 is intriguing asit corresponds to part of a repetitive sequence of the 13E12 familythat itself appears to have integrated into the biotin operon. Somestrains of M. tuberculosis have been described as requiring biotin as agrowth supplement, indicating either that phiRv1 has a polar effecton expression of the distal bio genes or that aberrant excision,leading to mutation, may occur. During the serial attenuation of M.bovis that led to the vaccine strain M. bovis BCG, the phiRv1prophage was lost13. In a systematic study of the genomic diversityof prophages and insertion sequences (S.V.G. et al., manuscript inpreparation), only IS1532 exhibited significant variability, indicat-ing that most of the prophages and insertion sequences are currentlystable. However, from these combined observations, one can con-clude that horizontal transfer of genetic material into the free-livingancestor of the M. tuberculosis complex probably occurred in naturebefore the tubercle bacillus adopted its specialized intracellularniche.
article
538 NATURE | VOL 393 | 11 JUNE 1998
4,411,529 bp
H37Rv
0
4
1
2
M. tuberculosis
3
Figure 1 Circular map of the chromosome of M. tuberculosis H37Rv. The outer
circle shows the scale in Mb, with 0 representing the origin of replication. The first
ring from the exterior denotes the positions of stable RNA genes (tRNAs are blue,
others are pink) and the direct repeat region (pink cube); the second ring inwards
shows the coding sequence bystrand (clockwise, darkgreen; anticlockwise, light
green); the third ring depicts repetitive DNA (insertion sequences, orange; 13E12
REP family, dark pink; prophage, blue); the fourth ring shows the positions of the
PPE family members (green); the fifth ring shows the PE family members (purple,
excluding PGRS); and the sixth ring shows the positions of the PGRS sequences
(dark red). The histogram (centre) represents G + C content, with ,65% G + C in
yellow, and .65% G + C in red. The figure was generated with software from
DNASTAR.
Figure 2 Linear map of the chromosome of M. tuberculosis H37Rv showing the
position and orientation of known genes and coding sequences (CDS). We used
the following functional categories (adapted from ref. 20): lipid metabolism
(black); intermediary metabolism and respiration (yellow); information pathways
(pink); regulatory proteins (sky blue); conserved hypothetical proteins (orange);
proteins of unknown function (light green); insertion sequences and phage-
related functions (blue); stable RNAs (purple); cell wall and cell processes (dark
green); PE and PPE protein families (magenta); virulence, detoxification and
adaptation (white). For additional information about gene functions, refer to
http://www.sanger.ac.uk.
Q
Nature © Macmillan Publishers Ltd 1998
8
Genes encoding proteins. 3,924 open reading frames were identi-fied in the genome (see Methods), accounting for ,91% of thepotential coding capacity (Figs 1, 2). A few of these genes appear tohave in-frame stop codons or frameshift mutations (irrespective ofthe source of the DNA sequenced) and may either use frameshiftingduring translation or correspond to pseudogenes. Consistent withthe high G + C content of the genome, GTG initiation codons (35%)are used more frequently than in Bacillus subtilis (9%) and E. coli(14%), although ATG (61%) is the most common translationalstart. There are a few examples of atypical initiation codons, themost notable being the ATC used by infC, which begins with ATT inboth B. subtilis and E. coli9,14. There is a slight bias in the orientationof the genes (Fig. 1) with respect to the direction of replication as,59% are transcribed with the same polarity as replication,compared with 75% in B. subtilis. In other bacteria, genes tran-scribed in the same direction as the replication forks are believed tobe expressed more efficiently9,14. Again, the more even distributionin gene polarity seen in M. tuberculosis may reflect the slow growthand infrequent replication cycles. Three genes (dnaB, recA andRv1461) have been invaded by sequences encoding inteins (proteinintrons) and in all three cases their counterparts in M. leprae alsocontain inteins, but at different sites15 (S.T.C. et al., unpublishedobservations).Protein function, composition and duplication. By using variousdatabase comparisons, we attributed precise functions to ,40% ofthe predicted proteins and found some information or similarity foranother 44%. The remaining 16% resembled no known proteinsand may account for specific mycobacterial functions. Examinationof the amino-acid composition of the M. tuberculosis proteome bycorrespondence analysis16, and comparison with that of othermicroorganisms whose genome sequences are available, revealed astatistically significant preference for the amino acids Ala, Gly, Pro,Arg and Trp, which are all encoded by G + C-rich codons, and acomparative reduction in the use of amino acids encoded by A + T-rich codons such as Asn, Ile, Lys, Phe and Tyr (Fig. 3). This approachalso identified two groups of proteins rich in Asn or Gly that belongto new families, PE and PPE (see below). The fraction of theproteome that has arisen through gene duplication is similar tothat seen in E. coli or B. subtilis (,51%; refs 9, 14), except that thelevel of sequence conservation is considerably higher, indicatingthat there may be extensive redundancy or differential productionof the corresponding polypeptides. The apparent lack of divergencefollowing gene duplication is consistent with the hypothesis thatM. tuberculosis is of recent descent6.
General metabolism, regulation and drug resistanceMetabolic pathways. From the genome sequence, it is clear that thetubercle bacillus has the potential to synthesize all the essentialamino acids, vitamins and enzyme co-factors, although some of thepathways involved may differ from those found in other bacteria. M.tuberculosis can metabolize a variety of carbohydrates, hydrocar-bons, alcohols, ketones and carboxylic acids2,17. It is apparent fromgenome inspection that, in addition to many functions involved inlipid metabolism, the enzymes necessary for glycolysis, the pentosephosphate pathway, and the tricarboxylic acid and glyoxylate cyclesare all present. A large number (,200) of oxidoreductases, oxyge-nases and dehydrogenases is predicted, as well as many oxygenasescontaining cytochrome P450, that are similar to fungal proteinsinvolved in sterol degradation. Under aerobic growth conditions,ATP will be generated by oxidative phosphorylation from electrontransport chains involving a ubiquinone cytochrome b reductasecomplex and cytochrome c oxidase. Components of several anae-robic phosphorylative electron transport chains are also present,including genes for nitrate reductase (narGHJI), fumarate reductase(frdABCD) and possibly nitrite reductase (nirBD), as well as a newreductase (narX) that results from a rearrangement of a homologueof the narGHJI operon. Two genes encoding haemoglobin-like
proteins, which may protect against oxidative stress or be involvedin oxygen capture, were found. The ability of the bacillus to adapt itsmetabolism to environmental change is significant as it not only hasto compete with the lung for oxygen but must also adapt to themicroaerophilic/anaerobic environment at the heart of theburgeoning granuloma.Regulation and signal transduction. Given the complexity of theenvironmental and metabolic choices facing M. tuberculosis, anextensive regulatory repertoire was expected. Thirteen putativesigma factors govern gene expression at the level of transcriptioninitiation, and more than 100 regulatory proteins are predicted(Table 1). Unlike B. subtilis and E. coli, in which there are .30 copiesof different two-component regulatory systems14, M. tuberculosishas only 11 complete pairs of sensor histidine kinases and responseregulators, and a few isolated kinase and regulatory genes. Thisrelative paucity in environmental signal transduction pathways isprobably offset by the presence of a family of eukaryotic-like serine/threonine protein kinases (STPKs), which function as part of aphosphorelay system18. The STPKs probably have two domains: thewell-conserved kinase domain at the amino terminus is predicted tobe connected by a transmembrane segment to the carboxy-terminalregion that may respond to specific stimuli. Several of the predictedenvelope lipoproteins, such as that encoded by lppR (Rv2403), showextensive similarity to this putative receptor domain of STPKs,suggesting possible interplay. The STPKs probably function insignal transduction pathways and may govern important cellulardecisions such as dormancy and cell division, and although theirpartners are unknown, candidate genes for phosphoprotein phos-phatases have been identified.Drug resistance. M. tuberculosis is naturally resistant to manyantibiotics, making treatment difficult19. This resistance is duemainly to the highly hydrophobic cell envelope acting as a perme-ability barrier4, but many potential resistance determinants are alsoencoded in the genome. These include hydrolytic or drug-modify-ing enzymes such as b-lactamases and aminoglycoside acetyltransferases, and many potential drug–efflux systems, such as 14members of the major facilitator family and numerous ABCtransporters. Knowledge of these putative resistance mechanismswill promote better use of existing drugs and facilitate the concep-tion of new therapies.
article
NATURE | VOL 393 | 11 JUNE 1998 539
F2 -22.6%
Mj Ae AfMth
Bb
Hp
MgMp
Sc Ce
HiSs
Ec
Bs
Glu
TyrIleLys
Asn
Phe
Ser Thr
His
Gln
Trp
Ala
Arg
GlyVal
MetCys
AspLeu
Pro0
0.1
-0.1
-0.2
-0.3
-0.30 -0.15 0 0.15 0.30
Mt
F1 - 55.2%
Figure 3 Correspondence analysis of the proteomes from extensively
sequenced organisms as a function of amino-acid composition. Note the
extreme position of M. tuberculosis and the shift in amino-acid preference
reflecting increasing G + C content from left to right. Abbreviations used: Ae,
Aquifex aeolicus; Af, Archaeoglobus fulgidis; Bb, Borrelia burgdorfei; Bs, B.
subtilis; Ce, Caenorhabditis elegans; Ec, E. coli; Hi, Haemophilus influenzae; Hp,
Helicobacter pylori; Mg, Mycoplasma genitalium; Mj, Methanococcus jannaschi;
Mp, Mycoplasma pneumoniae; Mt, M. tuberculosis; Mth, Methanobacterium
thermoautotrophicum; Sc, Saccharomyces cerevisiae; Ss, Synechocystis sp.
strain PCC6803. F1 and F2, first and second factorial axes16.
Nature © Macmillan Publishers Ltd 1998
8
article
540 NATURE | VOL 393 | 11 JUNE 1998
8 (kb)
fas
fabD acpM kasA kasB accD
mabA inhA
cmaA1
cmaA2
1 2 3 4 5 6 7
b
Gene Function
fas Fatty acid synthase, produces C16 -C 26 acyl-CoA esters
fabD Malonyl-CoA:AcpM acyltransferase, AcpM loadingacpM Acyl carrier protein, meromycolate precursor transportkasA Ketoacyl acyl carrier protein synthase, chain elongationkasB Ketoacyl acyl carrier protein synthase, chain elongationaccD Acetyl-CoA carboxylase, malonyl-CoA synthesismabA 3-oxo-acyl-acyl carrier protein reductase, reduces KasA/B productinhA Enoyl-acyl carrier protein reductase
cmaA1 cyclopropane mycolic acid synthase 1, distal-position specificcmaA2 cyclopropane mycolic acid synthase 2, proximal-position specific
O
HOH
HOH
0
R
O
NADH, FADH 2
Amino acids
Purines,Pyrimidines
Porphyrins
Glucose
ATP
R
O
R
O
SCoA
R
O
R
OO
SCoAβ-Oxidation
(fadA/fadB)
fadD1-36
fadE1-36
fadB2-5echA1-21
fadA2-6
Citricacidcycle
Glyoxylateshunt
O
SCoA
O
SCo A
CO2
accA1/accD1accA2/accD2
Cell wall and mycobacterial lipids
Host membranes
SCoA
SCoA
OH
a
OH
masppsA ppsB ppsC ppsD ppsE
10 20 30 40 (kb)0
CH3
CH3
OCH 3CH3CH3CH3
CH3
CH 3CH3CH3
CH3
OOOO
mas Four rounds extension of C 16,18 using Me-malonyl CoA
ppsA Extension of C 18 with malony CoA, partial reduction
ppsB Extension with malony CoA, partial reduction
Extension with malony CoA, complete reductionppsC
ppsD Extension with Me-malony CoA, partial reduction
ppsE Extension with malony CoA, partial reduction, decarboxylation
Gene Function
c
Figure 4 Lipid metabolism. a, Degradation of
host-cell lipids is vital in the intracellular life of
M. tuberculosis. Host-cell membranes provide
precursors for many metabolic processes,
as well as potential precursors of mycobac-
terial cell-wall constituents, through the
actions of a broad family of b-oxidative
enzymes encoded by multiple copies in the
genome. These enzymes produce acetyl
CoA, which can be converted into many
different metabolites and fuel for the bacteria
through the actions of the enzymes of the
citric acid cycle and the glyoxylate shunt of
this cycle. b, The genes that synthesize
mycolic acids, the dominant lipid component
of the mycobacterial cell wall, include the
type I fatty acid synthase (fas) and a unique
type II system which relies on extension of a
precursor bound to an acyl carrier protein to
form full-length (,80-carbon) mycolic acids.
The cma genes are responsible for
cyclopropanation. c, The genes that produce
phthiocerol dimycocerosate form a large
operon and represent type I (mas) and type
II (the pps operon) polyketide synthase
systems. Functions are colour coordinated.
Nature © Macmillan Publishers Ltd 1998
8
Lipid metabolismVery few organisms produce such a diverse array of lipophilicmolecules as M. tuberculosis. These molecules range from simplefatty acids such as palmitate and tuberculostearate, through iso-prenoids, to very-long-chain, highly complex molecules such asmycolic acids and the phenolphthiocerol alcohols that esterify withmycocerosic acid to form the scaffold for attachment of the myco-sides. Mycobacteria contain examples of every known lipid andpolyketide biosynthetic system, including enzymes usually found inmammals and plants as well as the common bacterial systems. Thebiosynthetic capacity is overshadowed by the even more remarkableradiation of degradative, fatty acid oxidation systems and, in total,there are ,250 distinct enzymes involved in fatty acid metabolismin M. tuberculosis compared with only 50 in E. coli20.Fatty acid degradation. In vivo-grown mycobacteria have beensuggested to be largely lipolytic, rather than lipogenic, because ofthe variety and quantity of lipids available within mammalian cellsand the tubercle2 (Fig. 4a). The abundance of genes encodingcomponents of fatty acid oxidation systems found by our genomicapproach supports this proposition, as there are 36 acyl-CoAsynthases and a family of 36 related enzymes that could catalysethe first step in fatty acid degradation. There are 21 homologousenzymes belonging to the enoyl-CoA hydratase/isomerase super-family of enzymes, which rehydrate the nascent product of the acyl-CoA dehydrogenase. The four enzymes that convert the 3-hydroxyfatty acid into a 3-keto fatty acid appear less numerous, mainly
because they are difficult to distinguish from other members of theshort-chain alcohol dehydrogenase family on the basis of primarysequence. The five enzymes that complete the cycle by thiolysis ofthe b-ketoester, the acetyl-CoA C-acetyltransferases, do indeedappear to be a more limited family. In addition to this extensiveset of dissociated degradative enzymes, the genome also encodes thecanonical FadA/FadB b-oxidation complex (Rv0859 and Rv0860).Accessory activities are present for the metabolism of odd-chain andmultiply unsaturated fatty acids.Fatty acid biosynthesis. At least two discrete types of enzymesystem, fatty acid synthase (FAS) I and FAS II, are involved infatty acid biosynthesis in mycobacteria (Fig. 4b). FAS I (Rv2524, fas)is a single polypeptide with multiple catalytic activities that gen-erates several shorter CoA esters from acetyl-CoA primers5 andprobably creates precursors for elongation by all of the other fattyacid and polyketide systems. FAS II consists of dissociable enzymecomponents which act on a substrate bound to an acyl-carrierprotein (ACP). FAS II is incapable of de novo fatty acid synthesis butinstead elongates palmitoyl-ACP to fatty acids ranging from 24 to56 carbons in length17,21. Several different components of FAS II maybe targets for the important tuberculosis drug isoniazid, includingthe enoyl-ACP reductase InhA22, the ketoacyl-ACP synthase KasAand the ACP AcpM21. Analysis of the genome shows that there areonly three potential ketoacyl synthases: KasA and KasB are highlyrelated, and their genes cluster with acpM, whereas KasC is a moredistant homologue of a ketoacyl synthase III system. The number ofketoacyl synthase and ACP genes indicates that there is a single FASII system. Its genetic organization, with two clustered ketoacylsynthases, resembles that of type II aromatic polyketide biosyntheticgene clusters, such as those for actinorhodin, tetracycline andtetracenomycin in Streptomyces species23. InhA seems to be the soleenoyl-ACP reductase and its gene is co-transcribed with a fabGhomologue, which encodes 3-oxoacyl-ACP reductase. Both of theseproteins are probably important in the biosynthesis of mycolic acids.
Fatty acids are synthesized from malonyl-CoA and precursors aregenerated by the enzymatic carboxylation of acetyl (or propionyl)-CoA by a biotin-dependent carboxylase (Fig. 4b). From study of thegenome we predict that there are three complete carboxylasesystems, each consisting of an a- and a b-subunit, as well as threeb-subunits without an a-counterpart. As a group, all of thecarboxylases seem to be more related to the mammalian homo-logues than to the corresponding bacterial enzymes. Two of thesecarboxylase systems (accA1, accD1 and accA2, accD2) are probablyinvolved in degradation of odd-numbered fatty acids, as they areadjacent to genes for other known degradative enzymes. They mayconvert propionyl-CoA to succinyl-CoA, which can then be incor-porated into the tricarboxylic acid cycle. The synthetic carboxylases(accA3, accD3, accD4, accD5 and accD6) are more difficult tounderstand. The three extra b-subunits might direct carboxylationto the appropriate precursor or may simply increase the totalamount of carboxylated precursor available if this step were rate-limiting.
Synthesis of the paraffinic backbone of fatty and mycolic acids inthe cell is followed by extensive postsynthetic modifications andunsaturations, particularly in the case of the mycolic acids24,25.Unsaturation is catalysed either by a FabA-like b-hydroxyacyl-ACP dehydrase, acting with a specific ketoacyl synthase, or by anaerobic terminal mixed function desaturase that uses both mole-cular oxygen and NADPH. Inspection of the genome revealed noobvious candidates for the FabA-like activity. However, threepotential aerobic desaturases (encoded by desA1, desA2 anddesA3) were evident that show little similarity to related vertebrateor yeast enzymes (which act on CoA esters) but instead resembleplant desaturases (which use ACP esters). Consequently, the geno-mic data indicate that unsaturation of the meromycolate chain mayoccur while the acyl group is bound to AcpM.
Much of the subsequent structural diversity in mycolic acids is
article
NATURE | VOL 393 | 11 JUNE 1998 541
H37Rv .. ...GSGAPGGAGGAAGLWGTGGAGGAGGSSAGGGGAGGAGGAGGWLLGDGGAGGIGGAST.......... ::::::::::::::::: ............................. :::::::::::BCG .... ...GSGAPGGAGGAAGLWGT-----------------------------GGAGGIGGAST
b
46 aa
29 aa
PE-PGRS sequence variation in ORF Rv0746
PGRS
PGRS
H37Rv PE
PE
BCG
H37Rv .. VLGGTGGGGGVGGLWGAGGAGGAGGTGLVGGDGGAGGAGGTGGLLAGLIGAGGGHGGTGG....... ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::BCG .... VLGGTGGGGGVGGLWGAGGAGGAGGTGLVGGDGGAGGAGGTGGLLAGLIGAGGGHGGTGG
H37Rv .. LSTNGDGGVGGAGGNAGMLAGPGGAGGAGGDGENLDTGGDGGAGGSAGLLFGSGGAGGAG....... ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::BCG .... LSTNGDGGVGGAGGNAGMLAGPGGAGGAGGDGENLDTGGDGGAGGSAGLLFGSGGAGGAG
H37Rv .. GFGFLGGDGGAGGNAGLLLSSGGAGGFGGFGTAGGVGGAGGNAGWLGF------------....... ::::::::::::::::::::::::::::::::::::::::::::::::BCG .... GFGFLGGDGGAGGNAGLLLSSGGAGGFGGFGTAGGVGGAGGNAGWLGFGGAGGIGGIGGN
H37Rv .. ----------------------------------GGAGGVGGSAGLIGTGGNGGNGGTGA.......:::::::::::::::::::::::::::::::::: ::::::::::::::::::::::::::BCG .... ANGGAGGNGGTGGQLWGSGGAGGEGGAALSVGDTGGAGGVGGSAGLIGTGGNGGNGGTGA
H37Rv .. NAGSPGTGGAGGLLLGQNGLNGLP*....... ::::::::::::::::::::::::BCG .... NAGSPGTGGAGGLLLGQNGLNGLP*
a
Representatives of the PE family
PGRS PE 0 .. >1,400 aa
0 .. ~500 aa~110 aa
~110 aa
unique sequence
(GGAGGA)n
Representatives of the PPE family
~200 .. ~400 aa
~200 .. >3,500 aa
0 .. ~400 aa
(NxGxGNxG)n MPTR
GxxSVPxxW
PPE
PPE
PPE
PE
~180 aa
~180 aa
~180 aaunique sequence
Figure 5 The PE and PPE protein families. a, Classification of the PE and PPE
protein families. b, Sequence variation between M. tuberculosis H37Rv and M.
bovis BCG-Pasteur in the PE-PGRS encoded by open reading frame (ORF)
Rv0746.
Nature © Macmillan Publishers Ltd 1998
8
generated by a family of S-adenosyl-L-methionine-dependentenzymes, which use the unsaturated meromycolic acid as a substrateto generate cis and trans cyclopropanes and other mycolates. Sixmembers of this family have been identified and characterized25 andtwo clustered, convergently transcribed new genes are evident in thegenome (umaA1 and umaA2). From the functions of the knownfamily members and the structures of mycolic acids in M. tubercu-losis, it is tempting to speculate that these new enzymes mayintroduce the trans cyclopropanes into the meromycolate precursor.In addition to these two methyltransferases, there are two otherunrelated lipid methyltransferases (Ufa1 and Ufa2) that sharehomology with cyclopropane fatty acid synthase of E. coli25.Although cyclopropanation seems to be a relatively commonmodification of mycolic acids, cyclopropanation of plasma-mem-brane constituents has not been described in mycobacteria. Tuber-culostearic acid is produced by methylation of oleic acid, and maybe synthesized by one of these two enzymes.
Condensation of the fully functionalized and preformed mero-mycolate chain with a 26-carbon a-branch generates full-lengthmycolic acids that must be transported to their final location forattachment to the cell-wall arabinogalactan. The transfer andsubsequent transesterification is mediated by three well-knownimmunogenic proteins of the antigen 85 complex26. The genomeencodes a fourth member of this complex, antigen 85C9 ( fbpC2,Rv0129), which is highly related to antigen 85C. Further studies areneeded to show whether the protein possesses mycolytransferaseactivity and to clarify the reason behind the apparent redundancy.Polyketide synthesis. Mycobacteria synthesize polyketides by sev-eral different mechanisms. A modular type I system, similar to thatinvolved in erythromycin biosynthesis23, is encoded by a very largeoperon, ppsABCDE, and functions in the production ofphenolphthiocerol5. The absence of a second type I polyketidesynthase suggests that the related lipids phthiocerol A and B,phthiodiolone A and phthiotriol may all be synthesized by thesame system, either from alternative primers or by differentialpostsynthetic modification. It is physiologically significant thatthe pps gene cluster occurs immediately upstream of mas, whichencodes the multifunctional enzyme mycocerosic acid synthase(MAS), as their products phthiocerol and mycocerosic acid esterifyto form the very abundant cell-wall-associated molecule phthio-cerol dimycocerosate (Fig. 4c).
Members of another large group of polyketide synthase enzymesare similar to MAS, which also generates the multiply methyl-branched fatty acid components of mycosides and phthioceroldimycocerosate, abundant cell-wall-associated molecules5. Althoughsome of these polyketide synthases may extend type I FAS CoAprimers to produce other long-chain methyl-branched fatty acidssuch as mycolipenic, mycolipodienic and mycolipanolic acids or thephthioceranic and hydroxyphthioceranic acids, or may even showfunctional overlap5, there are many more of these enzymes thanthere are known metabolites. Thus there may be new lipid andpolyketide metabolites that are expressed only under certain con-ditions, such as during infection and disease.
A fourth class of polyketide synthases is related to the plantenzyme superfamily that includes chalcone and stilbene synthase23.These polyketide synthases are phylogenetically divergent from allother polyketide and fatty acid synthases and generate unreducedpolyketides that are typically associated with anthocyanin pigmentsand flavonoids. The function of these systems, which are oftenlinked to apparent type I modules, is unknown. An example is thegene cluster spanning pks10, pks7, pks8 and pks9, which includes twoof the chalcone-synthase-like enzymes and two modules of anapparent type I system. The unknown metabolites produced bythese enzymes are interesting because of the potent biologicalactivities of some polyketides such as the immunosuppressorrapamycin.Siderophores. Peptides that are not ribosomally synthesized are
made by a process that is mechanistically analogous to polyketidesynthesis23,27. These peptides include the structurally related iron-scavenging siderophores, the mycobactins and the exochelins2,28,which are derived from salicylate by the addition of serine (orthreonine), two lysines and various fatty acids and possible poly-ketide segments. The mbt operon, encoding one apparent salicylate-activating protein, three amino-acid ligases, and a single module ofa type I polyketide synthase, may be responsible for the biosynthesisof the mycobacterial siderophores. The presence of only one non-ribosomal peptide-synthesis system indicates that this pathway maygenerate both siderophores and that subsequent modification of asingle e-amino group of one lysine residue may account for thedifferent physical properties and function of the siderophores28.
Immunological aspects and pathogenicityGiven the scale of the global tuberculosis burden, vaccination is notonly a priority but remains the only realistic public health inter-vention that is likely to affect both the incidence and the prevalenceof the disease29. Several areas of vaccine development are promising,including DNA vaccination, use of secreted or surface-exposedproteins as immunogens, recombinant forms of BCG and rationalattenuation of M. tuberculosis29. All of these avenues of research willbenefit from the genome sequence as its availability will stimulatemore focused approaches. Genes encoding ,90 lipoproteins wereidentified, some of which are enzymes or components of transportsystems, and a similar number of genes encoding preproteins (withtype I signal peptides) that are probably exported by the Sec-dependent pathway. M. tuberculosis seems to have two copies ofsecA. The potent T-cell antigen Esat-6 (ref. 30), which is probablysecreted in a Sec-independent manner, is encoded by a member of amultigene family. Examination of the genetic context reveals severalsimilarly organized operons that include genes encoding large ATP-hydrolysing membrane proteins that might act as transporters. Oneof the surprises of the genome project was the discovery of twoextensive families of novel glycine-rich proteins, which may be ofimmunological significance as they are predicted to be abundantand potentially polymorphic antigens.The PE and PPE multigene families. About 10% of the codingcapacity of the genome is devoted to two large unrelated families ofacidic, glycine-rich proteins, the PE and PPE families, whose genesare clustered (Figs 1, 2) and are often based on multiple copies of thepolymorphic repetitive sequences referred to as PGRSs, and majorpolymorphic tandem repeats (MPTRs), respectively31,32. The namesPE and PPE derive from the motifs Pro–Glu (PE) and Pro–Pro–Glu (PPE) found near the N terminus in most cases33. The 99members of the PE protein family all have a highly conserved N-terminal domain of ,110 amino-acid residues that is predicted tohave a globular structure, followed by a C-terminal segment thatvaries in size, sequence and repeat copy number (Fig. 5). Phyloge-netic analysis separated the PE family into several subfamilies. Thelargest of these is the highly repetitive PGRS class, which contains 61members; members of the other subfamilies, share very limitedsequence similarity in their C-terminal domains (Fig. 5). Thepredicted molecular weights of the PE proteins vary considerablyas a few members contain only the N-terminal domain, whereasmost have C-terminal extensions ranging in size from 100 to 1,400residues. The PGRS proteins have a high glycine content (up to50%), which is the result of multiple tandem repetitions of Gly–Gly–Ala or Gly–Gly–Asn motifs, or variations thereof.
The 68 members of the PPE protein family (Fig. 5) also have aconserved N-terminal domain that comprises ,180 amino-acidresidues, followed by C-terminal segments that vary markedly insequence and length. These proteins fall into at least three groups,one of which constitutes the MPTR class characterized by thepresence of multiple, tandem copies of the motif Asn–X–Gly–X–Gly–Asn–X–Gly. The second subgroup contains a characteristic,well-conserved motif around position 350, whereas the third contains
article
542 NATURE | VOL 393 | 11 JUNE 1998
Nature © Macmillan Publishers Ltd 1998
8
proteins that are unrelated except for the presence of the common180-residue PPE domain.
The subcellular location of the PE and PPE proteins is unknownand in only one case, that of a lipase (Rv3097), has a function beendemonstrated. On examination of the protein database from theextensively sequenced M. leprae15, no PGRS- or MPTR-relatedpolypeptides were detected but a few proteins belonging to thenon-MPTR subgroup of the PPE family were found. These proteinsinclude one of the major antigens recognized by leprosy patients,the serine-rich antigen34. Although it is too early to attributebiological functions to the PE and PPE families, it is tempting tospeculate that they could be of immunological importance. Twointeresting possibilities spring to mind. First, they could representthe principal source of antigenic variation in what is otherwise agenetically and antigenically homogeneous bacterium. Second,these glycine-rich proteins might interfere with immune responsesby inhibiting antigen processing.
Several observations and results support the possibility of anti-genic variation associated with both the PE and the PPE familyproteins. The PGRS member Rv1759 is a fibronectin-bindingprotein of relative molecular mass 55,000 (ref. 35) that elicits avariable antibody response, indicating either that individualsmount different immune responses or that this PGRS proteinmay vary between strains of M. tuberculosis. The latter possibilityis supported by restriction fragment length polymorphisms forvarious PGRS and MPTR sequences in clinical isolates33. Directsupport for genetic variation within both the PE and the PPEfamilies was obtained by comparative DNA sequence analysis (Fig.5). The gene for the PE–PGRS protein Rv0746 of BCG differs fromthat in H37Rv by the deletion of 29 codons and the insertion of 46codons. Similar variation was seen in the gene for the PPE proteinRv0442 (data not shown). As these differences were all associatedwith repetitive sequences they could have resulted from intergenicor intragenic recombinational events or, more probably, fromstrand slippage during replication32. These mechanisms areknown to generate antigenic variability in other bacterialpathogens36.
There are several parallels between the PGRS proteins and theEpstein–Barr virus nuclear antigens (EBNAs). Members of bothpolypeptide families are glycine-rich, contain extensive Gly–Alarepeats, and exhibit variation in the length of the repeat regionbetween different isolates. The Gly–Ala repeat region of EBNA1functions as a cis-acting inhibitor of the ubiquitin/proteasomeantigen-processing pathway that generates peptides presented inthe context of major histocompatibility complex (MHC) class Imolecules37,38. MHC class I knockout mice are very susceptible to M.tuberculosis, underlining the importance of a cytotoxic T-cellresponse in protection against disease3,39. Given the many potentialeffects of the PPE and PE proteins, it is important that furtherstudies are performed to understand their activity. If extensiveantigenic variability or reduced antigen presentation were indeedfound, this would be significant for vaccine design and for under-standing protective immunity in tuberculosis, and might evenexplain the varied responses seen in different BCG vaccinationprogrammes40.Pathogenicity. Despite intensive research efforts, there is littleinformation about the molecular basis of mycobacterial virulence41.However, this situation should now change as the genome sequencewill accelerate the study of pathogenesis as never before, becauseother bacterial factors that may contribute to virulence are becom-ing apparent. Before the completion of the genome sequence, onlythree virulence factors had been described41: catalase-peroxidase,which protects against reactive oxygen species produced by thephagocyte; mce, which encodes macrophage-colonizing factor42;and a sigma factor gene, sigA (aka rpoV), mutations in which canlead to attenuation41. In addition to these single-gene virulencefactors, the mycobacterial cell wall4 is also important in pathology,
but the complex nature of its biosynthesis makes it difficult toidentify critical genes whose inactivation would lead to attenuation.
On inspection of the genome sequence, it was apparent that fourcopies of mce were present and that these were all situated inoperons, comprising eight genes, organized in exactly the samemanner. In each case, the genes preceding mce code for integralmembrane proteins, whereas mce and the following five genes are allpredicted to encode proteins with signal sequences or hydrophobicstretches at the N terminus. These sets of proteins, about which littleis known, may well be secreted or surface-exposed; this is consistentwith the proposed role of Mce in invasion of host cells42. Further-more, a homologue of smpB, which has been implicated in intra-cellular survival of Salmonella typhimurium, has also beenidentified43. Among the other secreted proteins identified fromthe genome sequence that could act as virulence factors are aseries of phospholipases C, lipases and esterases, which mightattack cellular or vacuolar membranes, as well as several proteases.One of these phospholipases acts as a contact-dependent haemoly-sin (N. Stoker, personal communication). The presence of storageproteins in the bacillus, such as the haemoglobin-like oxygencaptors described above, points to its ability to stockpile essentialgrowth factors, allowing it to persist in the nutrient-limited envir-onment of the phagosome. In this regard, the ferritin-like proteins,encoded by bfrA and bfrB, may be important in intracellular survivalas the capacity to acquire enough iron in the vacuole is verylimited. M. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Methods
Sequence analysis. Initially, ,3.2 Mb of sequence was generated fromcosmids8 and the remainder was obtained from selected BAC clones7 and45,000 whole-genome shotgun clones. Sheared fragments (1.4–2.0 kb) fromcosmids and BACs were cloned into M13 vectors, whereas genomic DNA wascloned in pUC18 to obtain both forward and reverse reads. The PGRS geneswere grossly underrepresented in pUC18 but better covered in the BAC andcosmid M13 libraries. We used small-insert libraries44 to sequence regionsprone to compression or deletion and, in some cases, obtained sequences fromproducts of the polymerase chain reaction or directly from BACs7. All shotgunsequencing was performed with standard dye terminators to minimize com-pression problems, whereas finishing reactions used dRhodamine or BigDyeterminators (http://www.sanger.ac.uk). Problem areas were verified by usingdye primers. Thirty differences were found between the genomic shotgunsequences and the cosmids; twenty of which were due to sequencing errors andten to mutations in cosmids (1 error per 320 kb). Less than 0.1% of thesequence was from areas of single-clone coverage, and ,0.2% was from onestrand with only one sequencing chemistry.Informatics. Sequence assembly involved PHRAP, GAP4 (ref. 45) and acustomized perl script that merges sequences from different libraries andgenerates segments that can be processed by several finishers simultaneously.Sequence analysis and annotation was managed by DIANA (B.G.B. et al.,unpublished). Genes encoding proteins were identified by TB-parse46 using ahidden Markov model trained on known M. tuberculosis coding and non-coding regions and translation-initiation signals, with corroboration by posi-tional base preference. Interrogation of the EMBL, TREMBL, SwissProt,PROSITE47 and in-house databases involved BLASTN, BLASTX48, DOTTER(http://www.sanger.ac.uk) and FASTA49. tRNA genes were located and identi-fied using tRNAscan and tRNAscan-SE50. The complete sequence, a list ofannotated cosmids and linking regions can be found on our website (http://www. sanger.ac.uk) and in MycDB (http://www.pasteur.fr/mycdb/).
Received 15 April; accepted 8 May 1998.
1. Snider, D. E. Jr, Raviglione, M. & Kochi, A. in Tuberculosis: Pathogenesis, Protection, and Control (ed.Bloom, B. R.) 2–11 (Am. Soc. Microbiol., Washington DC, 1994).
2. Wheeler, P. R. & Ratledge, C. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.)353–385 (Am. Soc. Microbiol., Washington DC, 1994).
3. Chan, J. & Kaufmann, S. H. E. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.)271–284 (Am. Soc. Microbiol., Washington DC, 1994).
4. Brennan, P. J. & Draper, P. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.)271–284 (Am. Soc. Microbiol., Washington DC, 1994).
5. Kolattukudy, P. E., Fernandes, N. D., Azad, A. K., Fitzmaurice, A. M. & Sirakova, T. D. Biochemistry
article
NATURE | VOL 393 | 11 JUNE 1998 543
Nature © Macmillan Publishers Ltd 1998
8
and molecular genetics of cell-wall lipid biosynthesis in mycobacteria. Mol. Microbiol. 24, 263–270(1997).
6. Sreevatsan, S. et al. Restricted structural gene polymorphism in the Mycobacterium tuberculosiscomplex indicates evolutionarily recent global dissemination. Proc. Natl Acad. Sci. USA 94, 9869–9874 (1997).
7. Brosch, R. et al. Use of a Mycobacterium tuberculosis H37Rv bacterial artificial chromosome library forgenome mapping, sequencing and comparative genomics. Infect. Immun. 66, 2221–2229 (1998).
8. Philipp, W. J. et al. An integrated map of the genome of the tubercle bacillus, Mycobacteriumtuberculosis H37Rv, and comparison with Mycobacterium leprae. Proc. Natl Acad. Sci. USA 93, 3132–3137 (1996).
9. Blattner, F. R. et al. The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1462(1997).
10. Cole, S. T. & Saint-Girons, I. Bacterial genomics. FEMS Microbiol. Rev. 14, 139–160 (1994).11. Freiberg, C. et al. Molecular basis of symbiosis between Rhizobium and legumes. Nature 387, 394–401
(1997).12. Bardarov, S. et al. Conditionally replicating mycobacteriophages: a system for transposon delivery to
Mycobacterium tuberculosis. Proc. Natl Acad. Sci. USA 94, 10961–10966 (1997).13. Mahairas, G. G., Sabo, P. J., Hickey, M. J., Singh, D. C. & Stover, C. K. Molecular analysis of genetic
differences between Mycobacterium bovis BCG and virulent M. bovis. J. Bacteriol. 178, 1274–1282(1996).
14. Kunst, F. et al. The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature390, 249–256 (1997).
15. Smith, D. R. et al. Multiplex sequencing of 1.5 Mb of the Mycobacterium leprae genome. Genome Res.7, 802–819 (1997).
16. Greenacre, M. Theory and Application of Correspondence Analysis (Academic, London, 1984).17. Ratledge, C. R. in The Biology of the Mycobacteria (eds Ratledge, C. & Stanford, J.) 53–94 (Academic,
San Diego, 1982).18. Av-Gay, Y. & Davies, J. Components of eukaryotic-like protein signaling pathways in Mycobacterium
tuberculosis. Microb. Comp. Genomics 2, 63–73 (1997).19. Cole, S. T. & Telenti, A. Drug resistance in Mycobacterium tuberculosis. Eur. Resp. Rev. 8, 701S–713S
(1995).20. Riley, M. & Labedan, B. in Escherichia coli and Salmonella (ed. Neidhardt, F. C.) 2118–2202 (ASM,
Washington, 1996).21. Mdluli, K. et al. Inhibition of a Mycobacterium tuberculosis b-ketoacyl ACP synthase by isoniazid.
Science 280, 1607–1610 (1998).22. Banerjee, A. et al. inhA, a gene encoding a target for isoniazid and ethionamide in Mycobacterium
tuberculosis. Science 263, 227–230 (1994).23. Hopwood, D. A. Genetic contributions to understanding polyketide synthases. Chem. Rev. 97, 2465–
2497 (1997).24. Minnikin, D. E. in The Biology of the Mycobacteria (eds Ratledge, C. & Stanford, J.) 95–184 (Academic,
London, 1982).25. Barry, C. E. III et al. Mycolic acids: structure, biosynthesis, and phsyiological functions. Prog. Lipid
Res. (in the press).26. Belisle, J. T. et al. Role of the major antigen of Mycobacterium tuberculosis in cell wall biogenesis.
Science 276, 1420–1422 (1997).27. Marahiel, M. A., Stachelhaus, T. & Mootz, H. D. Modular peptide synthetases involved in
nonribosomal peptide synthesis. Chem. Rev. 97, 2651–2673 (1997).28. Gobin, J. et al. Iron acquisition by Mycobacterium tuberculosis: isolation and characterization of a
family of iron-binding exochelins. Proc. Natl Acad. Sci. USA 92, 5189–5193 (1995).29. Young, D. B. & Fruth, U. in New Generation Vaccines (eds Levine, M., Woodrow, G., Kaper, J. & Cobon,
G. S.) 631–645 (Marcel Dekker, New York, 1997).30. Sorensen, A. L., Nagai, S., Houen, G., Andersen, P. & Anderson, A. B. Purification and characterization
of a low-molecular-mass T-cell antigen secreted by Mycobacterium tuberculosis. Infect. Immun. 63,
1710–1717 (1995).31. Hermans, P. W. M., van Soolingen, D. & van Embden, J. D. A. Characterization of a major
polymorphic tandem repeat in Mycobacterium tuberculosis and its potential use in the epidemiologyof Mycobacterium kansasii and Mycobacterium gordonae. J. Bacteriol. 174, 4157–4165 (1992).
32. Poulet, S. & Cole, S. T. Characterisation of the polymorphic GC-rich repetitive sequence (PGRS)present in Mycobacterium tuberculosis. Arch. Microbiol. 163, 87–95 (1995).
33. Cole, S. T. & Barrell, B. G. in Genetics and Tuberculosis (eds Chadwick, D. J. & Cardew, G., NovartisFoundation Symp. 217) 160–172 (Wiley, Chichester, 1998).
34. Vega-Lopez, F. et al. Sequence and immunological characterization of a serine-rich antigen fromMycobacterium leprae. Infect. Immun. 61, 2145–2153 (1993).
35. Abou-Zeid, C. et al. Genetic and immunological analysis of Mycobacterium tuberculosis fibronectin-binding proteins. Infect. Immun. 59, 2712–2718 (1991).
36. Robertson, B. D. & Meyer, T. F. Genetic variation in pathogenic bacteria. Trends Genet. 8, 422–427(1992).
37. Levitskaya, J. et al. Inhibition of antigen processing by the internal repeat region of the Epstein-Barrvirus nuclear antigen-1. Nature 375, 685–688 (1995).
38. Levitskaya, J., Sharipo, A., Leonchiks, A., Ciechanover, A. & Masucci, M. G. Inhibition of ubiquitin/proteasome-dependent protein degradation by the Gly-Ala repeat domain of the Epstein-Barr virusnuclear antigen 1. Proc. Natl Acad. Sci. USA 94, 12616–12621 (1997).
39. Flynn, J. L., Goldstein, M. A., Treibold, K. J., Koller, B. & Bloom, B. R. Major histocompatabilitycomplex class-I restricted T cells are required for resistance to Mycobacterium tuberculosis infection.Proc. Natl Acad. Sci. USA 89, 12013–12017 (1992).
40. Bloom, B. R. & Fine, P. E. M. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.)531–557 (Am. Soc. Microbiol., Washington DC, 1994).
41. Collins, D. M. In search of tuberculosis virulence genes. Trends Microbiol. 4, 426–430 (1996).42. Arruda, S., Bomfim, G., Knights, R., Huima-Byron, T. & Riley, L. W. Cloning of an M. tuberculosis
DNA fragment associated with entry and survival inside cells. Science 261, 1454–1457 (1993).43. Baumler, A. J., Kusters, J. G., Stojikovic, I. & Heffron, F. Salmonella typhimurium loci involved in
survival within macrophages. Infect. Immun. 62, 1623–1630 (1994).44. McMurray, A. A., Sulston, J. E. & Quail, M. A. Short-insert libraries as a method of problem solving in
genome sequencing. Genome Res. 8, 562–566 (1998).45. Bonfield, J. K., Smith, K. F. & Staden, R. A new DNA sequence assembly program. Nucleic Acids Res.
24, 4992–4999 (1995).46. Krogh, A., Mian, I. S. & Haussler, D. A hidden Markov model that finds genes in E. coli DNA. Nucleic
Acids Res. 22, 4768–4778 (1994).47. Bairoch, A., Bucher, P. & Hofmann, K. The PROSITE database, its status in 1997. Nucleic Acids Res. 25,
217–221 (1997).48. Altschul, S., Gish, W., Miller, W., Myers, E. & Lipman, D. A basic local alignment search tool. J. Mol.
Biol. 215, 403–410 (1990).49. Pearson, W. & Lipman, D. Improved tools for biological sequence comparisons. Proc. Natl Acad. USA
85, 2444–2448 (1988).50. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in
genomic DNA. Nucleic Acids Res. 25, 955–964 (1997).
Acknowledgements. We thank Y. Av-Gay, F.-C. Bange, A. Danchin, B. Dujon, W. R. Jacobs Jr, L. Jones,M. McNeil, I. Moszer, P. Rice and J. Stephenson for advice, reagents and support. This work was supportedby the Wellcome Trust. Additional funding was provided by the Association Francaise Raoul Follereau,the World Health Organisation and the Institut Pasteur. S.V.G. received a Wellcome Trust travellingresearch fellowship.
Correspondence and requests for materials should be addressed to B.G.B. ([email protected]) or S.T.C.([email protected]). The complete sequence has been deposited in EMBL/GenBank/DDBJ as MTBH37RV,accession number AL123456.
article
544 NATURE | VOL 393 | 11 JUNE 1998
Nature © Macmillan Publishers Ltd 1998
8
I. Small-molecule metabolismA. Degradation1. Carbon compoundsRv0186 bglS β-glucosidaseRv2202c cbhK carbohydrate kinaseRv0727c fucA L-fuculose phosphate aldolaseRv1731 gabD1 succinate-semialdehyde dehydro-
genaseRv0234c gabD2 succinate-semialdehyde dehydro-
genaseRv0501 galE1 UDP-glucose 4-epimeraseRv0536 galE2 UDP-glucose 4-epimeraseRv0620 galK galactokinaseRv0619 galT galactose-1-phosphate uridylyl-
transferase C-termRv0618 galT' galactose-1-phosphate uridylyl-
transferase N-termRv0993 galU UTP-glucose-1-phosphate uridylyl-
transferaseRv3696c glpK ATP:glycerol 3-phosphotrans-
feraseRv3255c manA mannose-6-phosphate isomeraseRv3441c mrsA phosphoglucomutase or phospho-
mannomutaseRv0118c oxcA oxalyl-CoA decarboxylaseRv3068c pgmA phosphoglucomutaseRv3257c pmmA phosphomannomutaseRv3308 pmmB phosphomannomutaseRv2702 ppgK polyphosphate glucokinaseRv0408 pta phosphate acetyltransferaseRv0729 xylB xylulose kinaseRv1096 - carbohydrate degrading enzyme
2. Amino acids and aminesRv1905c aao D-amino acid oxidaseRv2531c adi ornithine/arginine decarboxylaseRv2780 ald L-alanine dehydrogenaseRv1538c ansA L-asparaginaseRv1001 arcA arginine deiminaseRv0753c mmsA methylmalmonate semialdehyde
dehydrogenaseRv0751c mmsB methylmalmonate semialdehyde
oxidoreductaseRv1187 rocA pyrroline-5-carboxylate dehydro-
genaseRv2322c rocD1 ornithine aminotransferaseRv2321c rocD2 ornithine aminotransferaseRv1848 ureA urease γ subunitRv1849 ureB urease β subunitRv1850 ureC urease α subunitRv1853 ureD urease accessory proteinRv1851 ureF urease accessory proteinRv1852 ureG urease accessory proteinRv2913c - probable D-amino acid
aminohydrolaseRv3551 - possible glutaconate CoA-
transferase
3. Fatty acidsRv2501c accA1 acetyl/propionyl-CoA carboxylase,
α subunitRv0973c accA2 acetyl/propionyl-CoA carboxylase,
α subunitRv2502c accD1 acetyl/propionyl-CoA carboxylase,
β subunitRv0974c accD2 acetyl/propionyl-CoA carboxylase,
β subunitRv3667 acs acetyl-CoA synthaseRv3409c choD cholesterol oxidaseRv0222 echA1 enoyl-CoA hydratase/isomerase
superfamily Rv0456c echA2 enoyl-CoA hydratase/isomerase
superfamily Rv0632c echA3 enoyl-CoA hydratase/isomerase
superfamily Rv0673 echA4 enoyl-CoA hydratase/isomerase
superfamily Rv0675 echA5 enoyl-CoA hydratase/isomerase
superfamily Rv0905 echA6 enoyl-CoA hydratase/isomerase
superfamily (aka eccH)Rv0971c echA7 enoyl-CoA hydratase/isomerase
superfamily Rv1070c echA8 enoyl-CoA hydratase/isomerase
superfamily Rv1071c echA9 enoyl-CoA hydratase/isomerase
superfamily Rv1142c echA10 enoyl-CoA hydratase/isomerase
superfamilyRv1141c echA11 enoyl-CoA hydratase/isomerase
superfamilyRv1472 echA12 enoyl-CoA hydratase/isomerase
superfamilyRv1935c echA13 enoyl-CoA hydratase/isomerase
superfamily Rv2486 echA14 enoyl-CoA hydratase/isomerase
superfamily Rv2679 echA15 enoyl-CoA hydratase/isomerase
superfamily Rv2831 echA16 enoyl-CoA hydratase/isomerase
superfamily Rv3039c echA17 enoyl-CoA hydratase/isomerase
superfamilyRv3373 echA18 enoyl-CoA hydratase/isomerase
superfamily, N-termRv3374 echA18' enoyl-CoA hydratase/isomerase
superfamily, C-termRv3516 echA19 enoyl-CoA hydratase/isomerase
superfamily Rv3550 echA20 enoyl-CoA hydratase/isomerase
superfamilyRv3774 echA21 enoyl-CoA hydratase/isomerase
superfamilyRv0859 fadA β oxidation complex, β subunit
(acetyl-CoA C-acetyltransferase)Rv0243 fadA2 acetyl-CoA C-acetyltransferase Rv1074c fadA3 acetyl-CoA C-acetyltransferaseRv1323 fadA4 acetyl-CoA C-acetyltransferase
(aka thiL)Rv3546 fadA5 acetyl-CoA C-acetyltransferaseRv3556c fadA6 acetyl-CoA C-acetyltransferaseRv0860 fadB β oxidation complex, α subunit
(multiple activities)Rv0468 fadB2 3-hydroxyacyl-CoA dehydroge-
naseRv1715 fadB3 3-hydroxyacyl-CoA dehydroge-
naseRv3141 fadB4 3-hydroxyacyl-CoA dehydroge-
naseRv1912c fadB5 3-hydroxyacyl-CoA dehydroge-
naseRv1750c fadD1 acyl-CoA synthaseRv0270 fadD2 acyl-CoA synthaseRv3561 fadD3 acyl-CoA synthaseRv0214 fadD4 acyl-CoA synthaseRv0166 fadD5 acyl-CoA synthaseRv1206 fadD6 acyl-CoA synthaseRv0119 fadD7 acyl-CoA synthaseRv0551c fadD8 acyl-CoA synthaseRv2590 fadD9 acyl-CoA synthaseRv0099 fadD10 acyl-CoA synthaseRv1550 fadD11 acyl-CoA synthase, N-termRv1549 fadD11' acyl-CoA synthase, C-termRv1427c fadD12 acyl-CoA synthaseRv3089 fadD13 acyl-CoA synthaseRv1058 fadD14 acyl-CoA synthaseRv2187 fadD15 acyl-CoA synthaseRv0852 fadD16 acyl-CoA synthaseRv3506 fadD17 acyl-CoA synthaseRv3513c fadD18 acyl-CoA synthaseRv3515c fadD19 acyl-CoA synthaseRv1185c fadD21 acyl-CoA synthaseRv2948c fadD22 acyl-CoA synthaseRv3826 fadD23 acyl-CoA synthaseRv1529 fadD24 acyl-CoA synthaseRv1521 fadD25 acyl-CoA synthaseRv2930 fadD26 acyl-CoA synthaseRv0275c fadD27 acyl-CoA synthaseRv2941 fadD28 acyl-CoA synthaseRv2950c fadD29 acyl-CoA synthaseRv0404 fadD30 acyl-CoA synthaseRv1925 fadD31 acyl-CoA synthaseRv3801c fadD32 acyl-CoA synthaseRv1345 fadD33 acyl-CoA synthaseRv0035 fadD34 acyl-CoA synthaseRv2505c fadD35 acyl-CoA synthaseRv1193 fadD36 acyl-CoA synthaseRv0131c fadE1 acyl-CoA dehydrogenase Rv0154c fadE2 acyl-CoA dehydrogenase Rv0215c fadE3 acyl-CoA dehydrogenase Rv0231 fadE4 acyl-CoA dehydrogenase Rv0244c fadE5 acyl-CoA dehydrogenase Rv0271c fadE6 acyl-CoA dehydrogenase Rv0400c fadE7 acyl-CoA dehydrogenase Rv0672 fadE8 acyl-CoA dehydrogenase
(aka aidB)Rv0752c fadE9 acyl-CoA dehydrogenaseRv0873 fadE10 acyl-CoA dehydrogenase Rv0972c fadE12 acyl-CoA dehydrogenase Rv0975c fadE13 acyl-CoA dehydrogenaseRv1346 fadE14 acyl-CoA dehydrogenase Rv1467c fadE15 acyl-CoA dehydrogenase Rv1679 fadE16 acyl-CoA dehydrogenase Rv1934c fadE17 acyl-CoA dehydrogenaseRv1933c fadE18 acyl-CoA dehydrogenase Rv2500c fadE19 acyl-CoA dehydrogenase
(aka mmgC)Rv2724c fadE20 acyl-CoA dehydrogenase Rv2789c fadE21 acyl-CoA dehydrogenaseRv3061c fadE22 acyl-CoA dehydrogenase Rv3140 fadE23 acyl-CoA dehydrogenase Rv3139 fadE24 acyl-CoA dehydrogenase Rv3274c fadE25 acyl-CoA dehydrogenase Rv3504 fadE26 acyl-CoA dehydrogenase Rv3505 fadE27 acyl-CoA dehydrogenaseRv3544c fadE28 acyl-CoA dehydrogenase
Rv3543c fadE29 acyl-CoA dehydrogenase Rv3560c fadE30 acyl-CoA dehydrogenase Rv3562 fadE31 acyl-CoA dehydrogenase Rv3563 fadE32 acyl-CoA dehydrogenase Rv3564 fadE33 acyl-CoA dehydrogenaseRv3573c fadE34 acyl-CoA dehydrogenase Rv3797 fadE35 acyl-CoA dehydrogenase Rv3761c fadE36 acyl-CoA dehydrogenaseRv1175c fadH 2,4-Dienoyl-CoA ReductaseRv0855 far fatty acyl-CoA racemaseRv1143 mcr α-methyl acyl-CoA racemaseRv1492 mutA methylmalonyl-CoA mutase, β
subunitRv1493 mutB methylmalonyl-CoA mutase, α
subunitRv2504c scoA 3-oxo acid:CoA transferase, α sub-
unitRv2503c scoB 3-oxo acid:CoA transferase, β sub-
unitRv1136 - probable carnitine racemaseRv1683 - possible acyl-CoA synthase
4. Phosphorous compoundsRv2368c phoH ATP-binding pho regulon
componentRv1095 phoH2 PhoH-like proteinRv3628 ppa probable inorganic pyrophos-
phataseRv2984 ppk polyphosphate kinase
B. Energy metabolism1. GlycolysisRv1023 eno enolaseRv0363c fba fructose bisphosphate aldolaseRv1436 gap glyceraldehyde 3-phosphate dehy-
drogenaseRv0489 gpm phosphoglycerate mutase IRv3010c pfkA phosphofructokinase IRv2029c pfkB phosphofructokinase IIRv0946c pgi glucose-6-phosphate isomerase Rv1437 pgk phosphoglycerate kinaseRv1617 pykA pyruvate kinaseRv1438 tpi triosephosphate isomeraseRv2419c - putative phosphoglycerate mutaseRv3837c - putative phosphoglycerate mutase
2. Pyruvate dehydrogenaseRv2241 aceE pyruvate dehydrogenase E1 com-
ponentRv3303c lpdA dihydrolipoamide dehydrogenaseRv2497c pdhA pyruvate dehydrogenase E1 com-
ponent α subunitRv2496c pdhB pyruvate dehydrogenase E1 com-
ponent β subunitRv2495c pdhC dihydrolipoamide acetyltransferaseRv0462 - probable dihydrolipoamide dehy-
drogenase
3. TCA cycleRv1475c acn aconitate hydrataseRv0889c citA citrate synthase 2Rv2498c citE citrate lyase β chainRv1098c fum fumaraseRv1131 gltA1 citrate synthase 3Rv0896 gltA2 citrate synthase 1 Rv3339c icd1 isocitrate dehydrogenaseRv0066c icd2 isocitrate dehydrogenaseRv0794c lpdB dihydrolipoamide dehydrogenaseRv1240 mdh malate dehydrogenaseRv2967c pca pyruvate carboxylaseRv3318 sdhA succinate dehydrogenase ARv3319 sdhB succinate dehydrogenase BRv3316 sdhC succinate dehydrogenase C sub-
unitRv3317 sdhD succinate dehydrogenase D sub-
unitRv1248c sucA 2-oxoglutarate dehydrogenaseRv2215 sucB dihydrolipoamide succinyltrans-
feraseRv0951 sucC succinyl-CoA synthase β chainRv0952 sucD succinyl-CoA synthase α chain
4. Glyoxylate bypassRv0467 aceA isocitrate lyaseRv1915 aceAa isocitrate lyase, α moduleRv1916 aceAb isocitrate lyase, β moduleRv1837c glcB malate synthaseRv3323c gphA phosphoglycolate phosphatase
5. Pentose phosphate pathwayRv1445c devB glucose-6-phosphate 1-dehydro-
genaseRv1844c gnd 6-phosphogluconate dehydroge-
nase (Gram –)Rv1122 gnd2 6-phosphogluconate dehydroge-
nase (Gram +)Rv1446c opcA unknown function, may aid
G6PDH
Table 1. Functional classification of Mycobacterium tuberculosis protein-coding genes
Nature © Macmillan Publishers Ltd 1998
8
Rv2436 rbsK ribokinaseRv1408 rpe ribulose-phosphate 3-epimeraseRv2465c rpi phosphopentose isomeraseRv1448c tal transaldolaseRv1449c tkt transketolaseRv1121 zwf glucose-6-phosphate 1-dehydro-
genaseRv1447c zwf2 glucose-6-phosphate 1-dehydro-
genase
6. Respirationa. aerobicRv0527 ccsA cytochrome c-type biogenesis
proteinRv0529 ccsB cytochrome c-type biogenesis
proteinRv1451 ctaB cytochrome c oxidase assembly
factorRv2200c ctaC cytochrome c oxidase chain IIRv3043c ctaD cytochrome c oxidase poly-
peptide IRv2193 ctaE cytochrome c oxidase poly-
peptide IIIRv1542c glbN hemoglobin-like, oxygen carrierRv2470 glbO hemoglobin-like, oxygen carrierRv2249c glpD1 glycerol-3-phosphate dehydroge-
naseRv3302c glpD2 glycerol-3-phosphate dehydroge-
naseRv0694 lldD1 L-lactate dehydrogenase
(cytochrome) Rv1872c lldD2 L-lactate dehydrogenaseRv1854c ndh probable NADH dehydrogenaseRv3145 nuoA NADH dehydrogenase chain ARv3146 nuoB NADH dehydrogenase chain BRv3147 nuoC NADH dehydrogenase chain CRv3148 nuoD NADH dehydrogenase chain DRv3149 nuoE NADH dehydrogenase chain ERv3150 nuoF NADH dehydrogenase chain FRv3151 nuoG NADH dehydrogenase chain GRv3152 nuoH NADH dehydrogenase chain HRv3153 nuoI NADH dehydrogenase chain IRv3154 nuoJ NADH dehydrogenase chain JRv3155 nuoK NADH dehydrogenase chain KRv3156 nuoL NADH dehydrogenase chain LRv3157 nuoM NADH dehydrogenase chain MRv3158 nuoN NADH dehydrogenase chain NRv2195 qcrA Rieske iron-sulphur component of
ubiQ-cytB reductaseRv2196 qcrB cytochrome β component of ubiQ-
cytB reductaseRv2194 qcrC cytochrome b/c component of
ubiQ-cytB reductase
b. anaerobicRv2392 cysH 3'-phosphoadenylylsulfate (PAPS)
reductaseRv2899c fdhD affects formate dehydrogenase-NRv2900c fdhF molybdopterin-containing oxidore-
ductaseRv1552 frdA fumarate reductase flavoprotein
subunitRv1553 frdB fumarate reductase iron sulphur
proteinRv1554 frdC fumarate reductase 15kD anchor
proteinRv1555 frdD fumarate reductase 13kD anchor
proteinRv1161 narG nitrate reductase α subunitRv1162 narH nitrate reductase β chainRv1164 narI nitrate reductase γ chainRv1163 narJ nitrate reductase δ chainRv1736c narX fused nitrate reductaseRv2391 nirA probable nitrite reductase/sulphite
reductaseRv0252 nirB nitrite reductase flavoproteinRv0253 nirD probable nitrite reductase small
subunit
c. Electron transportRv0409 ackA acetate kinaseRv1623c appC cytochrome bd-II oxidase
subunit IRv1622c cydB cytochrome d ubiquinol oxidase
subunit IIRv1620c cydC ABC transporterRv1621c cydD ABC transporterRv2007c fdxA ferredoxinRv3554 fdxB ferredoxinRv1177 fdxC ferredoxin 4Fe-4SRv3503c fdxD probable ferredoxinRv3029c fixA electron transfer flavoprotein
β subunitRv3028c fixB electron transfer flavoprotein α
subunitRv3106 fprA adrenodoxin and NADPH ferre-
doxin reductaseRv0886 fprB ferredoxin, ferredoxin-NADP
reductaseRv3251c rubA rubredoxin A
Rv3250c rubB rubredoxin B
7. Miscellaneous oxidoreductases and oxygenases 171
8. ATP-proton motive forceRv1308 atpA ATP synthase α chainRv1304 atpB ATP synthase α chainRv1311 atpC ATP synthase e chainRv1310 atpD ATP synthase β chainRv1305 atpE ATP synthase c chainRv1306 atpF ATP synthase b chainRv1309 atpG ATP synthase γ chainRv1307 atpH ATP synthase δ chain
C. Central intermediary metabolism1. GeneralRv2589 gabT 4-aminobutyrate aminotransferaseRv3432c gadB glutamate decarboxylaseRv1832 gcvB glycine decarboxylase Rv1826 gcvH glycine cleavage system H proteinRv2211c gcvT T protein of glycine cleavage
systemRv1213 glgC glucose-1-phosphate adenylyl-
transferaseRv3842c glpQ1 glycerophosphoryl diester phos-
phodiesteraseRv0317c glpQ2 glycerophosphoryl diester phos-
phodiesteraseRv3566c nhoA N-hydroxyarylamine o-acetyltrans-
feraseRv0155 pntAA pyridine transhydrogenase sub-
unit α1Rv0156 pntAB pyridine transhydrogenase sub-
unit α2Rv0157 pntB pyridine transhydrogenase
subunit βRv1127c ppdK similar to pyruvate, phosphate
dikinase
2. GluconeogenesisRv0211 pckA phosphoenolpyruvate carboxy-
kinase Rv0069c sdaA L-serine dehydratase 1
3. Sugar nucleotidesRv1512 epiA nucleotide sugar epimeraseRv3784 epiB probable UDP-galactose 4-
epimeraseRv1511 gmdA GDP-mannose 4,6 dehydrataseRv0334 rmlA glucose-1-phosphate thymidyl-
transferaseRv3264c rmlA2 glucose-1-phosphate thymidyl-
transferaseRv3464 rmlB dTDP-glucose 4,6-dehydrataseRv3634c rmlB2 dTDP-glucose 4,6-dehydrataseRv3468c rmlB3 dTDP-glucose 4,6-dehydrataseRv3465 rmlC dTDP-4-dehydrorhamnose
3,5-epimeraseRv3266c rmlD dTDP-4-dehydrorhamnose
reductaseRv0322 udgA UDP-glucose
dehydrogenase/GDP-mannose 6-dehydrogenase
Rv3265c wbbL dTDP-rhamnosyl transferaseRv1525 wbbl2 dTDP-rhamnosyl transferaseRv3400 - probable β-phosphoglucomutase
4. Amino sugarsRv3436c glmS glucosamine-fructose-6-
phosphate aminotransferase
5. Sulphur metabolismRv0711 atsA arylsulfataseRv3299c atsB proable arylsulfataseRv0663 atsD proable arylsulfataseRv3077 atsF proable arylsulfataseRv0296c atsG proable arylsulfataseRv3796 atsH proable arylsulfataseRv1285 cysD ATP:sulphurylase subunit 2Rv1286 cysN ATP:sulphurylase subunit 1Rv2131c cysQ homologue of M.leprae cysQRv3248c sahH adenosylhomocysteinaseRv3283 sseA thiosulfate sulfurtransferaseRv2291 sseB thiosulfate sulfurtransferaseRv3118 sseC thiosulfate sulfurtransferaseRv0814c sseC2 thiosulfate sulfurtransferaseRv3762c - probable alkyl sulfatase
D. Amino acid biosynthesis1. Glutamate familyRv1654 argB acetylglutamate kinaseRv1652 argC N-acetyl-γ-glutamyl-phosphate
reductaseRv1655 argD acetylornithine aminotransferaseRv1656 argF ornithine carbamoyltransferaseRv1658 argG arginosuccinate synthaseRv1659 argH arginosuccinate lyase Rv1653 argJ glutamate N-acetyltransferaseRv2220 glnA1 glutamine synthase class IRv2222c glnA2 glutamine synthase class II
Rv1878 glnA3 probable glutamine synthaseRv2860c glnA4 proable glutamine synthaseRv2918c glnD uridylyltransferaseRv2221c glnE glutamate-ammonia-ligase
adenyltransferaseRv3859c gltB ferredoxin-dependent glutamate
synthaseRv3858c gltD small subunit of NADH-dependent
glutamate synthaseRv3704c gshA possible γ-glutamylcysteine syn-
thaseRv2427c proA γ-glutamyl phosphate reductaseRv2439c proB glutamate 5-kinase Rv0500 proC pyrroline-5-carboxylate reductase
2. Aspartate familyRv3708c asd aspartate semialdehyde dehydro-
genaseRv3709c ask aspartokinaseRv2201 asnB asparagine synthase BRv3565 aspB aspartate aminotransferaseRv0337c aspC aspartate aminotransferaseRv2753c dapA dihydrodipicolinate synthaseRv2773c dapB dihydrodipicolinate reductaseRv1202 dapE succinyl-diaminopimelate desuc-
cinylaseRv2141c dapE2 ArgE/DapE/Acy1/Cpg2/yscS
familyRv2726c dapF diaminopimelate epimeraseRv1293 lysA diaminopimelate decarboxylaseRv3341 metA homoserine o-acetyltransferaseRv1079 metB cystathionine γ-synthaseRv3340 metC cystathionine β-lyaseRv1133c metE 5-methyltetrahydropteroyltrigluta-
mate-homocysteine methyltrans-ferase
Rv2124c metH 5-methyltetrahydrofolate-homo-cysteine methyltransferase
Rv1392 metK S-adenosylmethionine synthaseRv0391 metZ o-succinylhomoserine sulfhy-
drylaseRv1294 thrA homoserine dehydrogenaseRv1296 thrB homoserine kinaseRv1295 thrC homoserine synthase
3. Serine familyRv0815c cysA2 thiosulfate sulfurtransferaseRv3117 cysA3 thiosulfate sulfurtransferaseRv2335 cysE serine acetyltransferaseRv0511 cysG uroporphyrin-III c-methyltrans-
feraseRv2847c cysG2 multifunctional enzyme, siroheme
synthaseRv2334 cysK cysteine synthase ARv1336 cysM cysteine synthase BRv1077 cysM2 cystathionine β-synthaseRv0848 cysM3 putative cysteine synthaseRv1093 glyA serine hydroxymethyltransferaseRv0070c glyA2 serine hydroxymethyltransferaseRv2996c serA D-3-phosphoglycerate dehydro-
genaseRv0505c serB probable phosphoserine phos-
phataseRv3042c serB2 C-term similar to phosphoserine
phosphataseRv0884c serC phosphoserine aminotransferase
4. Aromatic amino acid familyRv3227 aroA 3-phosphoshikimate
1-carboxyvinyl transferaseRv2538c aroB 3-dehydroquinate synthaseRv2537c aroD 3-dehydroquinate dehydrataseRv2552c aroE shikimate 5-dehydrogenaseRv2540c aroF chorismate synthaseRv2178c aroG DAHP synthaseRv2539c aroK shikimate kinase IRv3838c pheA prephenate dehydrataseRv1613 trpA tryptophan synthase α chainRv1612 trpB tryptophan synthase β chainRv1611 trpC indole-3-glycerol phosphate
synthaseRv2192c trpD anthranilate phosphoribosyltrans-
feraseRv1609 trpE anthranilate synthase
component IRv2386c trpE2 anthranilate synthase
component IRv3754 tyrA prephenate dehydrogenase
5. HistidineRv1603 hisA phosphoribosylformimino-5-
aminoimidazole carboxamide ribonucleotide isomerase
Rv1601 hisB imidazole glycerol-phosphate dehydratase
Rv1600 hisC histidinol-phosphate aminotrans-ferase
Rv3772 hisC2 histidinol-phosphate aminotrans-ferase
Rv1599 hisD histidinol dehydrogenase
Nature © Macmillan Publishers Ltd 1998
8
Rv1605 hisF imidazole glycerol-phosphate synthase
Rv2121c hisG ATP phosphoribosyltransferaseRv1602 hisH amidotransferaseRv2122c hisI phosphoribosyl-AMP cyclohydro-
laseRv1606 hisI2 probable phosphoribosyl-AMP 1,6
cyclohydrolaseRv0114 - similar to HisB
6. Pyruvate familyRv3423c alr alanine racemase
7. Branched amino acid familyRv1559 ilvA threonine deaminaseRv3003c ilvB acetolactate synthase I large sub-
unit Rv3470c ilvB2 acetolactate synthase large sub-
unitRv3001c ilvC ketol-acid reductoisomeraseRv0189c ilvD dihydroxy-acid dehydrataseRv2210c ilvE branched-chain-amino-acid
transaminaseRv1820 ilvG acetolactate synthase IIRv3002c ilvN acetolactate synthase I small sub-
unitRv3509c ilvX probable acetohydroxyacid syn-
thase I large subunitRv3710 leuA α-isopropyl malate synthaseRv2995c leuB 3-isopropylmalate dehydrogenaseRv2988c leuC 3-isopropylmalate dehydratase
large subunitRv2987c leuD 3-isopropylmalate dehydratase
small subunit
E. Polyamine synthesisRv2601 speE spermidine synthase
F. Purines, pyrimidines, nucleosides and nucleotides1. Purine ribonucleotide biosynthesisRv1389 gmk putative guanylate kinaseRv3396c guaA GMP synthaseRv1843c guaB1 inosine-5'-monophosphate dehy-
drogenaseRv3411c guaB2 inosine-5'-monophosphate dehy-
drogenaseRv3410c guaB3 inosine-5'-monophosphate dehy-
drogenaseRv1017c prsA ribose-phosphate pyrophosphoki-
naseRv0357c purA adenylosuccinate synthaseRv0777 purB adenylosuccinate lyaseRv0780 purC phosphoribosylaminoimidazole-
succinocarboxamide synthaseRv0772 purD phosphoribosylamine-glycine lig-
aseRv3275c purE phosphoribosylaminoimidazole
carboxylaseRv0808 purF amidophosphoribosyltransferase- Rv0957 purH phosphoribosylaminoimidazole-
carboxamide formyltransferase Rv3276c purK phosphoribosylaminoimidazole
carboxylase ATPase subunitRv0803 purL phosphoribosylformylglycin-
amidine synthase IIRv0809 purM 5'-phosphoribosyl-5-aminoimida-
zole synthaseRv0956 purN phosphoribosylglycinamide
formyltransferase IRv0788 purQ phosphoribosylformylglycin-
amidine synthase IRv0389 purT phosphoribosylglycinamide
formyltransferase IIRv2964 purU formyltetrahydrofolate deformy-
lase
2. Pyrimidine ribonucleotide biosynthesisRv1383 carA carbamoyl-phosphate synthase
subunitRv1384 carB carbamoyl-phosphate synthase
subunitRv1380 pyrB aspartate carbamoyltransferase Rv1381 pyrC dihydroorotaseRv2139 pyrD dihydroorotate dehydrogenaseRv1385 pyrF orotidine 5'-phosphate decarboxy-
laseRv1699 pyrG CTP synthaseRv2883c pyrH uridylate kinaseRv0382c umpA probable uridine 5'-monophos-
phate synthase
3. 2'-deoxyribonucleotide metabolismRv0321 dcd deoxycytidine triphosphate
deaminaseRv2697c dut deoxyuridine triphosphataseRv0233 nrdB ribonucleoside-diphosphate
reductase B2 (eukaryotic-like) Rv3051c nrdE ribonucleoside diphosphate
reductase α chainRv1981c nrdF ribonucleotide reductase small
subunitRv3048c nrdG ribonucleoside-diphosphate small
subunitRv3053c nrdH glutaredoxin electron transport
component of NrdEF systemRv3052c nrdI NrdI/YgaO/YmaA familyRv3247c tmk thymidylate kinaseRv2764c thyA thymidylate synthaseRv0570 nrdZ ribonucleotide reductase, class IIRv3752c - probable cytidine/deoxycytidylate
deaminase
4. Salvage of nucleosides and nucleotidesRv3313c add probable adenosine deaminaseRv2584c apt adenine phosphoribosyltrans-
ferasesRv3315c cdd probable cytidine deaminaseRv3314c deoA thymidine phosphorylaseRv0478 deoC deoxyribose-phosphate aldolaseRv3307 deoD probable purine nucleoside phos-
phorylaseRv3624c hpt probable hypoxanthine-guanine
phosphoribosyltransferaseRv3393 iunH probable inosine-uridine
preferring nucleoside hydrolaseRv0535 pnp phosphorylase from Pnp/MtaP
family 2Rv3309c upp uracil phophoribosyltransferase
5. Miscellaneous nucleoside/nucleotide reactionsRv0733 adk probable adenylate kinaseRv2364c bex GTP-binding protein of Era/ThdF
familyRv1712 cmk cytidylate kinaseRv2344c dgt probable deoxyguanosine
triphosphate hydrolaseRv2404c lepA GTP-binding protein LepARv2727c miaA tRNA δ(2)-isopentenylpyrophos-
phate transferaseRv2445c ndkA nucleoside diphosphate kinaseRv2440c obg Obg GTP-binding proteinRv2583c relA (p)ppGpp synthase I
G. Biosynthesis of cofactors, prosthetic groups and carriers1. BiotinRv1568 bioA adenosylmethionine-8-amino-7-
oxononanoate aminotransferaseRv1589 bioB biotin synthaseRv1570 bioD dethiobiotin synthaseRv1569 bioF 8-amino-7-oxononanoate
synthaseRv0032 bioF2 C-terminal similar to B. subtilis
BioFRv3279c birA biotin apo-protein ligaseRv1442 bisC biotin sulfoxide reductaseRv0089 - possible bioC biotin synthesis
gene
2. Folic acidRv2763c dfrA dihydrofolate reductaseRv2447c folC folylpolyglutamate synthaseRv3356c folD methylenetetrahydrofolate dehy-
drogenaseRv3609c folE GTP cyclohydrolase I Rv3606c folK 7,8-dihydro-6-hydroxymethylpterin
pyrophosphokinaseRv3608c folP dihydropteroate synthaseRv1207 folP2 dihydropteroate synthaseRv3607c folX may be involved in folate biosyn-
thesisRv0013 pabA p-aminobenzoate synthase gluta-
mine amidotransferaseRv1005c pabB p-aminobenzoate synthaseRv0812 pabC aminodeoxychorismate lyase
3. LipoateRv2218 lipA lipoate biosynthesis protein ARv2217 lipB lipoate biosynthesis protein B
4. MolybdopterinRv3109 moaA molybdenum cofactor biosynthe-
sis, protein ARv0869c moaA2 molybdenum cofactor biosynthe-
sis, protein ARv0438c moaA3 molybdenum cofactor biosynthe-
sis, protein ARv3110 moaB molybdenum cofactor biosynthe-
sis, protein BRv0984 moaB2 molybdenum cofactor biosynthe-
sis, protein BRv3111 moaC molybdenum cofactor biosynthe-
sis, protein CRv0864 moaC2 molybdenum cofactor biosynthe-
sis, protein CRv3324c moaC3 molybdenum cofactor biosynthe-
sis, protein CRv3112 moaD molybdopterin converting factor
subunit 1Rv0868c moaD2 molybdopterin converting factor
subunit 1Rv3119 moaE molybdopterin-converting factor
subunit 2Rv0866 moaE2 molybdopterin-converting factor
subunit 2Rv3322c moaE3 molybdopterin-converting factor
subunit 2Rv0994 moeA molybdopterin biosynthesisRv3116 moeB molybdopterin biosynthesisRv2338c moeW molybdopterin biosynthesisRv1681 moeX weak similarity to E. coli MoaARv1355c moeY weak similarity to E. coli MoeBRv3206c moeZ probably involved in
molybdopterin biosynthesisRv0865 mog molybdopterin biosynthesis
5. PantothenateRv1092c coaA pantothenate kinaseRv2225 panB 3-methyl-2-oxobutanoate
hydroxymethyltransferaseRv3602c panC pantoate-β-alanine ligaseRv3601c panD aspartate 1-decarboxylase
6. PyridoxineRv2607 pdxH pyridoxamine 5'-phosphate
oxidase
7. Pyridine nucleotideRv1594 nadA quinolinate synthaseRv1595 nadB L-aspartate oxidaseRv1596 nadC nicotinate-nucleotide pyrophos-
phataseRv0423c thiC thiamine synthesis, pyrimidine
moiety
8. ThiamineRv0422c thiD phosphomethylpyrimidine kinaseRv0414c thiE thiamine synthesis, thiazole
moietyRv0417 thiG thiamine synthesis, thiazole
moietyRv2977c thiL probable thiamine-monophos-
phate kinase
9. RiboflavinRv1940 ribA GTP cyclohydrolase IIRv1415 ribA2 probable GTP cyclohydrolase IIRv1412 ribC riboflavin synthase α chainRv2671 ribD probable riboflavin deaminaseRv2786c ribF riboflavin kinaseRv1409 ribG riboflavin biosynthesisRv1416 ribH riboflavin synthase β chainRv3300c - probable deaminase, riboflavin
synthesis
10. Thioredoxin, glutaredoxin and mycothiolRv0773c ggtA putative γ-glutamyl transpeptidaseRv2394 ggtB γ -glutamyltranspeptidase
precursorRv2855 gorA glutathione reductase homologueRv0816c thiX equivalent to M. leprae ThiXRv1470 trxA thioredoxinRv1471 trxB thioredoxin reductaseRv3913 trxB2 thioredoxin reductaseRv3914 trxC thioredoxin
11. Menaquinone, PQQ, ubiquinone and other terpenoidsRv2682c dxs 1-deoxy-D-xylulose 5-phosphate
synthaseRv0562 grcC1 heptaprenyl diphosphate
synthase IIRv0989c grcC2 heptaprenyl diphosphate
synthase IIRv3398c idsA geranylgeranyl pyrophosphate
synthaseRv2173 idsA2 geranylgeranyl pyrophosphate
synthaseRv3383c idsB transfergeranyl, similar geranyl
pyrophosphate synthaseRv0534c menA 4-dihydroxy-2-naphthoate
octaprenyltransferase Rv0548c menB naphthoate synthaseRv0553 menC o-succinylbenzoate-CoA synthaseRv0555 menD 2-succinyl-6-hydroxy-2,4-cyclo-
hexadiene-1-carboxylate synthaseRv0542c menE o-succinylbenzoic acid-CoA ligase Rv3853 menG S-adenosylmethionine:
2-demethylmenaquinone Rv3397c phyA phytoene synthaseRv0693 pqqE coenzyme PQQ synthesis
protein ERv0558 ubiE ubiquinone/menaquinone biosyn-
thesis methyltransferase
12. Heme and porphyrinRv0509 hemA glutamyl-tRNA reductaseRv0512 hemB δ-aminolevulinic acid dehydrataseRv0510 hemC porphobilinogen deaminaseRv2678c hemE uroporphyrinogen decarboxylase
Nature © Macmillan Publishers Ltd 1998
8
Rv1300 hemK protoporphyrinogen oxidaseRv0524 hemL glutamate-1-semialdehyde amino-
transferaseRv2388c hemN oxygen-independent copropor-
phyrinogen III oxidaseRv2677c hemY' protoporphyrinogen oxidaseRv1485 hemZ ferrochelatase
13. CobalaminRv2849c cobA cob(I)alamin adenosyltransferaseRv2848c cobB cobyrinic acid a,c-diamide
synthaseRv2231c cobC aminotransferaseRv2236c cobD cobinamide synthaseRv2064 cobG percorrin reductaseRv2065 cobH precorrin isomeraseRv2066 cobI CobI-CobJ fusion proteinRv2070c cobK precorrin reductaseRv2072c cobL probable methyltransferaseRv2071c cobM precorrin-3 methylaseRv2062c cobN cobalt insertion Rv2208 cobS cobalamin (5'-phosphate)
synthaseRv2207 cobT nicotinate-nucleotide-dimethyl-
benzimidazole transferase Rv0254c cobU cobinamide kinaseRv0255c cobQ cobyric acid synthaseRv3713 cobQ2 possible cobyric acid synthaseRv0306 - similar to BluB cobalamin synthe-
sis protein R. capsulatus
14. Iron utilizationRv1876 bfrA bacterioferritinRv3841 bfrB bacterioferritinRv3215 entC probable isochorismate synthaseRv3214 entD weak similarity to many phospho-
glycerate mutasesRv2895c viuB similar to proteins involved in
vibriobactin uptakeRv3525c - similar to ferripyochelin binding
protein
H. Lipid biosynthesis1. Synthesis of fatty and mycolic acidsRv3285 accA3 acetyl/propionyl CoA carboxylase
α subunitRv0904c accD3 acetyl/propionyl CoA carboxylase
β subunitRv3799c accD4 acetyl/propionyl CoA carboxylase
β subunitRv3280 accD5 acetyl/propionyl CoA carboxylase
β subunitRv2247 accD6 acetyl/propionyl CoA carboxylase
β subunitRv2244 acpM acyl carrier protein (meromycolate
extension)Rv2523c acpS CoA:apo-[ACP] pantethienephos-
photransferaseRv2243 fabD malonyl CoA-[ACP] transacylaseRv0649 fabD2 malonyl CoA-[ACP] transacylaseRv1483 fabG1 3-oxoacyl-[ACP] reductase (aka
MabA)Rv1350 fabG2 3-oxoacyl-[ACP] ReductaseRv2002 fabG3 3-oxoacyl-[ACP] reductaseRv0242c fabG4 3-oxoacyl-[ACP] reductaseRv2766c fabG5 3-oxoacyl-[ACP] reductaseRv0533c fabH β-ketoacyl-ACP synthase IIIRv2524c fas fatty acid synthaseRv1484 inhA enoyl-[ACP] reductaseRv2245 kasA β-ketoacyl-ACP synthase
(meromycolate extension)Rv2246 kasB β-ketoacyl-ACP synthase
(meromycolate extension)Rv1618 tesB1 thioesterase IIRv2605c tesB2 thioesterase IIRv0033 - possible acyl carrier proteinRv1344 - possible acyl carrier proteinRv1722 - possible biotin carboxylaseRv3221c - resembles biotin carboxyl carrierRv3472 - possible acyl carrier protein
2. Modification of fatty and mycolic acidsRv3391 acrA1 fatty acyl-CoA reductaseRv3392c cmaA1 cyclopropane mycolic acid
synthase 1Rv0503c cmaA2 cyclopropane mycolic acid syn-
thase 2Rv0824c desA1 acyl-[ACP] desaturase Rv1094 desA2 acyl-[ACP] desaturaseRv3229c desA3 acyl-[ACP] desaturaseRv0645c mmaA1 methoxymycolic acid synthase 1Rv0644c mmaA2 methoxymycolic acid synthase 2Rv0643c mmaA3 methoxymycolic acid synthase 3Rv0642c mmaA4 methoxymycolic acid synthase 4Rv0447c ufaA1 unknown fatty acid methyltrans-
feraseRv3538 ufaA2 unknown fatty acid methyltrans-
feraseRv0469 umaA1 unknown mycolic acid methyl-
transferase
Rv0470c umaA2 unknown mycolic acid methyl-transferase
3. Acyltransferases, mycoloyltransferases and phospholipid synthesisRv2289 cdh CDP-diacylglycerol phosphatidyl-
hydrolaseRv2881c cdsA phosphatidate cytidylyltransferase Rv3804c fbpA antigen 85A, mycolyltransferaseRv1886c fbpB antigen 85B, mycolyltransferaseRv3803c fbpC1 antigen 85C, mycolyltransferaseRv0129c fbpC2 antigen 85C', mycolytransferaseRv0564c gpdA1 glycerol-3-phosphate dehydroge-
naseRv2982c gpdA2 glycerol-3-phosphate dehydroge-
naseRv2612c pgsA CDP-diacylglycerol-glycerol-3-
phosphate phosphatidyltrans-ferase
Rv1822 pgsA2 CDP-diacylglycerol-glycerol-3-phosphate phosphatidyltrans-ferase
Rv2746c pgsA3 CDP-diacylglycerol-glycerol-3-phosphate phosphatidyltrans-ferase
Rv1551 plsB1 glycerol-3-phosphate acyltrans-ferase
Rv2482c plsB2 glycerol-3-phosphate acyltrans-ferase
Rv0437c psd putative phosphatidylserine decarboxylase
Rv0436c pssA CDP-diacylglycerol-serine o-phosphatidyltransferase
Rv0045c - possible dihydrolipoamide acetyl-transferase
Rv0914c - lipid transfer proteinRv1543 - probable fatty-acyl CoA reductaseRv1627c - lipid carrier proteinRv1814 - possible C-5 sterol desaturaseRv1867 - similar to acetyl CoA
synthase/lipid carriersRv2261c - apolipoprotein N-acyltrans-
ferase-aRv2262c - apolipoprotein N-acyltrans-
ferase-bRv3523 - lipid carrier proteinRv3720 - C-term similar to cyclopropane
fatty acid synthases
I. Polyketide and non-ribosomal peptide synthesisRv2940c mas mycocerosic acid synthaseRv2384 mbtA mycobactin/exochelin synthesis
(salicylate-AMP ligase)Rv2383c mbtB mycobactin/exochelin synthesis
(serine/threonine ligation)Rv2382c mbtC mycobactin/exochelin synthesisRv2381c mbtD mycobactin/exochelin synthesis
(polyketide synthase)Rv2380c mbtE mycobactin/exochelin synthesis
(lysine ligation)Rv2379c mbtF mycobactin/exochelin synthesis
(lysine ligation)Rv2378c mbtG mycobactin/exochelin synthesis
(lysine hydroxylase)Rv2377c mbtH mycobactin/exochelin synthesis Rv0101 nrp unknown non-ribosomal peptide
synthaseRv1153c omt PKS o-methyltransferaseRv3824c papA1 PKS-associated protein, unknown
function Rv3820c papA2 PKS-associated protein, unknown
functionRv1182 papA3 PKS-associated protein, unknown
functionRv1528c papA4 PKS-associated protein, unknown
functionRv2939 papA5 PKS-associated protein, unknown
functionRv2946c pks1 polyketide synthaseRv1660 pks10 polyketide synthase (chalcone
synthase-like)Rv1665 pks11 polyketide synthase (chalcone
synthase-like)Rv2048c pks12 polyketide synthase (erythronolide
synthase-like)Rv3800c pks13 polyketide synthaseRv1342c pks14 polyketide synthase (chalcone
synthase-like)Rv2947c pks15 polyketide synthase Rv1013 pks16 polyketide synthase Rv1663 pks17 polyketide synthase Rv1372 pks18 polyketide synthase Rv3825c pks2 polyketide synthaseRv1180 pks3 polyketide synthaseRv1181 pks4 polyketide synthaseRv1527c pks5 polyketide synthaseRv0405 pks6 polyketide synthaseRv1661 pks7 polyketide synthaseRv1662 pks8 polyketide synthaseRv1664 pks9 polyketide synthase
Rv2931 ppsA phenolpthiocerol synthesis (pksB)Rv2932 ppsB phenolpthiocerol synthesis (pksC)Rv2933 ppsC phenolpthiocerol synthesis (pksD)Rv2934 ppsD phenolpthiocerol synthesis (pksE)Rv2935 ppsE phenolpthiocerol synthesis (pksF)Rv2928 tesA thioesteraseRv1544 - probable ketoacyl reductase
J. Broad regulatory functions1. Repressors/activatorsRv1657 argR arginine repressorRv1267c embR regulator of embAB genes
(AfsR/DndI/RedD family)Rv1909c furA ferric uptake regulatory proteinRv2359 furB ferric uptake regulatory proteinRv2919c glnB nitrogen regulatory proteinRv2711 ideR iron dependent repressor, IdeRRv2720 lexA LexA, SOS repressor proteinRv1479 moxR transcriptional regulator, MoxR
homologueRv3692 moxR2 transcriptional regulator, MoxR
homologueRv3164c moxR3 transcriptional regulator, MoxR
homologueRv0212c nadR similar to E.coli NadRRv0117 oxyS transcriptional regulator (LysR
family)Rv1379 pyrR regulatory protein pyrimidine
biosynthesisRv2788 sirR iron-dependent transcriptional
repressorRv3082c virS putative virulence regulating
protein (AraC/XylS family)Rv3219 whiB1 WhiB transcriptional activator
homologueRv3260c whiB2 WhiB transcriptional activator
homologueRv3416 whiB3 WhiB transcriptional activator
homologueRv3681c whiB4 WhiB transcriptional activator
homologueRv0023 - putative transcriptional regulatorRv0043c - transcriptional regulator (GntR
family)Rv0067c - transcriptional regulator
(TetR/AcrR family)Rv0078 - transcriptional regulator
(TetR/AcrR family)Rv0081 - transcriptional regulator (ArsR
family)Rv0135c - putative transcriptional regulatorRv0144 - putative transcriptional regulatorRv0158 - transcriptional regulator
(TetR/AcrR family)Rv0165c - transcriptional regulator (GntR
family)Rv0195 - transcriptional regulator
(LuxR/UhpA family)Rv0196 - transcriptional regulator
(TetR/AcrR family)Rv0232 - transcriptional regulator
(TetR/AcrR family)Rv0238 - transcriptional regulator
(TetR/AcrR family)Rv0273c - putative transcriptional regulatorRv0302 - transcriptional regulator
(TetR/AcrR family)Rv0324 - putative transcriptional regulatorRv0328 - transcriptional regulator
(TetR/AcrR family)Rv0348 - putative transcriptional regulatorRv0377 - transcriptional regulator (LysR
family)Rv0386 - transcriptional regulator
(LuxR/UhpA family)Rv0452 - putative transcriptional regulatorRv0465c - transcriptional regulator
(PbsX/Xre family)Rv0472c - transcriptional regulator
(TetR/AcrR family)Rv0474 - transcriptional regulator
(PbsX/Xre family)Rv0485 - transcriptional regulator (ROK
family)Rv0494 - transcriptional regulator (GntR
family)Rv0552 - putative transcriptional regulatorRv0576 - putative transcriptional regulatorRv0586 - transcriptional regulator (GntR
family)Rv0650 - transcriptional regulator (ROK
family)Rv0653c - putative transcriptional regulatorRv0681 - transcriptional regulator
(TetR/AcrR family)Rv0691c - transcriptional regulator
(TetR/AcrR family)Rv0737 - putative transcriptional regulatorRv0744c - putative transcriptional regulatorRv0792c - transcriptional regulator (GntR
Nature © Macmillan Publishers Ltd 1998
8
0250
1
150,
001
300,
001
450,
001
600,
001
750,
001
900,
001
1,05
0,00
1
1,20
0,00
1
1,35
0,00
1
1,50
0,00
1
1,65
0,00
1
1,80
0,00
1
1,95
0,00
1
2,10
0,00
1
dnaA
dnaN
recF
0004
gyrB
gyrA
0007
ileT alaT 00
08pp
iA 0010
0011
0012
pabA
pknB
pknA
pbpA
rodA
ppp
0019
0020
leuT 00
2100
22
0023
0024
0025
0026
0027
0028
0029
0030
0031
bioF2
0033
0034
fadD34
0036
0037
0038 00
3900
40
leuS
0042
0043
0044
0045
0046
0047
0048
0049
ponA
0051
0052
rpsF
ssb rpsR
rplI
0057
dnaB
0059
0060
0061
celA
0063
0064
0065
icd2
0067
0068
sdaA
glyA2
0071
REP'00
72
0073
0074
0075
00
PE_PGRS pe
pA
0126
0127
0128
fbpC2
0130
fadE1
0132
0133
ephF
0135
0136
0137
0138
0139
0140 01
41
0142
0143
0144
0145
0146
0147
0148
0149 01
50
PE
PE
0153
fadE2
pntA
A pntA
Bpn
tB
0158
PE
PE
0161
adhE
0163
0164 01
65
fadD5
0167
0168
mce1
0170
0171
0172
lprK
0174
0175
0176
0177
0178
lprO
0180
0181
sigG
0183
0184
0185
bglS
0187
0188
ilvD
0190
0191
0192
0193
0194
0195
0196
0197
0
0249
hsp
nirB
nirD co
bU
cobQ
PPE
0258
0259
0260
narK
3
0262
0263
0264
fecB2
0266
narU
0268
0269
fadD2
fadE6
0272
0273
0274 fad
D27
0276
0277
PE_PGRS
PE_PGRS
PPE
0281
0282
0283
0284
PE
PPE
0287
0288
0289
0290
0291
0292
0293
0294
0295
atsG
PE_PGRS 02
98 0299 0300 0301
0302
0303
PPE
PPE
0306 03
0703
0803
09 0310
0311
0312
0313 03
14
0315
0316 gl
0373
0374
0375
0376
0377
0378
sec 03
8003
81um
pA03
83
clpB
0385
0386
0387
PPE
purT
0390
metZ
0392
0393
0394
0395
0396
0397
0398
lpqK
fadE7
0401
mmpL1
mmpS1
fadD30
pks6
0406
0407
pta
ackA
pknG
glnH
0412
mutT3 thi
E
0415
0416
thiG
lpqL
lpqM
0420
0421thi
D
thiC
0424
ctpH
0426
xthA
0428
def
0430
0431
sodC
0433
0434
0435
pssA
psd
moaA3
0439
groE
L204
41
PPE
0443
0444
sigK
0446
0508
hemA
hemC
cysG
hemB
0513
0514
0515
0516
0517
0518
0519
0520 05
21
gabP
0523
hemL
0525
0526
ccsA
0528
ccsB
0530
0531
PE_PGRS fab
H
menA
pnp
galE
2
0537
0538
0539
0540
0541
menE
0543
0544
pitA
0546
0547
menB
0549
0550
fadD8
0552
menC
bpoC
menD
0556
0557
ubiE 05
5905
60
0561
grcC
1htp
Xgp
dA1
0565
0566
tyrT
0567
0568
0569
nrdZ
0571
0572
0573
0574
0575
0576
0577
PE_PGRS
0579
0580
0581
0582 lpq
N
0584
0585
0586
0654
0655
0656
0657
0658
0659 0660 0661 0662
atsD
0664 0665 0666
rpoB
rpoC
0669
end
lpqP
fadE8
echA
406
74ec
hA5
mmpL5
mmpS5
0678 06
7906
80
0681
rpsL
rpsG
fusA
tuf
0686
0687
0688
0689
0690
0691
0692
pqqE
lldD1
0695
0696
0697
0698
0699
rpsJ
rplC
rplD
rplW
rplB
rpsS
rplV
rpsC
rplP rpmC rpsQ
atsA
0712
0713
rplN rplX
rplE rpsN
rpsH
rplF
rplR
rpsE
rpmD rplO
sppA
0725
0726
fucA
0728
xylB
0730
0731
secY
adk
map'
sigL
0736
0737
0738
0739
0740 IS
1557
'0741
PE_PGRS
0743
0744
cpsY
0807
purF
purM
0810
0811
pabC
0813 ss
eC2 cy
sA2
thiX
0817
0818
0819
phoT ph
oY2
0822
0823
desA
1
0825
0826 08
2708
28 IS16
05'
0829
0830
0831
lysTglu
T aspT pheT
PE_PGRS
PE_PGRS
PE_PGRS
lpqQ
0836
0837
lpqR
0839
0840
0841
0842
0843
narL
0845
0846
lpqS
cysM
3
0849 IS
1606
'0850 08
51
fadD16
pdc
0854
far
0856
0857
0858
fadA
fadB
0861
0862
0863 moa
C2 mogmoa
E2 0867
moaD2 moaA208
70cs
pB PE_PGRS
fadE10
0874
0875
0876
0877
PPE
0879
0880
0881
0882 08
83
serC
0885
fprB
0940
0941
0942
0943
0944
0945
pgi
0947
0948
uvrD
0950
sucC
sucD
0953
0954
0955
purN
purH
0958
0959
0960
0961 09
62
0963
0964
0965
0966
0967
0968
ctpV
0970 ec
hA7
fadE12
accA
2
accD
2
fadE13
0976
PE_PGRS
PE_PGRS
0979 PE_P
GRS
0981
0982
0983
moaB2
mscL
0986
0987
0988
grcC
2
0990
0991
0992
galU
moeA
rimJ
0996
alaV
0997
0998
0999
1000
arcA
1002
1003
1004
pabB
1006
metS
1008
1009
ksgA
1011
1012
pks1
6
pthrpl
Y
lpqT
lipU
cysM
2
pra
metBgr
eA10
81
1082
1083
1084
1085
1086
PE_PGRS
PEPE10
90
PE_PGRS
coaA
glyA
desA
2
phoH
2
1096
1097
fum
1099
1100
1101
1102
1103
1104
1105
1106
xseB
xseA
1109
lytB'
1111
1112
1113
1114
1115
1116
1117 11
1811
1911
20
zwf
gnd2
bpoB
ephC
1125
1126
ppdK
1128REP
1129
1130
gltA1
1132
metE
1134
PPE
1136 11
3711
3811
39
1140 ec
hA11
echA
10
mcr
1144
1145
1146
1147
REP
1148
IS-L
IKE
1149
1150
1151
1152
omt
1154
1155
1156
1157
1158
fadD6
folP2
1208
1209
tagA
1211 12
12
glgC
PE
1215
1216
1217
1218
1219
1220
sigE
1222
htrA
1224 12
25
1226
1227
lpqX
mrp
1230
1231
1232
1233
1234
lpqY
sugA
sugB
sugC
corA
mdh12
4112
42 PE_PGRS
lpqZ
1245
1246
1247
sucA
1249
1250
1251
lprE
deaD
1254
1255
1256
1257
1258
1259
1260
1261
1262
amiB2
1264
1265
pknH
embR
1268
1269
lprA
1271
1272
1273
lprB
lprC 12
76
1277
1278
1279
oppA
oppD
1331
1332
1333
1334
1335
cysM
1337
murI13
39rp
hA13
41pk
s14 1343
1344
fadD33
fadE14 13
47leu
T
1348
1349
fabG2 1351
1352 13
53
1354
moeY
1356
1357
1358
1359
1360
PPE
1362
1363
rsbU
1365
1366
1367
lprF
IS61
10
1369
1370
1371
pks1
8
1373
1374
1375
1376
1377
1378
pyrR
pyrB
pyrC
1382
carA
carB
pyrF
PE
PPE
mIHF
gmk
1390
dfp
metK
1393
1394
1395
PE_PGRS
1397
1398lip
H
lipI
1401
priA
1403
1404
1405
fmt
fmu
rpe
1462
1463
1464
1465
1466
fadE15
PE_PGRS
ctpD
trxA trxB
echA
12
1473
1474
acn
1476
1477
1478
moxR
1480
1481
1482
fabG1
inhA
hemZ
1486
1487
1488
1489
1490
1491
mutA
mutB
1494 1495
1496
lipL
1498
1499
1500
1501
1502
1503
1504
1505
1506
1507
1508
1509
1510
gmdA
epiA
1513
1514
1515
1516
1517
1518
1519
1520
fadD25
mmpL12
1523
1524
wbbl2
1526
pks5
papA
4
fadD24
adh
1531 15
32
1533
15
hisD
hisC
hisB
hisH
hisA
impA
hisF
hisI2
chaA
bcpB
trpE
1610
trpC
trpB
trpA
lgt
1615
1616
pykA
tesB1
1619
cydC
cydD
cydB
appC
1624
1625
leuV
1626
1627
1628
polA
rpsA
1631
1632
uvrB
1634
1635
1636 16
37
uvrA
1639
lysX
infC rpm
I rplT
tsnR
1645
PE
1647
1648
pheS
pheT
PE_PGRS
argC
argJ
argB
argD
argF
argR
argG
argH
pks1
0
pks7
pks8
1724
1725
1726
1727
1728
1729
1730
gabD
117
3217
3317
3417
35
narX
narK
2
1738
1739
1740 1741
1742
pknE
1744
1745
pknF
1747
1748 17
49
fadD1
1751
1752
PPE
1754
plcD
IS61
10
1756
1757
1758
PE_PGRS
(wag
22)
1760
1761
1762
IS61
10
1763
1764
1765
ISB9'
1766
1767
PE_PGRS
1769
1770
1771
1772 17
73
1774
1775
1776
1777
1778
1779
1780
1781
1782
1783
1784
1785
1786
PPEPE
PPE
PPE
PE17
92 1793
1794
1795
1796
ureF ureG
ureD
ndh
1855
1856
modA
modB
modC
modD
1861
adhA
1863
1864
1865
1866
1867
1868
1869
1870
1871
lldD2
1873
1874
1875
bfrA
1877
glnA3
1879
1880
lppE
1882
1883
1884
1885
fbpB
1887
1888
1889
1890
1891 1892 1893 18
94
1895
1896
1897
1898 lpp
D
lipJ
cinA
nanT
1903
1904
aao
1906
1907
katG
furA
1910
lppC
fadB5
1913 19
14
aceA
a
aceA
b
PPE
PPE
1919
1920
lppF
1922
lipD
1924
fadD31
1926
1927
1928
1929
1930
1931
tpxfad
E18fa
d
0257
150,
000
300,
000
450,
000
600,
000
750,
000
900,
000
1,05
0,00
0
1,20
0,00
0
1,35
0,00
0
1,50
0,00
0
1,65
0,00
0
1,80
0,00
0
1,95
0,00
0
2,10
0,00
0
2,25
0,00
0
0076
0077
0078
0079
0080
0081
0082
0083
hycD
hycP
hycQ
hycE
0088
0089
0090
0091
ctpA
0093
REP
0094
0095
PPE
0097
0098
fadD10
0100
nrp
0102
ctpB
0104
rpmB
0106
ctpI
0108
PE_PGRS
0110
0111
gca
gmhA
0114
0115
0116
oxyS
oxcA
fadD7
fusA2
0121
0122
0123
0198
0199
0200 02
01
mmpL11
0203
0204
0205
mmpL3
0207
0208
0209
0210
pckA
nadR
0213
fadD4
fadE3
0216
lipW
0218
0219
lipC
0221
echA
102
23
0224
0225
0226
0227
0228
0229
0230
fadE4
0232
nrdB
gabD
2
0235
0236
lpqI
0239
0240 02
41
0245
fabG4
fadA2
fadE5
0246
0247
0248
16 glpQ2 glyU 0318
3
pcp
0320
dcd
udgA
0323
0324
0325
0326
0327
0328 03
2903
30
0331
0332
0333
rmlA
PE
0336
aspC
0338
0339
0340
0341
0342
0343
lpqJ
0345
aroP
2
0347
0348
0349
dnaK
grpE
dnaJ
hspR
PPE
PPE
0356
purA
0358
0359 03
60
0361
mgtE
fba
0364
0365
0366
0367
0368
0369
0370
0371
0372
ufaA1
0448
0449
mmpL4
mmpS4
0452
PPE
0454 04
55ec
hA2
0457
0458
0459
0460
0461
0462
0463 04
64
0465
0466
aceA
fadB2
umaA
1 umaA
2
0471
0472
0473
0474
0475
0476
0477
deoC
0479
0480
0481
murB
0483
0484
0485
0486
0487
0488
gpm
senX
3
regX
3
0492
0493
0494
0495
0496
0497
0498
0499
proC
galE
1
0502
cmaA
205
04
serB
mmpS2
mmpL2
8605
8705
88
mce2
0590
0591
0592
lprL
0594
0595
0596
0597
0598
0599
0600
0601
tcrA
0603
lpqO
IS15
36
0605
0606
0607 0608
0609
0610
0611
0612
0613
0614
0615
0616
0617
galT' galT
galK
0621
0622
0623
0624 06
2506
2606
2706
28
recD
recB
recC
echA
306
33
0634
thrT metT
0635 0636
0637
trpT secE
nusG
rplK
rplA mmaA
4mmaA
3mmaA
2mmaA
1
lipG
0647
0648
fadD37
0650
rplJ
rplL 06
53
0745
PE_PGRS
PE_PGRS
0748
0749
0750 mmsB
fadE9
mmsA
PE_PGRS
PPE
thrV
0756
phoP
phoR
0759
0760
adhB
0762 0763
0764
0765
0766
0767
aldA
0769
0770
0771
purD
ggtA
0774
0775 07
76
purB
0778
0779
purC
ptrBb
ptrBa
0783
0784
0785
0786
0787
purQ 07
8907
9007
91
0792
0793
lpdB
IS61
10
0795
0796
IS15
470797
0798
0799
pepC
0801 08
02
purL
0804
0805
B
0887
0888
citA
0890
0891
0892
0893
0894
0895
gltA2
0897
0898
ompA
0900
0901
0902
0903
accD
3
echA
609
06
0907
ctpE
0909 0910
0911
0912
0913
0914
PPE
PE
betP
0918
0919 ar
gT
IS15
54
0920
IS15
35
0921
0922
0923
nram
p
0925
0926
0927
phoS
2ps
tC2
pstA
1
pknD
pstS
pstB
phoS
1
pstC
pstA
2 0937
0938
0939
pqT
prsA
glmU
glnT
1019
mfd
1021
lpqU
eno
1024
1025
1026
kdpE
kdpD
kdpA
kdpB
kdpC
1032
1033
1034
IS15
60
1035
1036
1037 1038
PPE
PE
IS-L
IKE
1041
1042
1043
1044
1045
1046
IS10
811047
1048
1049
1050
1051
1052
1053
1054 1055 leuX 1056
1057
fadD14
1059
1060
1061
1062
1063
lpqV
1065
1066 PE_P
GRS
PE_PGRS
1069
echA
8ec
hA9
1072
1073
fadA3
1075
1159
mutT2
narG
narH
narJ
narI
1165
lpqW
1167
PPE
PE
1170
1171
PE
1173
1174
fadH
1176
fdxC
1178
1179
pks3
pks4
papA
3
mmpL10
1184
fadD21
1186
rocA
1188
sigI
1190
1191
1192
fadD36
1194
PE
PPE
1197
1198
1199IS
1081
1200
1201
dapE
1203
1204
1205
oppC
oppB
1284
cysD
cysN
1287
1288
1289
1290
1291
argV
argS
lysA
thrA
thrC
thrB
rho
rpmE
prfA
hemK
1301
rfe
1303
atpB atpE atpF
atpH
atpA
atpG
atpD
atpC 1312
IS15
57
1313
1314
murA
rrs
rrl
rrfog
t
alkA
1318
1319
1320
1321
1322
fadA4
1324
PE_PGRS
glgB
1327
glgP
dinG
1330
e
ribG
1410
lprG
ribC
1413
1414
ribA2
ribH 1417
lprH
1419
uvrC
1421
1422
1423
1424
1425
lipO
fadD12
1428
1429
PE
1431
1432
1433
1434 14
35
gap
pgk
tpi
1439
secG PE_P
GRS
bisC
1443
1444
devB
opcA
zwf2
tal
tkt
PE_PGRS
ctaB
PE_PGRS
1453
qor
1455
1456
1457
1458
1459
1460
1461
1534
1535
ileS
dinX
ansA
lspA
1540
lprI glbN
1543
1544
1545
1546
dnaE
1
PPE
fadD11
' fadD11
plsB1
frdA
frdB
frdC frdD
1556
mmpL6 15
58ilv
A
1560
1561
glgZ
glgY
glgX
1565
1566
1567
bioA
bioF
bioD
1571 15
7215
7315
7415
75 1576
1577
1578
1579 1580 1581
1582
1583 1584 1585
1586
1587
REP 1588
bioB
1590
1591
1592
1593
nadA
nadB
nadC
1597 15
98
pks1
7
pks9
pks1
116
66
1667
1668
1669 1670 1671
1672
1673
1674
1675
1676
dsbF
1678
fadE16
1680
moeX
1682
1683
1684 16
8516
8616
87
1688
tyrS
lprJ
1691
1692
1693
tlyA
1695
recN
1697
1698
pyrG
1700
1701
1702REP
1703
cycA
PPE
PPE
1707
1708
1709
1710
1711
cmk
1713
1714
fadB3
1716
1717
1718
1719
proT 17
2017
21
1722
1723
1797
1798
lppT
PPE
PPE
PPE
PE_PGRS
1804
1805
PE
PPE
PPE
PPE
1810
mgtC18
12
1813
1814
1815
1816
1817
PE_PGRS
1819
ilvG
secA
2
pgsA
2 1823
1824
1825
gcvH
1827
1828
1829
1830
1831
gcvB
1833
1834
1835
1836
glcB
1838
1839 PE_P
GRS
1841
1842
guaB
1
gnd
1845
1846
1847 ureA
ureB
ureC
adE17
echA
13
1936
1937
ephB
1939
ribA
1941 19
4219
4319
44
1945
REP
lppG
1947 19
4819
4919
50 1951
1952
1953 19
5419
5519
5619
57 1958
1959 1960
1961 19
62
1963
1964
1965
mce3
1967
1968
1969
lprM
1971
1972
1973
1974
1975 19
76
1977
1978
1979
mpt64
nrdF
1982
PE_PGRS 19
84
1985
1986
1987
1988
1989
1990
1991
ctpG
1993
1994
1995
1996
ctpF
1998
1999
2000
2001
fabG3 20
03
0238
Nature © Macmillan Publishers Ltd 1998
8
2,25
0,00
1
2,40
0,00
1
2,55
0,00
1
2,70
0,00
1
2,85
0,00
1
3,00
0,00
1
3,15
0,00
1
3,30
0,00
1
3,45
0,00
1
3,60
0,00
1
3,75
0,00
1
3,90
0,00
1
4,05
0,00
1
4,20
0,00
1
4,35
0,00
14,
411,
529
2004
2005
otsB
fdxA
2008
2009
2010 20
1120
12
IS16
0720
1320
1420
15
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
2026
2027
2028
pfkB
2030
hspX
2032
2033
2034
2035
2036
2037
2038
2039
2040
2041
2042
pncA20
44
lipT
lppI
2047
pks1
2
2049
2050
2051
2052
2053
2054
rpsR
2 rpsN
2 rpmG rpmB2
2059
2060
2061
cobN
2063
cobG
cobH
cobI
2067
blaC
sigC co
bKco
bM
cobL
2073
2074
2075
2076
2077
2
2140
dapE
2
leuU
2142
2143
2144 wag31
2146
2147
2148
yfiH
ftsZ
ftsQ
murC
murG
ftsW
murD
murX
murF
murE
2159
2160
2161
PE_PGRS
pbpB
2164
2165
2166
IS61
10
2167
2168
2169
2170
lppM
2172
idsA2
2174
2175
pknL
IS15
58
2177
aroG
2179
2180
2181
2182
2183
2184
2185
2186
fadD15
2188
2189
2190
2191
trpD
ctaE
qcrC
qcrA
qcrB
2197 mmpS3
2199
ctaC
asnB
cbhK
2203 22
04
2205
2206
cobT
cobS
2209
ilvE
gcvT
2212
pepB
ephD
sucB
2216
lipB
IS61
10
2278
2279
2280
pitB
2282
2283
lipM
2285
2286
yjcE
2288
cdh
lppO
sseB 22
9222
93
2294
2295
2296
2297
2298
htpG
2300
2301
2302 23
0323
04
2305
2306
2307
2308
metV23
09
2310
2311
2312
2313
2314
2315
uspA
uspE
uspC
2319
rocE
rocD
2 rocD
123
23
2324
2325
2326
2327
PE
narK
1
lppP
2331
mez
2333
cysK
cysE
2336
2337
moeW
mmpL9
PE
asnT
lppQ
2342
dnaG
dgt
2345
2346
2347
2348
plcC
plcB
plcA
PPE
PPE
lppR
lepA
2405
2406
2407
PE 2409
2410
2411
rpsT 24
13
2414
2415
2416
2417
2418
2419
2420
2421
2422
2423
IS15
58
2424
2425
2426
proA
ahpC
ahpD PPE
PE 2432 2433
2434
2435
rbsK
2437
2438
proB
obg
rpmA rpl
U
dctA
rne
ndkA24
46
folC
valS
2449
2450
2451 24
5224
5324
54
2455
2456
clpX
2458
2459
clpP2clp
P
tigpr
oUgly
Vlip
P
2464rpi
2466
pepD
2468
2469
glbO
2471
2472
2473 24
7424
75
2476
2477
2478
I24
7
2525
2526
2527 mrr
2529
2530
adi
2532
nusBefp
pepQ
2536 ar
oDar
oBar
oK
aroF
2541
2542
lppA
lppB 2545
2546 2547 2548 25
4925
5025
51ar
oE
2553
2554
alaS
2556
2557
2558
2559
2560
2561
2562
2563
glnQ
2565
2566
2567
2568
2569
2570
2571
aspS
2573
2574
2575
2576
2577
2578
linB
hisS
2581
ppiB
relA
apt
2585
secF
secD
2588
gabT
fadD9
PE_PGRS ru
vBru
vAru
vC25
9525
9625
9725
9825
9926
00sp
eE
2602 26
0326
04tes
B2
2606
pdxH
2683
arsA
arsB
2686
2687
2688
2689
2690
trkA
trkB 26
9326
94
2695
2696du
t26
98 2699
2700
suhB
ppgK
sigA
2704
2705
2706
2707 27
0827
09sig
B
ideR
2712
2713
2714
2715
2716 27
1727
1827
19
lexA
2721
2722
2723
fadE20
hflX
dapF
miaA27
28
2729
2730
2731
2732
2733
2734
2735
recX
recA
2738
2739
2740
PE_PGRS 27
42
2743 35
kd_a
g27
45 pgsA
327
47
ftsK
2749
2750
2751
2752
dapA
2754
hsdS
'hs
dM
2757 2758 2759
2760
2761
2762
dfrA
thyA
2765
fabG527
67
PPE
PE
PPE
2771
2772
dapB27
7427
7527
76
2777
2843
2844
proS
efpA
cysG
2
cobB
cobA
2850
2851
2852
PE_PGRS
2854
gorA
nicT
2857
aldC
2859
glnA4
map28
62
2863
2864
2865
2866
2867
gcpE
2869
2870
2871 2872
mpt83
2874
mpt70 28
76 merTmpt5
328
7928
80cd
sA
frrpy
rH
2884
IS15
39
2885
2886
2887
amiC
tsfrp
sB
2891
PPE
2893
xerC
viuB
2896
2897
2898
fdhD
fdhF
2901
rnhB
lepB
rplS
lppW
trmD
rimM
2908 rpsP
2910
dacB
2912
2913
pknI
2915
ffh
2917
glnD
glnB
amt
ftsY
2949
fadD29
2951
2952
2953
2954
2955
2956
2957
2958
2959
2960
2961
2962
2963
purU
kdtB
2966
pca
2968
2969
lipN
2971
2972
recG
2974
2975
ung
thiL
IS15
38
2978
2979
2980
ddlA
gpdA
2
2983
ppk
mutT1 hu
pB
leuD
leuC
2989
2990
2991
gluU glnU
gltS
2993
2994
leuB
serA
2997
2998
lppY
3000
ilvC
ilvN
ilvB
3004 30
05
lppZ
3007
3008
gatB
pfkA
gatA
gatC
3013
ligA
3015
lpqA 30
17
PPE
3019PEPPEPPE
IS10
81
3023
3024
3025
lipR
3085
adhD
3087
3088
fadD13
3090
3091
3092
3093
3094
3095
3096
PE
3098ss
r30
99sm
pBfts
Xfts
E31
0331
04
prfB
fprA
3107
3108
moaA
moaB moaC moaD 31
1331
14IS
108131
15
moeB
cysA
3 sseC moa
E 3120
3121
3122
3123
3124
PPE
3126
3127
3128
3129
3130
3131
3132
3133
3134
PPEPPE
3137
pflA
fadE24
fadE23
fadB4 31
4231
43PPE
nuoA
nuoB
nuoC
nuoD
nuoE
nuoF
nuoG
nuoH
nuoI
nuoJ
nuoK
nuoL
nuoM
nuoN
PPE
3160
3161
3162
3163
moxR331
6531
66
3224
3225
3226
aroA
3228
desA
3
3230
3231
pvdS32
3332
34
3235
kefB
3237
3238
3239
secA
3241
3242
3243
lpqB
mtrB
mtrA
tmk
sahH
3249 rubB rubA32
52
3253
3254
manA
3256
pmmA
3258
3259
whiB2
3261
3262
3263
rmlA2
wbbL
rmlD
3267
3268
3269
ctpC
3271
3272
3273
fadE25pu
rE
purK
3277 32
78bir
A
accD
532
8132
82ss
eA32
84
accA
3
sigF
rsbW
3288
3289
lat32
91
3292
aldB
3294
3295
lhr
nei
lpqC
atsB
IS16
083348 33
49IS15
61'
PPE
3351
3352
3353
3354 33
55fol
D33
5733
5833
59
3360 33
6133
6233
6333
64
3365
spoU
PE_PGRS 33
6833
69
dnaE
2
3371
otsB2
echA
18 echA
18' am
iD
3376
3377
3378
3379
IS61
10
3380
3381lyt
B
idsB
3384
3385
IS15
6033
8633
87
PE_PGRS
3389
lpqD
acrA
1cm
aA1
iunH
3394
3395
guaA
phyA
idsA
3399
3400
3401
3402
3403
3404
3405
3406
3407
3408
choD
guaB
3
guaB
2
3412 34
13sig
D34
15
wh
3481
3482
3483
cpsA
3485
3486
lipF
3488 3489
otsA
3491 34
9234
93
3494
lprN
3496
3497
3498
mce4
3500
3501
3502
fdxD
fadE26
fadE27
fadD17
PE_PGRS
PE_PGRS
ilvX
3510
PE_PGRS
PE_PGRS
fadD18
PE_PGRS
fadD19
echA
1935
1735
18
3519
3520
3521
3522
3523
3524
3525
3526
3527
3528
3529
3530
3531
PPE
PPE
3534
3535
3536
3537
ufaA2
PE
ltp2
3541
3542
fadE29
fadE28
3545
folE
ftsH
3611
3612
3613
3614
3615
3616
ephA
3618
3619
3620
PPE
PE
lpqG
hpt
mesJ
3626
3627
ppa
3629
3630
3631
3632
3633
rmlB2
3635
IS15
34
3636
3637
3638
3639
IS15
53
3640
fic36
4236
43thr
U36
44
3645
topA
3647
cspA
3649
PE
3651
PE_PGRS
PE_PGRS
3654
3655 3656 3657
3658
trbB
3660
3661
3662
dppD
dppC
dppB
dppA
acs
3668
3669
ephE
3671
3672
3673nth
3675
3676
3677
3678
3679
3680
whiB4
ponA
'
3683
3684
proV
3685
3686
3687
3688
3689
3690
3691
moxR2
tyrA
3755
proZ
proW
proV
proX
3760 fad
E36
3762
lpqH
3764
3765
3766
3767
3768
3769
3770
3771 argU serT
hisC2 37
73
echA
21lip
E
3776
serU
3777
3778
3779
3780
3781
rfbE
3783
epiB
3785
3786
3787
3788
3789
3790
3791
3792
embC
embA
embB
atsH
fadE35
3798
IS15
57
accD
4
pks1
3
fadD32
3802
fbpC1
fbpA
3805
3806
3807
3808
glf
pirG
csp
PE_PGRS
3813
3814
3815
3816
3817
3818
3819
papA2
PE
PPE38
74es
at6
3876
3877
3878
3879
3880
3881
3882
3883
3884
3885
3886
3887
3888
3889
3890
3891
PPE
PE
3894
3895
3896
3897
3898
3899
3900
3901
3902
3903
3904
3905
3906
pcnA
3908
3909
3910
sigM
3912
trxB2
trxC
cwlM
3916
parA
parB
gid39
20
3921
3922 rnpA rpmH
2,40
0,00
0
2,55
0,00
0
2,70
0,00
0
2,85
0,00
0
3,00
0,00
0
3,15
0,00
0
3,30
0,00
0
3,45
0,00
0
3,60
0,00
0
3,75
0,00
0
3,90
0,00
0
4,05
0,00
0
4,20
0,00
0
4,35
0,00
0
2078
2079
lppJ 20
81
2082
2083
2084
IS15
56
2085
2086
2087
pknJ
pepE
2090
2091
helY
2093
2094
2095
2096
2097
PE_PGRS
PE
2100
helZ
2102 21
0321
04
IS61
10
2105
2106
PEPPE
prcA
prcB21
11
2112
2113
2114
2115
lppK 2117 21
18
2119
2120
hisGhis
I
PPE
metH
2125
PE_PGRS
ansP
2128 21
29cy
sS2
cysQ
2132
2133
2134
2135
2136
2137
lppL
pyrD
pB
lipA
2219
glnA1
glnE
glnA2
2223
2224
panB
2226
rnpB
2227
2228
2229
2230
cobC
2232
2233
ptpA
2235
cobD
2237 va
lV ahpE
2239
2240
aceE
2242
fabD
acpM
kasA
kasB
accD
6
2248
glpD1
2250
2251
2252
2253 22
5422
5522
5622
57
2258
adhE
222
60 2261
2262
2263
2264
2265
2266
2267
2268
2269
lppN 2271
2272
2273
2274
2275
2276
2277
PE
IS61
10
2354
2355
PPE
glyS
2358
furB 23
6023
6123
62
amiA2
bex
2365
2366
2367
phoH23
6923
70
PE23
72dn
aJ2
hrcA
2375 23
76mbtHmbtG
mbtF
mbtE
mbtD
mbtC
mbtB
mbtA
lipK
trpE2
2387
hemN
2389
2390
nirA
cysH
2393
ggtB
2395
PE_PGRS cy
sA
cysW
cysT
subI
2401
2402
IS61
10
479 2480
2481
plsB2
2483
2484
lipQ
argW
echA
14PE_P
GRS
2488
2489
PE_PGRS
2491
2492
2493
2494 pd
hC
pdhB
pdhA
citE
2499
fadE19
accA
1
accD
1
scoB
scoA
fadD35
2506
2507
2508
2509
2510
2511
hisT
IS10
81
2512
2513
2514
2515
2516
2517
lppS
lysU
PE
2520
bcp
2522
acpS
fas
PPE
2609
2610
2611
pgsA26
13
thrS
PE_PGRS
2616 26
17
2618 26
1926
2026
21
2622
2623
2624
2625
2626
2627
2628
2629
2630
2631
2632
2633
PE_PGRS
2635
2636
dedA
2638 26
3926
4026
4126
42
arsC
2644
valTgly
T cysT valU 2645
2646
2647
IS61
10
2648
2649
2650
2651
2652
2653
2654
2655
2656
2657 2658
2659
2660 266126
62 2663 2664 2665 IS
1081
'2666
clpX'
2668
2669
2670
ribD
2672
2673
2674 26
7526
76
hemY'
hemE
echA
1526
80
2681
dxs
77
2778
2779
ald
2781
pepR
gpsI
lppU
rpsO
ribF
2787
sirR
fadE21
ltp1
IS16
02
2791
2792
truB
2794
2795
lppV
2797
2798
2799
2800
2801
2802
2803
2804
2805
2806
2807
2808
2809
IS15
55'
2810
2811
IS16
042812
2813
IS61
10
2814
2815
2816
2817
2818
2819
2820
2821
2822
2823
2824
2825
2826
2827
2828
2829
2830
echA
16 ugpC
ugpB
ugpE
ugpA
dinF
2837
rbfA
infB
2840
nusA28
42
smc
2923fpg
rnc
2926
2927
tesA
2929
fadD26
ppsA
ppsB
ppsC
ppsD
ppsE
drrA
drrB
drrC
papA
5
mas
fadD28
mmpL7
IS15
33
2943
2944
lppX
pks1
pks1
5
fadD22
3026
3027
fixB
fixA
3030
3031
3032
3033
3034
3035
3036
3037
3038
echA
1730
4030
41
serB
2
ctaD
fecB
adhC 30
4630
47nr
dG
3049
3050
nrdE
nrdI nr
dH
3054
3055
dinP
3057
3058
3059
3060
fadE22
ligB
cstA
3064
emrE
3066
3067
alaU
pgmA
3069
3070
3071
3072
3073
3074
3075
3076
atsF
3078 30
79
pknK
3081
virS
3083
631
67
3168
3169
3170
hpx
3172
3173
3174
3175
lipS
3177
3178
3179
3180
3181
3182 3183
IS61
10
3184
3185
IS61
10
3186
3187
3188
3189
3190
3191IS
1603
metU31
92
3193
3194
3195
3196
3197
uvrD
2
3199
3200
3201
3202
lipV
3204 32
05
moeZ
3207
3208
3209 32
10
rhlE
3212
3213
entD
entC
3216 32
17
3218
whiB1 32
2032
21
3222
sigH
3300
phoY
1
glpD2
lpdA
3304
amiA
amiB
deoD
pmmB
upp
3310
3311
3312
add
deoA
cdd
sdhC sdhD
sdhA
sdhB 33
2033
21 moaE3 gphA moaC3
IS61
10
3325
3326
IS15
4733
27
sigJ
3329
3330
sugI
nagA
3333
3334
3335
trpS
3337
3338
icd1
metC
metA
3342
PPE
PE_PGRS
PE_PGRS
3346
PPE
whiB3 gr
oEL1
groE
S
gcp
rimI 3421
3422
alr
3424
PPE
PPE
IS15
32
3427
3428
PPEIS
1540
3430
IS15
52
3431
gadB
3433
3434
3435
glmS
3437
3438
3439
3440
mrsA
rpsI
rplM
3444
3445
3446
3447
3448
3449
3450
3451
3452
3453
3454
truA
rplQ
rpoA
rpsD
rpsK rpsM
rpmJ infA
3463
rmlB
rmlC
3466 REP3467
rmlB3
mhpE
ilvB2
3471
3472 bp
oA
IS61
10
3474
3475
kgtP
PE
PPE
3479
3480
fadA5
3547
3548
3549
echA
2035
5135
52
3553
fdxB
3555
fadA6
3557
PPE
3559
fadE30
fadD3
fadE31
fadE32
fadE33
aspB
nhoA
3567
3568
3569
3570
3571
3572
fadE34
3574
3575
pknM
3577
arsB
235
79
cysS
3581
3582
3583
lpqE
radA
3586
3587
3588
mutYPE_P
GRS
3591
3592
lpqF
3594 PE_P
GRS
clpC
lsr2
lysS
3599
3600
panD
panC
3603
3604
3605
folK
folX
folP
3693
3694
3695
glpK
3697
3698
3699
3700
3701
3702
3703
gshA
3705
3706
3707
asd
ask
leuA
dnaQ
3712
cobQ
2 3714
recR
3716
3717 37
18
3719
3720
dnaZ
X
3722
serV
3723
3724
3725
3726
3727
3728
3729
3730
ligC
3732
3733
3734
3735
3736
3737
PPEPPE
3740
3741
3742
3743
3744
3745PE
3747
3748 37
4937
5037
51
serX 3752 3753
A2
3821
3822
mmpL8
papA
1
pks2
fadD23
IS15
37
3827
3828
3829
3830
3831 38
32
3833
serS
3835
3836
3837
pheA
3839
3840
bfrB glp
Q1
3843
3844
3845
sodA
3847
3848
3849
3850
3851
hns
menG
3854
3855
3856
3857
gltD
gltB
3860
3861
3862
3863
3864
3865
3866
3867
3868
3869
3870
3871
Nature © Macmillan Publishers Ltd 1998
8
family)Rv0823c - transcriptional regulator
(NifR3/Smm1 family)Rv0827c - transcriptional regulator (ArsR
family)Rv0890c - transcriptional regulator
(LuxR/UhpA family)Rv0891c - putative transcriptional regulatorRv0894 - putative transcriptional regulatorRv1019 - transcriptional regulator
(TetR/AcrR family)Rv1049 - transcriptional regulator (MarR
family)Rv1129c - transcriptional regulator
(PbsX/Xre family)Rv1151c - putative transcriptional regulatorRv1152 - transcriptional regulator (GntR
family)Rv1167c - putative transcriptional regulatorRv1219c - putative transcriptional regulatorRv1255c - transcriptional regulator
(TetR/AcrR family)Rv1332 - putative transcriptional regulatorRv1353c - transcriptional regulator
(TetR/AcrR family)Rv1358 - transcriptional regulator
(LuxR/UhpA family)Rv1359 - putative transcriptional regulatorRv1395 - transcriptional regulator
(AraC/XylS family)Rv1404 - transcriptional regulator (MarR
family)Rv1423 - putative transcriptional regulatorRv1460 - putative transcriptional regulatorRv1474c - transcriptional regulator
(TetR/AcrR family)Rv1534 - transcriptional regulator
(TetR/AcrR family)Rv1556 - putative transcriptional regulatorRv1674c - putative transcriptional regulatorRv1675c - putative transcriptional regulatorRv1719 - transcriptional regulator (IclR
family)Rv1773c - transcriptional regulator (IclR
family)Rv1776c - putative transcriptional regulatorRv1816 - putative transcriptional regulatorRv1846c - putative transcriptional regulatorRv1931c - transcriptional regulator
(AraC/XylS family)Rv1956 - putative transcriptional regulatorRv1963c - putative transcriptional regulatorRv1985c - transcriptional regulator (LysR
family)Rv1990c - putative transcriptional regulatorRv1994c - transcriptional regulator (MerR
family)Rv2017 - putative transcriptional regulator
(PbsX/Xre family)Rv2021c - putative transcriptional regulatorRv2034 - transcriptional regulator (ArsR
family)Rv2175c - putative transcriptional regulatorRv2250c - putative transcriptional regulatorRv2258c - putative transcriptional regulatorRv2282c - transcriptional regulator (LysR
family)Rv2308 - putative transcriptional regulatorRv2324 - transcriptional regulator
(Lrp/AsnC family)Rv2358 - transcriptional regulator (ArsR
family)Rv2488c - transcriptional regulator
(LuxR/UhpA family)Rv2506 - transcriptional regulator
(TetR/AcrR family)Rv2621c - putative transcriptional regulatorRv2640c - transcriptional regulator (ArsR
family)Rv2642 - transcriptional regulator (ArsR
family)Rv2669 - putative transcriptional regulatorRv2745c - putative transcriptional regulatorRv2779c - transcriptional regulator
(Lrp/AsnC family)Rv2887 - transcriptional regulator (MarR
family)Rv2912c - transcriptional regulator
(TetR/AcrR family)Rv2989 - transcriptional regulator (IclR
family)Rv3050c - putative transcriptional regulatorRv3055 - putative transcriptional regulatorRv3058c - putative transcriptional regulatorRv3060c - transcriptional regulator (GntR
family)Rv3066 - putative transcriptional regulatorRv3095 - putative transcriptional regulatorRv3124 - transcriptional regulator
(AfsR/DndI/RedD family)
Rv3160c - putative transcriptional regulatorRv3167c - putative transcriptional regulatorRv3173c - transcriptional regulator
(TetR/AcrR family)Rv3183 - putative transcriptional regulatorRv3208 - transcriptional regulator
(TetR/AcrR family)Rv3249c - transcriptional regulator
(TetR/AcrR family)Rv3291c - transcriptional regulator
(Lrp/AsnC family)Rv3295 - transcriptional regulator
(TetR/AcrR family)Rv3334 - transcriptional regulator (MerR
family)Rv3405c - putative transcriptional regulatorRv3522 - putative transcriptional regulatorRv3557c - transcriptional regulator
(TetR/AcrR family)Rv3574 - transcriptional regulator
(TetR/AcrR family)Rv3575c - transcriptional regulator (LacI
family)Rv3583c - putative transcriptional regulatorRv3676 - transcriptional regulator (Crp/Fnr
family)Rv3678c - transcriptional regulator (LysR
family)Rv3736 - transcriptional regulator
(AraC/XylS family)Rv3744 - transcriptional regulator (ArsR
family)Rv3830c - transcriptional regulator
(TetR/AcrR family)Rv3833 - transcriptional regulator
(AraC/XylS family)Rv3840 - putative transcriptional regulatorRv3855 - putative transcriptional regulator
2. Two component systemsRv1028c kdpD sensor histidine kinaseRv1027c kdpE two-component response
regulatorRv3246c mtrA two-component response
regulatorRv3245c mtrB sensor histidine kinaseRv0844c narL two-component response
regulatorRv0757 phoP two-component response
regulatorRv0758 phoR sensor histidine kinaseRv0491 regX3 two-component response
regulatorRv0490 senX3 sensor histidine kinaseRv0602c tcrA two-component response
regulatorRv0260c - two-component response
regulatorRv0600c - sensor histidine kinaseRv0601c - sensor histidine kinaseRv0818 - two-component response
regulatorRv0845 - sensor histidine kinaseRv0902c - sensor histidine kinaseRv0903c - two-component response
regulatorRv0981 - two-component response
regulatorRv0982 - sensor histidine kinaseRv1032c - sensor histidine kinaseRv1033c - two-component response
regulatorRv1626 - two-component response
regulatorRv2027c - sensor histidine kinaseRv2884 - two-component response
regulatorRv3132c - sensor histidine kinaseRv3133c - two-component response
regulatorRv3143 - putative sensory transduction
proteinRv3220c - sensor histidine kinaseRv3764c - sensor histidine kinaseRv3765c - two-component response
regulator
3. Serine-threonine protein kinases and phosphoproteinphosphatasesRv0015c pknA serine-threonine protein kinaseRv0014c pknB serine-threonine protein kinaseRv0931c pknD serine-threonine protein kinaseRv1743 pknE serine-threonine protein kinaseRv1746 pknF serine-threonine protein kinaseRv0410c pknG serine-threonine protein kinaseRv1266c pknH serine-threonine protein kinaseRv2914c pknI serine-threonine protein kinaseRv2088 pknJ serine-threonine protein kinaseRv3080c pknK serine-threonine protein kinaseRv2176 pknL serine-threonine protein kinase,
truncatedRv0018c ppp putative phosphoprotein phos-
phataseRv2234 ptpA low molecular weight protein-tyro-
sine-phosphataseRv0153c - putative protein-tyrosine-phos-
phatase
II. Macromolecule metabolismA. Synthesis and modification of macromolecules1. Ribosomal protein synthesis and modificationRv3420c rimI ribosomal protein S18 acetyl
transferaseRv0995 rimJ acetylation of 30S S5 subunitRv0641 rplA 50S ribosomal protein L1Rv0704 rplB 50S ribosomal protein L2Rv0701 rplC 50S ribosomal protein L3Rv0702 rplD 50S ribosomal protein L4Rv0716 rplE 50S ribosomal protein L5Rv0719 rplF 50S ribosomal protein L6Rv0056 rplI 50S ribosomal protein L9Rv0651 rplJ 50S ribosomal protein L10Rv0640 rplK 50S ribosomal protein L11Rv0652 rplL 50S ribosomal protein L7/L12Rv3443c rplM 50S ribosomal protein L13Rv0714 rplN 50S ribosomal protein L14Rv0723 rplO 50S ribosomal protein L15Rv0708 rplP 50S ribosomal protein L16Rv3456c rplQ 50S ribosomal protein L17Rv0720 rplR 50S ribosomal protein L18Rv2904c rplS 50S ribosomal protein L19Rv1643 rplT 50S ribosomal protein L20Rv2442c rplU 50S ribosomal protein L21Rv0706 rplV 50S ribosomal protein L22Rv0703 rplW 50S ribosomal protein L23Rv0715 rplX 50S ribosomal protein L24Rv1015c rplY 50S ribosomal protein L25Rv2441c rpmA 50S ribosomal protein L27Rv0105c rpmB 50S ribosomal protein L28Rv2058c rpmB2 50S ribosomal protein L28Rv0709 rpmC 50S ribosomal protein L29 Rv0722 rpmD 50S ribosomal protein L30Rv1298 rpmE 50S ribosomal protein L31Rv2057c rpmG 50S ribosomal protein L33Rv3924c rpmH 50S ribosomal protein L34Rv1642 rpmI 50S ribosomal protein L35Rv3461c rpmJ 50S ribosomal protein L36Rv1630 rpsA 30S ribosomal protein S1Rv2890c rpsB 30S ribosomal protein S2Rv0707 rpsC 30S ribosomal protein S3Rv3458c rpsD 30S ribosomal protein S4Rv0721 rpsE 30S ribosomal protein S5Rv0053 rpsF 30S ribosomal protein S6Rv0683 rpsG 30S ribosomal protein S7Rv0718 rpsH 30S ribosomal protein S8Rv3442c rpsI 30S ribosomal protein S9Rv0700 rpsJ 30S ribosomal protein S10Rv3459c rpsK 30S ribosomal protein S11Rv0682 rpsL 30S ribosomal protein S12Rv3460c rpsM 30S ribosomal protein S13Rv0717 rpsN 30S ribosomal protein S14Rv2056c rpsN2 30S ribosomal protein S14Rv2785c rpsO 30S ribosomal protein S15Rv2909c rpsP 30S ribosomal protein S16Rv0710 rpsQ 30S ribosomal protein S17Rv0055 rpsR 30S ribosomal protein S18Rv2055c rpsR2 30S ribosomal protein S18Rv0705 rpsS 30S ribosomal protein S19Rv2412 rpsT 30S ribosomal protein S20Rv3241c - member of S30AE ribosomal
protein family
2. Ribosome modification and maturationRv1010 ksgA 16S rRNA dimethyltransferaseRv2838c rbfA ribosome-binding factor ARv2907c rimM 16S rRNA processing protein
3. Aminoacyl tRNA synthases and their modificationRv2555c alaS alanyl-tRNA synthase Rv1292 argS arginyl-tRNA synthaseRv2572c aspS aspartyl-tRNA synthaseRv3580c cysS cysteinyl-tRNA synthaseRv2130c cysS2 cysteinyl-tRNA synthaseRv1406 fmt methionyl-tRNA formyltransferaseRv3011c gatA glu-tRNA-gln amidotransferase,
subunit BRv3009c gatB glu-tRNA-gln amidotransferase,
subunit ARv3012c gatC glu-tRNA-gln amidotransferase,
subunit C Rv2992c gltS glutamyl-tRNA synthaseRv2357c glyS glycyl-tRNA synthaseRv2580c hisS histidyl-tRNA synthaseRv1536 ileS isoleucyl-tRNA synthaseRv0041 leuS leucyl-tRNA synthaseRv3598c lysS lysyl-tRNA synthaseRv1640c lysX C-term lysyl-tRNA synthaseRv1007c metS methionyl-tRNA synthaseRv1649 pheS phenylalanyl-tRNA synthase α
subunit
Nature © Macmillan Publishers Ltd 1998
8
Rv1650 pheT phenylalanyl-tRNA synthase β subunit
Rv2845c proS prolyl-tRNA synthaseRv3834c serS seryl-tRNA synthaseRv2614c thrS threonyl-tRNA synthaseRv2906c trmD tRNA (guanine-N1)-methyltrans-
feraseRv3336c trpS tryptophanyl tRNA synthaseRv1689 tyrS tyrosyl-tRNA synthaseRv2448c valS valyl-tRNA synthase
4. NucleoproteinsRv1407 fmu similar to Fmu proteinRv3852 hns HU-histone proteinRv2986c hupB DNA-binding protein II Rv1388 mIHF integration host factor
5. DNA replication, repair, recombination and restric-tion/modificationRv1317c alkA DNA-3-methyladenine glycosi-
dase IIRv2836c dinF DNA-damage-inducible protein FRv1329c dinG probable ATP-dependent helicaseRv3056 dinP DNA-damage-inducible proteinRv1537 dinX probable DNA-damage-inducible
proteinRv0001 dnaA chromosomal replication initiator
proteinRv0058 dnaB DNA helicase (contains intein)Rv1547 dnaE1 DNA polymerase III, α subunitRv3370c dnaE2 DNA polymerase III α chainRv2343c dnaG DNA primaseRv0002 dnaN DNA polymerase III, β subunitRv3711c dnaQ DNA polymerase III e chainRv3721c dnaZX DNA polymerase III, γ (dnaZ) and
τ (dnaX)Rv2924c fpg formamidopyrimidine-DNA glyco-
sylaseRv0006 gyrA DNA gyrase subunit ARv0005 gyrB DNA gyrase subunit BRv2092c helY probable helicase, Ski2 subfamilyRv2101 helZ probable helicase, Snf2/Rad54
familyRv2756c hsdM type I restriction/modification sys-
tem DNA methylaseRv2755c hsdS' type I restriction/modification sys-
tem specificity determinantRv3296 lhr ATP-dependent helicaseRv3014c ligA DNA ligaseRv3062 ligB DNA ligaseRv3731 ligC probable DNA ligaseRv1020 mfd transcription-repair coupling factorRv2528c mrr restriction system proteinRv2985 mutT1 MutT homologueRv1160 mutT2 MutT homologueRv0413 mutT3 MutT homologueRv3589 mutY probable DNA glycosylaseRv3297 nei probable endonuclease VIIIRv3674c nth probable endonuclease IIIRv1316c ogt methylated-DNA-protein-cysteine
methyltransferaseRv1629 polA DNA polymerase IRv1402 priA putative primosomal protein n'
(replication factor Y)Rv3585 radA probable DNA repair RadA homo-
logueRv2737c recA recombinase (contains intein)Rv0630c recB exodeoxyribonuclease VRv0631c recC exodeoxyribonuclease VRv0629c recD exodeoxyribonuclease VRv0003 recF DNA replication and SOS induc-
tionRv2973c recG ATP-dependent DNA helicaseRv1696 recN recombination and DNA repairRv3715c recR RecBC-Independent process of
DNA repairRv2736c recX regulatory protein for RecARv2593c ruvA Holliday junction binding protein,
DNA helicaseRv2592c ruvB Holliday junction binding proteinRv2594c ruvC Holliday junction resolvase, endo-
deoxyribonucleaseRv0054 ssb single strand binding proteinRv1210 tagA DNA-3-methyladenine glycosi-
dase IRv3646c topA DNA topoisomeraseRv2976c ung uracil-DNA glycosylaseRv1638 uvrA excinuclease ABC subunit ARv1633 uvrB excinuclease ABC subunit BRv1420 uvrC excinuclease ABC subunit CRv0949 uvrD DNA-dependent ATPase I and
helicase II Rv3198c uvrD2 putative UvrDRv0427c xthA exodeoxyribonuclease IIIRv0071 - group II intron maturaseRv0861c - probable DNA helicaseRv0944 - possible formamidopyrimidine-
DNA glycosylaseRv1688 - probable 3-methylpurine DNA
glycosylase
Rv2090 - partially similar to DNA poly-merase I
Rv2191 - similar to both PolC and UvrC proteins
Rv2464c - probable DNA glycosylase, endonuclease VIII
Rv3201c - probable ATP-dependent DNA helicase
Rv3202c - similar to UvrD proteinsRv3263 - probable DNA methylaseRv3644c - similar in N-term to DNA poly-
merase III
6. Protein translation and modificationRv0429c def polypeptide deformylaseRv2534c efp elongation factor PRv2882c frr ribosome recycling factorRv0684 fusA elongation factor GRv0120c fusA2 elongation factor G Rv1080c greA transcription elongation factor GRv3462c infA initiation factor IF-1Rv2839c infB initiation factor IF-2 Rv1641 infC initiation factor IF-3Rv0009 ppiA peptidyl-prolyl cis-trans isomeraseRv2582 ppiB peptidyl-prolyl cis-trans isomeraseRv1299 prfA peptide chain release factor 1Rv3105c prfB peptide chain release factor 2Rv2889c tsf elongation factor EF-TsRv0685 tuf elongation factor EF-Tu
7. RNA synthesis, RNA modification and DNA transcriptionRv1253 deaD ATP-dependent DNA/RNA
helicaseRv2783c gpsI pppGpp synthase and polyribo-
nucleotide phosphorylaseRv2841c nusA transcription termination factorRv2533c nusB N-utilization substance protein BRv0639 nusG transcription antitermination
proteinRv3907c pcnA polynucleotide polymeraseRv3232c pvdS alternative sigma factor for
siderophore productionRv3211 rhlE probable ATP-dependent
RNA helicaseRv1297 rho transcription termination
factor rhoRv3457c rpoA α subunit of RNA polymeraseRv0667 rpoB β subunit of RNA polymeraseRv0668 rpoC β' subunit of RNA polymeraseRv1364c rsbU SigB regulation protein Rv3287c rsbW anti-sigma B factorRv2703 sigA RNA polymerase sigma factor
(aka MysA, RpoV)Rv2710 sigB RNA polymerase sigma factor
(aka MysB)Rv2069 sigC ECF subfamily sigma subunitRv3414c sigD ECF subfamily sigma subunitRv1221 sigE ECF subfamily sigma subunitRv3286c sigF ECF subfamily sigma subunitRv0182c sigG sigma-70 factors ECF subfamily Rv3223c sigH ECF subfamily sigma subunitRv1189 sigI ECF family sigma factorRv3328c sigJ similar to SigI, ECF familyRv0445c sigK ECF-type sigma factorRv0735 sigL sigma-70 factors ECF subfamilyRv3911 sigM probable sigma factor, similar to
SigERv3366 spoU probable rRNA methylaseRv3455c truA probable pseudouridylate syn-
thaseRv2793c truB tRNA pseudouridine 55 synthaseRv1644 tsnR putative 23S rRNA methyltrans-
feraseRv3649 - ATP-dependent DNA/RNA heli-
case
8. Polysaccharides (cytoplasmic)Rv1326c glgB 1,4-α-glucan branching enzymeRv1328 glgP probable glycogen phosphory-
laseRv1564c glgX probable glycogen debranching
enzymeRv1563c glgY putative α-amylaseRv1562c glgZ maltooligosyltrehalose trehalohy-
drolaseRv0126 - probable glycosyl hydrolaseRv1781c - probable 4-α-glucanotransferaseRv2471 - probable maltase α-glucosidase
B. Degradation of macromolecules1. RNARv1014c pth peptidyl-tRNA hydrolaseRv2925c rnc RNAse IIIRv2444c rne similar at C-term to ribo-
nuclease ERv2902c rnhB ribonuclease HIIRv3923c rnpA ribonuclease P protein compo-
nentRv1340 rphA ribonuclease PH
2. DNARv0670 end endonuclease IV (apurinase)Rv1108c xseA exonuclease VII large subunitRv1107c xseB exonuclease VII small subunit
3. Proteins, peptides and glycopeptidesRv3305c amiA probable aminohydrolaseRv3306c amiB probable aminohydrolaseRv3596c clpC ATP-dependent Clp proteaseRv2461c clpP ATP-dependent Clp protease pro-
teolytic subunitRv2460c clpP2 ATP-dependent Clp protease pro-
teolytic subunitRv2457c clpX ATP-dependent Clp protease
ATP-binding subunit ClpXRv2667 clpX' similar to ClpC from M. leprae but
shorterRv3419c gcp glycoproteaseRv2725c hflX GTP-binding proteinRv1223 htrA serine proteaseRv2861c map methionine aminopeptidaseRv0734 map' probable methionine aminopepti-
daseRv0319 pcp pyrrolidone-carboxylate peptidaseRv0125 pepA probable serine proteaseRv2213 pepB aminopeptidase A/IRv0800 pepC aminopeptidase IRv2467 pepD probable aminopeptidaseRv2089c pepE cytoplasmic peptidaseRv2535c pepQ cytoplasmic peptidaseRv2782c pepR protease/peptidase, M16 family
(insulinase)Rv2109c prcA proteasome α-type subunit 1Rv2110c prcB proteasome β-type subunit 2Rv0782 ptrBa protease II, α subunitRv0781 ptrBb protease II, β subunitRv0724 sppA protease IV, signal peptide pepti-
daseRv0198c - probable zinc metalloproteaseRv0457c - probable peptidaseRv0840c - probable proline iminopeptidaseRv0983 - probable serine proteaseRv1977 - probable zinc metallopeptidaseRv3668c - probable alkaline serine proteaseRv3671c - probable serine proteaseRv3883c - probable secreted proteaseRv3886c - protease
4. Polysaccharides, lipopolysaccharides and phospho-lipidsRv0062 celA cellulase/endoglucanaseRv3915 cwlM hydrolaseRv0315 - probable β-1,3-glucanaseRv1090 - probable inactivated
cellulase/endoglucanaseRv1327c - probable glycosyl hydrolase, α-
amylase familyRv1333 - probable hydrolaseRv3463 - probable neuraminidaseRv3717 - possible N-acetylmuramoyl-L-ala-
nine amidase
5. Esterases and lipasesRv0220 lipC probable esteraseRv1923 lipD probable esteraseRv3775 lipE probable hydrolaseRv3487c lipF probable esteraseRv0646c lipG probable hydrolaseRv1399c lipH probable lipaseRv1400c lipI probable lipaseRv1900c lipJ probable esteraseRv2385 lipK probable acetyl-hydrolaseRv1497 lipL esteraseRv2284 lipM probable esteraseRv2970c lipN probable lipase/esteraseRv1426c lipO probable esteraseRv2463 lipP probable esteraseRv2485c lipQ probable carboxlyesteraseRv3084 lipR probable acetyl-hydrolaseRv3176c lipS probable esterase/lipaseRv2045c lipT probable carboxylesteraseRv1076 lipU probable esteraseRv3203 lipV probable lipaseRv0217c lipW probable esteraseRv2351c plcA phospholipase C precursorRv2350c plcB phospholipase C precursorRv2349c plcC phospholipase C precursorRv1755c plcD partial CDS for phospholipase CRv1104 - probable esterase pseudogeneRv1105 - probable esterase pseudogene
6. Aromatic hydrocarbonsRv3469c mhpE probable 4-hydroxy-2-oxovalerate
aldolaseRv0316 - probable muconolactone iso-
meraseRv0771 - probable 4-carboxymuconolac-
tone decarboxylaseRv0939 - probable dehydraseRv1723 - 6-aminohexanoate-dimer hydro-
Nature © Macmillan Publishers Ltd 1998
8
laseRv2715 - 2-hydroxymuconic semialdehyde
hydrolaseRv3530c - probable cis-diol dehydrogenaseRv3534c - 4-hydroxy-2-oxovalerate aldolaseRv3536c - aromatic hydrocarbon degrada-
tion
C. Cell envelope1. Lipoproteins (lppA-lpr0) 65
2. Surface polysaccharides, lipopolysaccharides, pro-teins and antigensRv0806c cpsY probable UDP-glucose-4-
epimeraseRv3811 csp secreted proteinRv1677 dsbF highly similar to C-term Mpt53Rv3794 embA involved in arabinogalactan syn-
thesisRv3795 embB involved in arabinogalactan syn-
thesisRv3793 embC involved in arabinogalactan syn-
thesisRv3875 esat6 early secretory antigen targetRv0112 gca probable GDP-mannose dehy-
drataseRv0113 gmhA phosphoheptose isomeraseRv2965c kdtB lipopolysaccharide core biosyn-
thesis proteinRv2878c mpt53 secreted protein Mpt53Rv1980c mpt64 secreted immunogenic protein
Mpb64/Mpt64Rv2875 mpt70 major secreted immunogenic pro-
tein Mpt70 precursorRv2873 mpt83 surface lipoprotein Mpt83Rv0899 ompA member of OmpA familyRv3810 pirG cell surface protein precursor (Erp
protein)Rv3782 rfbE similar to rhamnosyl transferaseRv1302 rfe undecaprenyl-phosphate α-N-
acetylglucosaminyltransferaseRv2145c wag31 antigen 84 (aka wag31)Rv0431 - tuberculin related peptide (AT103)Rv0954 - cell envelope antigenRv1514c - involved in polysaccharide syn-
thesisRv1518 - involved in exopolysaccharide
synthesisRv1758 - partial cutinaseRv1910c - probable secreted proteinRv1919c - weak similarity to pollen antigensRv1984c - probable secreted proteinRv1987 - probable secreted proteinRv2223c - probable exported proteaseRv2224c - probable exported proteaseRv2301 - probable cutinaseRv2345 - precursor of probable membrane
proteinRv2672 - putative exported proteaseRv3019c - similar to Esat6Rv3036c - probable secreted proteinRv3449 - probable precursor of serine pro-
teaseRv3451 - probable cutinaseRv3452 - probable cutinase precursorRv3724 - probable cutinase precursor
3. Murein sacculus and peptidoglycanRv2911 dacB penicillin binding proteinRv2981c ddlA D-alanine-D-alanine ligase ARv3809c glf UDP-galactopyranose mutaseRv1018c glmU UDP-N-acetylglucosamine
pyrophosphorylaseRv3382c lytB LytB protein homologueRv1110 lytB' very similar to LytBRv1315 murA UDP-N-acetylglucosamine-1-car-
boxyvinyltransferaseRv0482 murB UDP-N-acetylenolpyruvoylglu-
cosamine reductaseRv2152c murC UDP-N-acetyl-muramate-alanine
ligaseRv2155c murD UDP-N-acetylmuramoylalanine-D-
glutamate ligaseRv2158c murE meso-diaminopimelate-adding
enzymeRv2157c murF D-alanine:D-alanine-adding
enzymeRv2153c murG transferase in peptidoglycan syn-
thesisRv1338 murI glutamate racemaseRv2156c murX phospho-N-acetylmuramoyl-
petapeptide transferaseRv3332 nagA N-acetylglucosamine-6-P-
deacetylaseRv0016c pbpA penicillin-binding proteinRv2163c pbpB penicillin-binding protein 2Rv0050 ponA penicillin-bonding proteinRv3682 ponA' class A penicillin binding proteinRv0017c rodA FtsW/RodA/SpovE familyRv0907 - probable penicillin binding protein
Rv1367c - probable penicillin binding proteinRv1730c - probable penicillin binding proteinRv1922 - probable penicillin binding proteinRv2864c - probable penicillin binding proteinRv3330 - probable penicillin binding proteinRv3627c - probable penicillin binding protein
4. Conserved membrane proteinsRv0402c mmpL1 conserved large membrane
proteinRv0507 mmpL2 conserved large membrane
proteinRv0206c mmpL3 conserved large membrane
proteinRv0450c mmpL4 conserved large membrane
proteinRv0676c mmpL5 conserved large membrane
proteinRv1557 mmpL6 conserved large membrane
proteinRv2942 mmpL7 conserved large membrane
proteinRv3823c mmpL8 conserved large membrane
proteinRv2339 mmpL9 conserved large membrane
proteinRv1183 mmpL10 conserved large membrane
proteinRv0202c mmpL11 conserved large membrane
proteinRv1522c mmpL12 conserved large membrane
proteinRv0403c mmpS1 conserved small membrane
proteinRv0506 mmpS2 conserved small membrane
proteinRv2198c mmpS3 conserved small membrane
proteinRv0451c mmpS4 conserved small membrane
proteinRv0677c mmpS5 conserved small membrane
protein
5. Other membrane proteins 211
III. Cell processesA. Transport/binding proteins1. Amino acidsRv2127 ansP L-asparagine permeaseRv0346c aroP2 probable aromatic amino acid
permeaseRv0917 betP glycine betaine transportRv1704c cycA transport of D-alanine, D-serine
and glycineRv3666c dppA probable peptide transport system
permeaseRv3665c dppB probable peptide transport system
permeaseRv3664c dppC probable peptide transport system
permeaseRv3663c dppD probable ABC-transporterRv0522 gabP probable 4-amino butyrate trans-
porterRv0411c glnH putative glutamine binding proteinRv2564 glnQ probable ATP-binding transport
proteinRv1280c oppA probable oligopeptide transport
proteinRv1283c oppB oligopeptide transport proteinRv1282c oppC oligopeptide transport system per-
measeRv1281c oppD probable peptide transport proteinRv2320c rocE arginine/ornithine transporterRv3253c - probable cationic amino acid
transportRv3454 - possible proline permease
2. CationsRv2920c amt putative ammonium transporterRv1607 chaA putative calcium/proton antiporterRv1239c corA probable magnesium and cobalt
transport proteinRv0092 ctpA cation-transporting ATPaseRv0103c ctpB cation transport ATPaseRv3270 ctpC cation transport ATPaseRv1469 ctpD probable cadmium-transporting
ATPaseRv0908 ctpE probable cation transport ATPaseRv1997 ctpF probable cation transport ATPase Rv1992c ctpG probable cation transport ATPaseRv0425c ctpH C-terminal region putative cation-
transporting ATPaseRv0107c ctpI probable magnesium transport
ATPaseRv0969 ctpV cation transport ATPaseRv3044 fecB putative FeIII-dicitrate transporterRv0265c fecB2 iron transport protein FeIII dici-trate transporterRv1029 kdpA potassium-transporting ATPase A
chain
Rv1030 kdpB potassium-transporting ATPase B chain
Rv1031 kdpC potassium-transporting ATPase C chain
Rv3236c kefB probable glutathione-regulated potassium-efflux protein
Rv2877c merT possible mercury resistance transport system
Rv1811 mgtC probable magnesium transport ATPase protein C
Rv0362 mgtE putative magnesium ion transporter
Rv2856 nicT probable nickel transport proteinRv0924c nramp transmembrane protein belonging
to Nramp familyRv2691 trkA probable potassium uptake pro-
tein Rv2692 trkB probable potassium uptake pro-
tein Rv2287 yjcE probable Na+/H+ exchangerRv2723 - probable membrane protein,
tellurium resistanceRv3162c - probable membrane proteinRv3237c - possible potassium channel
proteinRv3743c - probable cation-transporting
ATPase
3. Carbohydrates, organic acids and alcoholsRv2443 dctA C4-dicarboxylate transport proteinRv3476c kgtP sugar transport proteinRv1902c nanT probable sialic acid transporter Rv1236 sugA membrane protein probably
involved in sugar transportRv1237 sugB sugar transport proteinRv1238 sugC ABC transporter component of
sugar uptake systemRv3331 sugI probable sugar transport proteinRv2835c ugpA sn-glycerol-3-phosphate
permeaseRv2833c ugpB sn-glycerol-3-phosphate-binding
periplasmic lipoproteinRv2832c ugpC sn-glycerol-3-phosphate transport
ATP-binding proteinRv2834c ugpE sn-glycerol-3-phosphate transport
system proteinRv2316 uspA sugar transport proteinRv2318 uspC sugar transport proteinRv2317 uspE sugar transport proteinRv1200 - probable sugar transporterRv2038c - probable ABC sugar transporterRv2039c - probable sugar transporterRv2040c - probable sugar transporterRv2041c - probable sugar transporter
4. AnionsRv2684 arsA probable arsenical pumpRv2685 arsB probable arsenical pumpRv3578 arsB2 probable arsenical pumpRv2643 arsC probable arsenical pumpRv2397c cysA sulphate transport ATP-binding
protein Rv2399c cysT sulphate transport system perme-
ase proteinRv2398c cysW sulphate transport system perme-
ase proteinRv1857 modA molybdate binding proteinRv1858 modB transport system permease,
molybdate uptakeRv1859 modC molybdate uptake ABC-
transporterRv1860 modD precursor of Apa (45/47
kD secreted protein)Rv2329c narK1 probable nitrite extrusion proteinRv1737c narK2 nitrite extrusion protein Rv0261c narK3 nitrite extrusion protein1 Rv0267 narU similar to nitrite extrusion
protein 2Rv0934 phoS1 PstS component of phosphate
uptakeRv0928 phoS2 PstS component of phosphate
uptakeRv0820 phoT phosphate transport system ABC
transporterRv3301c phoY1 phosphate transport system
regulatorRv0821c phoY2 phosphate transport system
regulatorRv0545c pitA low-affinity inorganic phosphate
transporterRv2281 pitB phosphate permeaseRv0930 pstA1 PstA component of phosphate
uptakeRv0936 pstA2 PstA component of phosphate
uptakeRv0933 pstB ABC transport component of
phosphate uptakeRv0935 pstC PstC component of phosphate
uptakeRv0929 pstC2 membrane-bound component of
Nature © Macmillan Publishers Ltd 1998
8
phosphate transport systemRv0932c pstS PstS component of phosphate
uptakeRv2400c subI sulphate binding precursorRv0143c - probable chloride channelRv1707 - probable sulphate permeaseRv1739c - possible sulphate transporterRv3679 - possible anion transporterRv3680 - probable anion transporter
5. Fatty acid transportRv2790c ltp1 non-specific lipid transport proteinRv3540c ltp2 non-specific lipid transport protein
6. Efflux proteinsRv2936 drrA similar daunorubicin resistance
ABC-transporter Rv2937 drrB similar daunorubicin resistance
transmembrane proteinRv2938 drrC similar daunorubicin resistance
transmembrane proteinRv2846c efpA putative efflux proteinRv3065 emrE resistance to ethidium bromideRv0783c - multidrug resistance proteinRv0849 - possible quinolone efflux pumpRv1145 - probable drug transporterRv1146 - probable drug transporterRv1250 - probable drug efflux proteinRv1258c - probable multidrug resistance
pumpRv1410c - probable drug efflux proteinRv1634 - probable drug efflux proteinRv1819c - probable multidrug resistance
pumpRv2136c - putative bacitracin resistance pro-
teinRv2209 - probable drug efflux proteinRv2333c - probable tetracenomycin C resis-
tance proteinRv2994 - probable fluoroquinolone efflux
proteinRv1877 - probable drug efflux proteinRv2459 - probable drug efflux protein
B. Chaperones/Heat shockRv0384c clpB heat shock proteinRv0352 dnaJ acts with GrpE to stimulate DnaK
ATPaseRv2373c dnaJ2 DnaJ homologueRv0350 dnaK 70 kD heat shock protein, chromo-
some replicationRv3417c groEL1 60 kD chaperonin 1Rv0440 groEL2 60 kD chaperonin 2Rv3418c groES 10 kD chaperoneRv0351 grpE stimulates DnaK ATPase activityRv2374c hrcA heat-inducible transcription
repressorRv0251c hsp possible heat shock proteinRv0353 hspR heat shock regulatorRv2031c hspX 14kD antigen, heat shock protein
Hsp20 familyRv2299c htpG heat shock protein Hsp90 familyRv0563 htpX probable (transmembrane) heat
shock proteinRv2701c suhB putative extragenic suppressor
proteinRv3269 - probable heat shock protein
C. Cell divisionRv3641c fic possible cell division proteinRv3102c ftsE membrane proteinRv3610c ftsH inner membrane protein,
chaperoneRv2748c ftsK chromosome partitioningRv2151c ftsQ ingrowth of wall at septumRv2154c ftsW membrane protein (shape determi-
nation)Rv3101c ftsX membrane proteinRv2921c ftsY cell division protein FtsYRv2150c ftsZ circumferential ring, GTPaseRv3919c gid glucose inhibited division protein BRv3625c mesJ probable cell cycle proteinRv3917c parA chromosome partitioning; DNA -
bindingRv3918c parB possibly involved in chromosome
partitioningRv2922c smc member of Smc1/Cut3/Cut14
familyRv0012 - possible cell division proteinRv0435c - ATPase of AAA-familyRv2115c - ATPase of AAA-familyRv3213c - possible role in chromosome seg-
regationRv1708 - possible role in chromosome parti-
tioning
D. Protein and peptide secretionRv2916c ffh signal recognition particle proteinRv2903c lepB signal peptidase IRv1614 lgt prolipoprotein diacylglyceryl trans-
feraseRv1539 lspA lipoprotein signal peptidaseRv0379 sec probable transport protein
SecE/Sec61- γ familyRv3240c secA SecA, preprotein translocase sub-
unitRv1821 secA2 SecA, preprotein translocase sub-
unitRv2587c secD protein-export membrane proteinRv0638 secE SecE preprotein translocaseRv2586c secF protein-export membrane proteinRv1440 secG protein-export membrane protein
SecGRv0732 secY SecY subunit of preprotein translo-
caseRv2462c tig chaperone protein, similar to
trigger factorRv2813 - probable general secretion path-
way protein
E. Adaptations and atypical conditionsRv1901 cinA competence damage protein Rv3648c cspA cold shock protein, transcriptional
regulatorRv0871 cspB probable cold shock proteinRv3063 cstA starvation-induced stress
response proteinRv3490 otsA probable α,α-trehalose-phosphate
synthaseRv2006 otsB trehalose-6-phosphate phos-
phataseRv3372 otsB2 trehalose-6-phosphate phos-
phataseRv3758c proV osmoprotection ABC transporterRv3757c proW transport system permeaseRv3759c proX similar to osmoprotection proteinsRv3756c proZ transport system permeaseRv1026 - probable pppGpp-5'phosphohydro-
lase
F. DetoxificationRv2428 ahpC alkyl hydroperoxide reductaseRv2429 ahpD member of AhpC/TSA familyRv2238c ahpE member of AhpC/TSA familyRv2521 bcp bacterioferritin comigratory proteinRv1608c bcpB probable bacterioferritin comigra-
tory proteinRv3473c bpoA probable non-heme bromoperoxi-
daseRv1123c bpoB probable non-heme bromoperoxi-
daseRv0554 bpoC probable non-heme bromoperoxi-
daseRv3617 ephA probable epoxide hydrolaseRv1938 ephB probable epoxide hydrolaseRv1124 ephC probable epoxide hydrolase Rv2214c ephD probable epoxide hydrolase Rv3670 ephE probable epoxide hydrolaseRv0134 ephF probable epoxide hydrolaseRv3171c hpx probable non-heme haloperoxi-
daseRv1908c katG catalase-peroxidase Rv3846 sodA superoxide dismutaseRv0432 sodC superoxide dismutase precursor -
(Cu-Zn)Rv1932 tpx thiol peroxidaseRv0634c - putative glyoxylase IIRv2581c - putative glyoxylase IIRv3177 - probable non-heme haloperoxi-
dase
IV. OtherA. VirulenceRv0169 mce1 cell invasion proteinRv0589 mce2 cell invasion proteinRv1966 mce3 cell invasion proteinRv3499c mce4 cell invasion proteinRv3100c smpB probable small protein bRv1694 tlyA cytotoxin/hemolysin homologueRv0024 - putative p60 homologueRv0167 - part of mce1 operonRv0168 - part of mce1 operonRv0170 - part of mce1 operonRv0171 - part of mce1 operonRv0172 - part of mce1 operonRv0174 - part of mce1 operonRv0587 - part of mce2 operonRv0588 - part of mce2 operonRv0590 - part of mce2 operonRv0591 - part of mce2 operonRv0592 - part of mce2 operonRv0594 - part of mce2 operonRv1085c - possible hemolysinRv1477 - putative exported p60 protein
homologueRv1478 - putative exported p60 protein
homologueRv1566c - putative exported p60 protein
homologueRv1964 - part of mce3 operonRv1965 - part of mce3 operonRv1967 - part of mce3 operonRv1968 - part of mce3 operonRv1969 - part of mce3 operonRv1971 - part of mce3 operonRv2190c - putative p60 homologueRv3494c - part of mce4 operonRv3496c - part of mce4 operonRv3497c - part of mce4 operonRv3498c - part of mce4 operon
Rv3500c - part of mce4 operonRv3501c - part of mce4 operonRv3896c - putative p60 homologueRv3922c - possible hemolysin
B. IS elements, Repeated sequences, and Phage1. IS elementsIS6110 16 copiesIS1081 6 copiesOthers 37 copies
2. REP13E12 family 7 copies
3. Phage-related functionsRv2894c xerC integrase/recombinaseRv1701 xerD integrase/recombinaseRv1054 - integrase-aRv1055 - integrase-bRv1573 - phiRV1 phage related proteinRv1574 - phiRV1 phage related proteinRv1575 - phiRV1 phage related proteinRv1576c - phiRV1 phage related proteinRv1577c - phiRV1 possible prohead proteaseRv1578c - phiRV1 phage related proteinRv1579c - phiRV1 phage related proteinRv1580c - phiRV1 phage related proteinRv1581c - phiRV1 phage related proteinRv1582c - phiRV1 phage related proteinRv1583c - phiRV1 phage related proteinRv1584c - phiRV1 phage related proteinRv1585c - phiRV1 phage related proteinRv1586c - phiRV1 integraseRv2309c - integraseRv2310 - excisionaseRv2646 - phiRV2 integraseRv2647 - phiRV2 phage related proteinRv2650c - phiRV2 phage related proteinRv2651c - phiRV2 prohead proteaseRv2652c - phiRV2 phage related proteinRv2653c - phiRV2 phage related proteinRv2654c - phiRV2 phage related proteinRv2655c - phiRV2 phage related proteinRv2656c - phiRV2 phage related proteinRv2657c - similar to gp36 of mycobacterio-
phage L5 Rv2658c - phiRV2 phage related proteinRv2659c - phiRV2 integraseRv2830c - similar to phage P1 phd geneRv3750c - excisionaseRv3751 - putative integrase
C. PE and PPE families1. PE familyPE subfamily 38 membersPE_PGRS subfamily 61 members
2. PPE family 68 members
D. Antibiotic production and resistanceRv2068c blaC class A β-lactamaseRv3290c lat lysine-e aminotransferaseRv2043c pncA pyrazinamide resistance/sensitivityRv0133 - possible puromycin N-acetyltrans-
feraseRv0262c - aminoglycoside 2'-N-acetyltrans-
feraseRv0802c - acetyltransferaseRv1082 - similar to S. lincolnensis lmbERv1170 - similar to S. lincolnensis lmbERv1347c - possible aminoglycoside 6'-N-
acetyltransferaseRv2036 - similar to lincomycin production
genesRv2303c - similar to S. griseus macrotetrolide
resistance proteinRv3225c - probable aminoglycoside 3'-phos-
photransferasesRv3700c - probable acetyltransferaseRv3817 - probable aminoglycoside 3'-phos-
photransferase
E. Bacteriocin-like proteins 3
F. Cytochrome P450 enzymes 22
G. Coenzyme F420-dependent enzymes 3
H. Miscellaneous transferases 61
I. Miscellaneous phosphatases, lyases, and hydrolases 18
J. Cyclases 6
K. Chelatases 2
V. Conserved hypotheticals 912
VI. Unknowns 606
TOTAL 3924