Evidence against the selfish operon theory
Csaba Pal1,2 and Laurence D. Hurst2
1MTA, Theoretical Biology Research Group, Eotvos Lorand University, Pazmany Peter Setany 1/C, Budapest, H-1117, Hungary
2Department of Biology and Biochemistry, University of Bath, Bath, Somerset BA2 7AY, UK
According to the selfish operon hypothesis, the clustering of genes and their subsequent organization into operons is beneficial for the constituent genes because it enables the horizontal gene transfer of weakly selected, functionally coupled genes. The majority of these are expected to be non-essential genes. From our analysis of the Escherichia coli genome, we conclude that the selfish operon hypothesis is unlikely to provide a general explanation for clustering, nor can it account for the gene composition of operons. Contrary to expectations, essential genes with related functions have an especially strong tendency to cluster, even if they are not in operons. Moreover, essential genes are particularly abundant in operons.
There is an increasing realization that gene order is not random and that linked genes tend to share expression characteristics. Nowhere is this more striking than in bacterial operons, in which functionally related genes often cluster. Although it is tempting to suppose that operon evolution came about by the selection for co-expression, this logic has been challenged [1,2]. Co-regulation of genes in operons can provide selection for the maintenance of operon structure; however, it is not clear how it can explain the assembly of gene clusters by gradual steps because no benefit is expected to be derived from proximity until co-transcription is possible [1]. The selfish operon hypothesis posits an alternative set of selective conditions that can potentially address this concern. The hypothesis postulates that the linkage of two or more functionally related genes is favored because it increases the probability that genes will be co-transferred during horizontal gene transfer.
The model posits [2] that operons generally consist of genes that can only together fulfill a given function. If the function is under weak selection or required only under specific conditions, the genes in the operon can be lost easily by a combination of mutation pressure and genetic drift but can be regained by horizontal gene transfer at a later stage. Given that an upper limit on the length of the donor DNA segment exists [3], the closer in proximity the genes are, the higher the probability that they can be regained in one step. The theory therefore predicts that ‘genes for essential processes should not cluster’ [2]. At first sight the theory is not about operons but is about the clustering of genes. However, the majority of the data applied in support of the theory relates to operons [2] and, as is clear from the title of the hypothesis, the developers posit that the model is relevant to the evolution of operons. It predicts, they argue, that bacterial genomes should be ‘interspersed with novel, horizontally transferred operons providing peripheral metabolic functions’ [2]. (The authors define peripheral as meaning non-essential [2].)
Here we report the results of a whole-genome analysis to find out if this hypothesis has the power to explain both the patterns of clustering and the patterns of gene content of operons in Escherichia coli. First, we ask whether non-essential genes are relatively enriched within operons compared with essential genes. This is an important issue because the authors of this hypothesis have stated that ‘by contrast [to the selfish operon model], the co-regulation model predicts that essential genes whose co-regulation is most critical are those most likely to be found in operons’ [2]. Given that it has already been established that essential genes cluster in the E. coli genome [4], we next ask whether the extent of gene clustering, in a given functional category, is more pronounced for non-essential genes than for essential genes.
Testing the hypothesis
The classification of E. coli K12 genes as either essential or non-essential was taken from a recent systematic study [4]. The list of essential genes was augmented with a collection of gene deletion studies that was retrieved from the profiling of E. coli chromosome (PEC) database (http://www.shigen.nig.ac.jp/ecoli/pec/index.jsp). Information on the operons was retrieved from RegulonDB [5] (http://www.cifn.unam.mx/Computational_Genomics/regulondb/), a database that was compiled from the literature on the regulation of transcription in E. coli K12. Consistent with the co-regulation model, essential genes occur in operons more often than non-essential genes. Of the 3445 genes with appropriate data on gene dispensability, 602 genes were designated as essential by at least one study. Approximately 28% of these are known to be in operons, whereas this figure is reduced to 23% among the non-essential genes (χ² = 6.73, df = 1, P = 0.009). To allow for possible bias in the operon dataset, we repeated the analysis to include the operons that were predicted computationally. The trend remains the same (χ² = 6.57, df = 1, P = 0.01), suggesting that essential genes have a slightly higher tendency to reside in operons.
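The comparison of operon membership between essential and non-essential genes is a test on a 2 × 2 contingency table. The following is a minimal sketch of a Pearson chi-square test with one degree of freedom, not the authors' code. The function name `chi_square_2x2` is hypothetical, and the counts in the usage example are rough reconstructions from the rounded percentages quoted above (28% of 602 and 23% of the remaining 2843 genes), so they are not expected to reproduce the reported statistic exactly.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    Returns (chi2, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For df = 1 the chi-square survival function reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # ASSUMPTION: counts rebuilt from rounded percentages, for illustration only.
    essential_in_operon, essential_total = 169, 602          # ~28% of 602
    nonessential_in_operon, nonessential_total = 654, 2843   # ~23% of 2843
    chi2, p = chi_square_2x2(
        essential_in_operon,
        essential_total - essential_in_operon,
        nonessential_in_operon,
        nonessential_total - nonessential_in_operon,
    )
    print(f"chi2 = {chi2:.2f}, P = {p:.4f}")
```

Only the standard library is used here; a statistics package would normally be preferred for real analyses.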
Although the results described here provide no obvious support for the selfish operon theory, one could argue that operon formation is only one possible but unnecessary outcome of the clustering of functionally related genes. There is strong evidence for the physical clustering of functionally related genes that are unrelated to operons [6,7].
Corresponding author: Laurence D. Hurst ([email protected]).
Update TRENDS in Genetics Vol.20 No.6 June 2004
www.sciencedirect.com
However, we found that the clustering of functionally related genes is particularly pronounced for essential genes. Using the functional classification of E. coli genes that was derived from the clusters of orthologous groups of proteins (COG) database [8] (http://www.ncbi.nlm.nih.gov/COG/), we calculated the number of essential gene pairs located ‘y’ genes away from each other, and calculated the frequency of pairs that are in the same broad functional category. We repeated the same procedure for all non-essential gene pairs.
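The pair-counting procedure can be sketched as follows. This is an illustrative reading of the method, not the authors' implementation: genes are assumed to be listed in chromosomal order as hypothetical `(is_essential, category)` tuples, the chromosome is treated as linear for simplicity, and the expected count is taken as the mean over shuffled gene orders, following the (O − E)/E definition and the 1000 randomized sets given in the Figure 1 legend. The exclusions listed in that legend (overlapping neighbors, tandem duplicates, genes of unknown function) are not modeled.

```python
import random
from typing import List, Tuple

Gene = Tuple[bool, str]  # (is_essential, functional_category); hypothetical layout


def related_pairs(genes: List[Gene], y: int, essential: bool) -> int:
    """Count gene pairs y positions apart in which both genes have the given
    dispensability status and share a functional category."""
    count = 0
    for i in range(len(genes) - y):
        (ess1, cat1), (ess2, cat2) = genes[i], genes[i + y]
        if ess1 == essential and ess2 == essential and cat1 == cat2:
            count += 1
    return count


def relative_increase(genes: List[Gene], y: int, essential: bool,
                      n_shuffles: int = 1000, seed: int = 0) -> float:
    """Return (O - E) / E, where O is the observed count and E is the mean
    count over n_shuffles random permutations of the gene order."""
    rng = random.Random(seed)
    observed = related_pairs(genes, y, essential)
    shuffled = list(genes)
    total = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        total += related_pairs(shuffled, y, essential)
    expected = total / n_shuffles
    if expected == 0:
        raise ValueError("expected count is zero; relative increase undefined")
    return (observed - expected) / expected


if __name__ == "__main__":
    # Toy genome (made-up data): two tight blocks of essential genes.
    toy = ([(True, "J")] * 5 + [(False, "K")] * 10
           + [(True, "C")] * 5 + [(False, "E")] * 10)
    for y in (1, 2, 3):
        print(y, round(relative_increase(toy, y, essential=True), 2))
```

Shuffling the whole gene list keeps the number of genes in each category fixed while destroying positional structure, which is one plausible interpretation of "randomized gene order".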
We found that the relative increase in the number of essential gene pairs with related functions at a given physical distance is always higher than the relative increase in the number of non-essential, functionally related pairs (Figure 1). Using a different, more detailed functional classification [9] does not alter this finding (Mantel–Haenszel test: χ² = 41.2, df = 1, P ≪ 10⁻⁷). We have good reason to believe that this result is independent of operon structure. First, the size of the clusters appears to be larger than the usual size of operons. More importantly, the trend remains even when gene pairs of the same operon are excluded from the analysis (Mantel–Haenszel test: χ² = 81.03, df = 1, P ≪ 10⁻⁷). It has been noted previously that ribosomal proteins tend to be essential and cluster in bacterial genomes [2]. Importantly, the difference in the tendency for functional clustering is not a peculiarity of ribosomal genes. After excluding all of the gene pairs that were involved in translation, the difference in the clustering tendency between essential and non-essential genes remains (Mantel–Haenszel test: χ² = 12.3, df = 1, P < 0.0005). Unfortunately, it is difficult to investigate the relative contribution of the different functional categories to the patterns observed because the observed number of essential gene pairs would be too low to be statistically meaningful (data not shown).
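The Mantel–Haenszel test combines one 2 × 2 table per physical distance into a single chi-square statistic with one degree of freedom. Below is a minimal sketch of the standard Cochran–Mantel–Haenszel statistic under stated assumptions: each stratum is a hypothetical tuple `(a, b, c, d)` holding essential same-category, essential different-category, non-essential same-category and non-essential different-category pair counts, and the continuity correction is optional because the text does not say whether one was applied.

```python
import math
from typing import Iterable, Tuple

Table = Tuple[int, int, int, int]  # (a, b, c, d) for one stratum (one distance y)


def mantel_haenszel(tables: Iterable[Table], continuity: bool = False):
    """Cochran-Mantel-Haenszel chi-square over stratified 2 x 2 tables.

    For each stratum, the observed count a is compared with its expectation
    (a+b)(a+c)/n, using the hypergeometric variance
    (a+b)(c+d)(a+c)(b+d) / (n^2 (n-1)). Returns (chi2, p_value), df = 1.
    """
    sum_a = sum_expected = sum_var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        sum_a += a
        sum_expected += (a + b) * (a + c) / n
        sum_var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    if sum_var == 0:
        raise ValueError("no informative strata")
    diff = abs(sum_a - sum_expected)
    if continuity:
        diff = max(diff - 0.5, 0.0)
    chi2 = diff * diff / sum_var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival, df = 1
    return chi2, p_value


if __name__ == "__main__":
    # Made-up strata for distances y = 1, 2, 3 (illustration only).
    strata = [(30, 70, 200, 1800), (25, 80, 190, 1850), (20, 85, 185, 1900)]
    print(mantel_haenszel(strata))
```

Stratifying by distance lets the overall test ask whether essential pairs are more often functionally related than non-essential pairs without pooling distances that have very different baseline frequencies.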
The analyses described here cannot be considered definitive. One might question, for example, whether we can really define operons. Similarly, one might conjecture that some essential genes were not always essential but might have been transferred horizontally and only subsequently became essential. However, unless these problems are substantial, the analyses presented here strongly suggest that the selfish operon theory does not, for the most part, explain the evolution of operon structure and gene clustering on a larger scale in the E. coli genome. Our results are further supported by the finding that, when more than one horizontally transferred gene is found in a given operon, they are often the result of independent transfer events [10].
Alternative hypotheses
Nonetheless, there remains the issue of how the clustering of genes originates, how the clusters evolve gradually and why operons are most prevalent in bacteria. One established idea is that chromosome organization (and possibly chromatin formation) has an important role in the timing of gene expression and gene dosage [11].
Another possibility relates to the peculiarity of transcription and translation in prokaryotes. In this group, the translational process often occurs while the 3′ end of the mRNA is still being synthesized. This means that the protein product is manufactured in the vicinity of the gene. We have shown previously that an imbalance in the concentrations of proteins involved in complexes has a major effect on fitness in yeast [12]. One might then imagine that the genes for such proteins might be under selection to be linked, so as to minimize the time in which the proteins are not bound together in the complex [13]. Such selection would provide a gradual advantage to linkage and explain why operons are much more prevalent in prokaryotes than in eukaryotes. This model makes two predictions: (i) genes whose protein products form complexes should be more tightly linked than expected under the null hypothesis (there is some evidence that this is the case [14]); and (ii) if such a process is to account for operon formation, then such genes should also be more prevalent in operons than expected by chance.
There is currently no prokaryotic species for which both a large quantity of protein complex data and experimentally resolved operon structures are available. However, in Helicobacter pylori there exists a large body of yeast two-hybrid protein-interaction data [15] in addition to operon structures that were computationally predicted [16]. We found that the genes that encode interacting proteins reside next to each other in the genome more often than expected by chance (P < 0.001). Furthermore, of the 22 pairs of such genes, 18 of them are contained in the
Figure 1. The relative increase in the number of essential and non-essential gene pairs in the same functional category y genes away from each other along the chromosome. The relative increase is defined as (O − E)/E, where O and E are the observed and expected numbers of functionally related gene pairs, respectively. The expected numbers were derived from the average of 1000 sets with randomized gene order. Overlapping neighboring gene pairs, genes with unknown dispensability or function and tandem duplicates were excluded from the analysis. The Mantel–Haenszel procedure [17] was employed to calculate an overall probability for departures from equal frequency of gene pairs within the same functional category among essential and non-essential gene pairs across contingency tables from different physical distances. The frequencies of gene pairs within the same functional categories were compared in 2 × 2 contingency tables. The Mantel–Haenszel test provides a summary chi-square test for the stratified data. Overall, essential gene pairs are more likely to encode proteins within the same functional category than non-essential gene pairs (Mantel–Haenszel test: χ² = 133.1, df = 1, P ≪ 10⁻⁷).
[Figure 1 plot not reproduced. x-axis: physical distance measured in number of genes (y), 1 to 19; y-axis: relative increase in number of functionally related pairs, −2 to 16; series: essential pairs and non-essential pairs.]
same putative operon. These data are certainly suggestive but the definitive test will require more reliable sources of protein-interaction data (the yeast two-hybrid method has a high false-positive rate) and experimentally confirmed operons.
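The text does not say how "more often than expected by chance" was assessed for neighboring genes that encode interacting proteins. One simple way to frame such a question is a permutation test on gene order, sketched below as an assumption rather than as the authors' method. The names `adjacent_pairs` and `adjacency_p_value` and the toy gene identifiers are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def adjacent_pairs(order: Sequence[str], interactions: List[Tuple[str, str]]) -> int:
    """Count interacting gene pairs whose genes are immediate neighbors
    in the given gene order (linear chromosome assumed for simplicity)."""
    position = {gene: i for i, gene in enumerate(order)}
    return sum(1 for a, b in interactions
               if abs(position[a] - position[b]) == 1)


def adjacency_p_value(order: Sequence[str], interactions: List[Tuple[str, str]],
                      n_shuffles: int = 999, seed: int = 0) -> float:
    """One-sided permutation P value: the fraction of shuffled gene orders
    with at least as many adjacent interacting pairs as observed.
    The +1 terms count the observed order itself among the permutations."""
    rng = random.Random(seed)
    observed = adjacent_pairs(order, interactions)
    shuffled = list(order)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if adjacent_pairs(shuffled, interactions) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    # Made-up example: 50 genes, four interacting pairs that are all neighbors.
    genes = [f"g{i}" for i in range(50)]
    pairs = [("g0", "g1"), ("g10", "g11"), ("g20", "g21"), ("g30", "g31")]
    print(adjacent_pairs(genes, pairs), adjacency_p_value(genes, pairs))
```

With 999 shuffles the smallest attainable P value is 0.001, so more permutations would be needed to resolve stronger effects.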
Acknowledgements
We thank Andrea Navratil and Balazs Papp for discussions, and an anonymous referee for the helpful suggestions.
References
1 Lawrence, J. (1999) Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes. Curr. Opin. Genet. Dev. 9, 642–648
2 Lawrence, J.G. and Roth, J.R. (1996) Selfish operons – horizontal transfer may drive the evolution of gene clusters. Genetics 143, 1843–1860
3 Syvanen, M. and Kado, C.I. (1998) Horizontal Gene Transfer, Kluwer Academic Publisher
4 Gerdes, S.Y. et al. (2003) Experimental determination and system level analysis of essential genes in Escherichia coli MG1655. J. Bacteriol. 185, 5673–5684
5 Salgado, H. et al. (2001) RegulonDB (version 3.2): transcriptional regulation and operon organization in Escherichia coli K-12. Nucleic Acids Res. 29, 72–74
6 Rogozin, I.B. et al. (2002) Connected gene neighborhoods in prokaryotic genomes. Nucleic Acids Res. 30, 2212–2223
7 Lathe, W.C. et al. (2000) Gene context conservation of a higher order than operons. Trends Biochem. Sci. 25, 474–479
8 Tatusov, R.L. et al. (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41
9 Serres, M.H. et al. (2004) GenProtEC: an updated and improved analysis of functions of Escherichia coli K-12 proteins. Nucleic Acids Res. 32, D300–D302
10 Omelchenko, M.V. et al. (2003) Evolution of mosaic operons by horizontal gene transfer and gene displacement in situ. Genome Biol. 4, R55
11 Ussery, D. et al. (2001) Genome organisation and chromatin structure in Escherichia coli. Biochimie 83, 201–212
12 Papp, B. et al. (2003) Dosage sensitivity and the evolution of gene families in yeast. Nature 424, 194–197
13 Shapiro, L. and Losick, R. (1997) Protein localization and cell fate in bacteria. Science 276, 712–718
14 Dandekar, T. et al. (1998) Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem. Sci. 23, 324–328
15 Rain, J.C. et al. (2001) The protein–protein interaction map of Helicobacter pylori. Nature 409, 211–215
16 Moreno-Hagelsieb, G. and Collado-Vides, J. (2002) A powerful non-homology method for the prediction of operons in prokaryotes. Bioinformatics 18 (Suppl. 1), S329–S336
17 Sokal, R. and Rohlf, M. (1995) Biometry, 3rd edn, Freeman, San Francisco
0168-9525/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.tig.2004.04.001