
Evidence against the selfish operon theory


Csaba Pál 1,2 and Laurence D. Hurst 2

1 MTA, Theoretical Biology Research Group, Eötvös Loránd University, Pázmány Péter Sétány 1/C, Budapest, H-1117, Hungary
2 Department of Biology and Biochemistry, University of Bath, Bath, Somerset BA2 7AY, UK

According to the selfish operon hypothesis, the clustering of genes and their subsequent organization into operons is beneficial for the constituent genes because it enables the horizontal gene transfer of weakly selected, functionally coupled genes. The majority of these are expected to be non-essential genes. From our analysis of the Escherichia coli genome, we conclude that the selfish operon hypothesis is unlikely to provide a general explanation for clustering nor can it account for the gene composition of operons. Contrary to expectations, essential genes with related functions have an especially strong tendency to cluster, even if they are not in operons. Moreover, essential genes are particularly abundant in operons.

There is an increasing realization that gene order is not random and that linked genes tend to share expression characteristics. Nowhere is this more striking than in bacterial operons, in which functionally related genes often cluster. Although it is tempting to suppose that operon evolution came about by the selection for co-expression, this logic has been challenged [1,2]. Co-regulation of genes in operons can provide selection for the maintenance of operon structure; however, it is not clear how it can explain the assembly of gene clusters by gradual steps because no benefit is expected to be derived from proximity until co-transcription is possible [1]. The selfish operon hypothesis posits an alternative set of selective conditions that can potentially address this concern. The hypothesis postulates that the linkage of two or more functionally related genes is favored because it increases the probability that genes will be co-transferred during horizontal gene transfer.

The model posits [2] that operons generally consist of genes that can only together fulfill a given function. If the function is under weak selection or required for specific conditions, the genes in the operon can be lost easily by a combination of mutation pressure and genetic drift but can be regained by horizontal gene transfer at a later stage. Given that an upper limit on the length of donor DNA segment exists [3], the closer in proximity the genes are, the higher the possibility that they can be regained in one step. The theory therefore predicts that ‘genes for essential processes should not cluster’ [2]. At first sight the theory is not about operons but is about the clustering of genes. However, the majority of the data applied in support of the theory relates to operons [2] and, as is clear from the title of the hypothesis, the developers posit that the model is relevant to the evolution of operons. It predicts, they argue, that bacterial genomes should be ‘interspersed with novel, horizontally transferred operons providing peripheral metabolic functions’ [2]. (The authors define peripheral as meaning non-essential [2].)

Here we report the results of a whole-genome analysis to find out if this hypothesis has the power to explain both the patterns of clustering and the patterns of gene content of operons in Escherichia coli. First, we ask whether non-essential genes are relatively enriched within operons compared with essential genes. This is an important issue because the authors of this hypothesis have stated that ‘by contrast [to the selfish operon model], the co-regulation model predicts that essential genes whose co-regulation is most critical are those most likely to be found in operons’ [2]. Given that it has already been established that essential genes cluster in the E. coli genome [4], we next ask whether the extent of gene clustering, in a given functional category, is more pronounced for non-essential genes than for essential genes.

Testing the hypothesis

The classification of E. coli K12 genes as either essential or non-essential was taken from a recent systematic study [4]. The list of essential genes was augmented with a collection of gene deletion studies that was retrieved from the profiling of E. coli chromosome (PEC) database (http://www.shigen.nig.ac.jp/ecoli/pec/index.jsp). Information on the operons was retrieved from RegulonDB [5] (http://www.cifn.unam.mx/Computational_Genomics/regulondb/), a database that was compiled from the literature on the regulation of transcription in E. coli K12. Consistent with the co-regulation model, essential genes are more likely than non-essential genes to occur in operons. Of the 3445 genes with appropriate data on gene dispensability, 602 genes were designated as essential by at least one study. Approximately 28% of these are known to be in operons, whereas this figure is reduced to 23% among the non-essential genes (χ² = 6.73, df = 1, P = 0.009). To allow for possible bias in the operon dataset, we repeated the analysis to include the operons that were predicted computationally. The trend remains the same (χ² = 6.57, df = 1, P = 0.01), suggesting that essential genes have a slightly higher tendency to reside in operons.
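The comparison above is a chi-square test on a 2 × 2 table (essential versus non-essential, in an operon versus not). The following is a minimal sketch of such a test, not the authors' own calculation. The function name `chi_square_2x2` is hypothetical, the cell counts are reconstructed from the rounded percentages quoted in the text (so they are approximate), and no continuity correction is applied because the text does not say whether one was used. The resulting statistic therefore need not reproduce the reported value exactly.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (df = 1, no continuity correction) for [[a, b], [c, d]].

    Returns the statistic and its P value.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Assumption: counts rebuilt from the rounded figures in the text
# (602 essential genes out of 3445; about 28% and 23% in operons).
essential_total = 602
non_essential_total = 3445 - essential_total
essential_in_operon = round(0.28 * essential_total)
non_essential_in_operon = round(0.23 * non_essential_total)

chi2, p_value = chi_square_2x2(
    essential_in_operon,
    essential_total - essential_in_operon,
    non_essential_in_operon,
    non_essential_total - non_essential_in_operon,
)
print(f"chi2 = {chi2:.2f}, P = {p_value:.4f}")
```

Because the inputs are rounded, the printed value should be read only as an order-of-magnitude check against the statistic reported in the text.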

Although the results described here provide no obvious support for the selfish operon theory, one could argue that operon formation is only one possible but unnecessary outcome of the clustering of functionally related genes. There is strong evidence for the physical clustering of functionally related genes that are unrelated to operons [6,7].

Corresponding author: Laurence D. Hurst ([email protected]).

Update TRENDS in Genetics Vol.20 No.6 June 2004 232


However, we found that the clustering of functionally related genes is particularly pronounced for essential genes. Using the functional classification of E. coli genes that was derived from the clusters of orthologous groups of proteins (COG) database [8] (http://www.ncbi.nlm.nih.gov/COG/), we calculated the number of essential gene pairs located ‘y’ genes away from each other, and calculated the frequency of pairs that are in the same broad functional category. We repeated the same procedure for all non-essential gene pairs.
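The pair-counting procedure described above, together with the relative increase (O − E)/E defined in the legend of Figure 1, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the record layout `(category, is_essential)`, the function names, and the toy gene list are hypothetical. The chromosome is treated as linear, although the E. coli chromosome is circular. The expected count is the average over shuffled gene orders, but the exact randomization scheme is an assumption, and the exclusions listed in the figure legend (overlapping neighbors, tandem duplicates, genes of unknown function or dispensability) are not modeled.

```python
import random


def count_pairs(genes, y, essential):
    """Count gene pairs exactly y positions apart in one dispensability class.

    genes: list of (category, is_essential) tuples in chromosomal order.
    Returns (pairs sharing a functional category, all pairs in the class).
    """
    same = total = 0
    for i in range(len(genes) - y):
        (cat_a, ess_a), (cat_b, ess_b) = genes[i], genes[i + y]
        if ess_a == essential and ess_b == essential:
            total += 1
            if cat_a == cat_b:
                same += 1
    return same, total


def relative_increase(genes, y, essential, n_shuffles=1000, seed=0):
    """Return (O - E) / E for functionally related pairs at distance y.

    O is the observed count; E is the mean count over shuffled gene orders.
    """
    rng = random.Random(seed)
    observed, _ = count_pairs(genes, y, essential)
    shuffled = list(genes)
    expected_sum = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # randomize gene order, keeping each gene's labels
        expected_sum += count_pairs(shuffled, y, essential)[0]
    expected = expected_sum / n_shuffles
    if expected == 0:
        raise ValueError("expected count is zero; relative increase undefined")
    return (observed - expected) / expected


# Toy data (hypothetical): single-letter functional categories.
toy = [("J", True), ("J", True), ("J", True), ("C", False),
       ("E", False), ("C", False), ("E", True), ("K", False)]

print(count_pairs(toy, 1, True))
print(relative_increase(toy, 1, True, n_shuffles=200))
```

Shuffling whole gene records keeps each gene's category and dispensability together, so only the gene order is randomized.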

We found that the relative increase in the number of essential gene pairs with related functions in a given physical distance is always higher than the relative increase in number of non-essential, functionally related pairs (Figure 1). Using a different, more detailed functional classification [9] does not alter this finding (Mantel–Haenszel test: χ² = 41.2, df = 1, P ≪ 10⁻⁷). We have good reason to believe that this result is independent of operon structure. First, the size of the clusters appears to be larger than the usual size of operons. More importantly, the trend remains even when gene pairs of the same operon are excluded from the analysis (Mantel–Haenszel test: χ² = 81.03, df = 1, P ≪ 10⁻⁷). It has been noted previously that ribosomal proteins tend to be essential and cluster in bacterial genomes [2]. Importantly, the difference in the tendency for functional clustering is not a peculiarity of ribosomal genes. After excluding all of the gene pairs that were involved in translation, the difference in the clustering tendency between essential and non-essential genes remains (Mantel–Haenszel test: χ² = 12.3, df = 1, P < 0.0005). Unfortunately, it is difficult to investigate the relative contribution of the different functional categories to the patterns observed because the observed number of essential gene pairs will be too low to be statistically meaningful (data not shown).
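The Mantel–Haenszel tests reported above combine one 2 × 2 table per physical distance (essential versus non-essential pairs, same versus different functional category) into a single chi-square statistic with one degree of freedom. The following is a minimal sketch of the standard Cochran–Mantel–Haenszel statistic. The function name and the toy tables are hypothetical, and whether the original analysis applied a continuity correction is not stated, so it is left as an optional flag.

```python
import math


def mantel_haenszel_chi2(tables, continuity=False):
    """Cochran-Mantel-Haenszel chi-square (df = 1) over stratified 2 x 2 tables.

    tables: iterable of (a, b, c, d), one table [[a, b], [c, d]] per stratum.
    Returns the statistic and its P value.
    """
    sum_a = sum_expected = sum_var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # the hypergeometric variance is undefined for n < 2
        row1, row2, col1, col2 = a + b, c + d, a + c, b + d
        sum_a += a
        sum_expected += row1 * col1 / n
        sum_var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if sum_var == 0:
        raise ValueError("no informative strata")
    diff = abs(sum_a - sum_expected)
    if continuity:
        diff = max(diff - 0.5, 0.0)
    chi2 = diff ** 2 / sum_var
    # For one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Toy strata (hypothetical counts), one table per distance y:
# (essential same, essential different, non-essential same, non-essential different)
toy_tables = [(20, 10, 10, 20), (20, 10, 10, 20)]
print(mantel_haenszel_chi2(toy_tables))
```

Stratifying by distance tests for a consistent difference between essential and non-essential pairs without pooling counts from different distances into one table.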

The analyses described here cannot be considered definitive. One might question, for example, whether we can really define operons. Similarly, one might conjecture that some essential genes were not always essential but might have been transferred horizontally and only subsequently became essential. However, unless these problems are substantial, the analyses presented here strongly suggest that the selfish operon theory does not, for the most part, explain the evolution of operon structure and gene clustering on a larger scale in the E. coli genome. Our results are further supported by the finding that, when more than one horizontally transferred gene is found in a given operon, they are often the result of independent transfer events [10].

Alternative hypotheses

Nonetheless, there remains the issue of how the clustering of genes originates, how the clusters evolve gradually and why operons are most prevalent in bacteria. One established idea is that chromosome organization (and possibly chromatin formation) has an important role in the timing of gene expression and gene dosage [11].

Another possibility relates to the peculiarity of transcription and translation in prokaryotes. In this group, the translational process often occurs while the 3′ end of the mRNA is still being synthesized. This means that the protein product is manufactured in the vicinity of the gene. We have shown previously that an imbalance in the concentrations of proteins involved in complexes has a major effect on fitness in yeast [12]. One might then imagine that the genes for such proteins might be under selection to be linked, so as to ensure the minimal time in which the proteins are not bound together in the complex [13]. Such selection would provide a gradual advantage to linkage and explain why operons are much more prevalent in prokaryotes than in eukaryotes. This model makes two predictions: (i) genes in which protein products form complexes should be more tightly linked than expected by the null hypothesis (there is some evidence that this is the case [14]); and (ii) if such a process is to account for operon formation, then such genes should also be more prevalent in operons than expected by chance.

We currently have no prokaryotic species in which there is a large quantity of protein complex data and experimentally resolved operon structures available. However, in Helicobacter pylori there exists a large body of yeast two-hybrid protein-interaction data [15] in addition to operon structures that were computationally predicted [16]. We found that the genes that encode interacting proteins reside next to each other in the genome more often than expected by chance (P < 0.001). Furthermore, of the 22 pairs of such genes, 18 of them are contained in the same putative operon. These data are certainly suggestive but the definitive test will require more reliable sources of protein-interaction data (the yeast two-hybrid method has a high false-positive rate) and experimentally confirmed operons.

Figure 1. The relative increase in the number of essential and non-essential gene pairs in the same functional category y genes away from each other along the chromosome. The relative increase is defined as (O − E)/E, where O and E are the observed and expected numbers of functionally related gene pairs, respectively. The expected numbers were derived from the average of 1000 sets with randomized gene order. Overlapping neighboring gene pairs, genes with unknown dispensability or function and tandem duplicates were excluded from the analysis. The Mantel–Haenszel procedure [17] was employed to calculate an overall probability for departures from equal frequency of gene pairs within the same functional category among essential and non-essential gene pairs across contingency tables from different physical distances. The frequencies of gene pairs within the same functional categories were compared in 2 × 2 contingency tables. The Mantel–Haenszel test provides a summary chi-square test for the stratified data. Overall, essential gene pairs are more likely to encode proteins within the same functional category than non-essential gene pairs (Mantel–Haenszel test: χ² = 133.1, df = 1, P ≪ 10⁻⁷).
[Plot not reproduced. x-axis: physical distance measured in number of genes (y), from 1 to 19; y-axis: relative increase in number of functionally related pairs, from −2 to 16; series: essential pairs and non-essential pairs.]

Acknowledgements

We thank Andrea Navratil and Balazs Papp for discussions, and an anonymous referee for the helpful suggestions.

References

1 Lawrence, J. (1999) Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes. Curr. Opin. Genet. Dev. 9, 642–648

2 Lawrence, J.G. and Roth, J.R. (1996) Selfish operons – horizontal transfer may drive the evolution of gene clusters. Genetics 143, 1843–1860

3 Syvanen, M. and Kado, C.I. (1998) Horizontal Gene Transfer, Kluwer Academic Publisher

4 Gerdes, S.Y. et al. (2003) Experimental determination and system level analysis of essential genes in Escherichia coli MG1655. J. Bacteriol. 185, 5673–5684

5 Salgado, H. et al. (2001) RegulonDB (version 3.2): transcriptional regulation and operon organization in Escherichia coli K-12. Nucleic Acids Res. 29, 72–74

6 Rogozin, I.B. et al. (2002) Connected gene neighborhoods in prokaryotic genomes. Nucleic Acids Res. 30, 2212–2223

7 Lathe, W.C. et al. (2000) Gene context conservation of a higher order than operons. Trends Biochem. Sci. 25, 474–479

8 Tatusov, R.L. et al. (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41

9 Serres, M.H. et al. (2004) GenProtEC: an updated and improved analysis of functions of Escherichia coli K-12 proteins. Nucleic Acids Res. 32, D300–D302

10 Omelchenko, M.V. et al. (2003) Evolution of mosaic operons by horizontal gene transfer and gene displacement in situ. Genome Biol. 4, R55

11 Ussery, D. et al. (2001) Genome organisation and chromatin structure in Escherichia coli. Biochimie 83, 201–212

12 Papp, B. et al. (2003) Dosage sensitivity and the evolution of gene families in yeast. Nature 424, 194–197

13 Shapiro, L. and Losick, R. (1997) Protein localization and cell fate in bacteria. Science 276, 712–718

14 Dandekar, T. et al. (1998) Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem. Sci. 23, 324–328

15 Rain, J.C. et al. (2001) The protein–protein interaction map of Helicobacter pylori. Nature 409, 211–215

16 Moreno-Hagelsieb, G. and Collado-Vides, J. (2002) A powerful non-homology method for the prediction of operons in prokaryotes. Bioinformatics 18 (Suppl. 1), S329–S336

17 Sokal, R. and Rohlf, M. (1995) Biometry, 3rd edn, Freeman, San Francisco

0168-9525/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.tig.2004.04.001
