Causes of insertion sequences abundance in prokaryotic genomes? A problem of size Marie Touchon E.P.C Rocha Atelier de BioInformatique, Université Pierre et Marie Curie, Paris Unité Génétique des Génomes Bactériens, Institut Pasteur, Paris [email protected]


Page 1: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

Causes of insertion sequences abundance in prokaryotic genomes?

A problem of size

Marie Touchon

E.P.C Rocha

Atelier de BioInformatique, Université Pierre et Marie Curie, Paris

Unité Génétique des Génomes Bactériens, Institut Pasteur, Paris

[email protected]

Page 2: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

IS elements :

the simplest form of transposable elements

- 700 to 2500 bp

- coding only the information allowing their mobility

ability to generate mutations :

- by insertion within genes

- by activating genes when inserted upstream

- by generating extensive DNA rearrangements

have been found to shuttle the transfer of adaptive traits such as :

- antibiotic resistance

- virulence

- new metabolic capabilities

Their exact nature is still debated : Selfish/Advantageous?

- genomic parasites

- beneficial agents

Page 3: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

Causes of insertion sequences abundance in prokaryotic genomes?

Reasons largely unknown and widely speculated

Hypotheses :

- IS family specificity

- Genome size

- Frequency of horizontal gene transfer

- Pathogenicity

- Type of ecological associations

- Human sedentarisation

The current availability of hundreds of genomes renders testable many of these hypotheses.

Page 4: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

IS elements Identification :

Problem : ISs annotations are heterogeneous, inaccurate or insufficient

Solution : Reannotation of ISs using comparative study

by adopting the nomenclature defined by Chandler (1998)

- ISs have one or two consecutive ORFs encoding the transposase

- ISs are grouped into 21 distinct families

Page 5: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

ISs Reannotation

Input: all annotated CDS of genome x

(1) ISs CDS detection, using the ISs database (Chandler et al.)

(2) IS elements reconstitution from the detected CDS (examples in the figure: IS1A-IS21A-IS21B-IS1B, IS1A-IS3A-IS3B-IS1A, IS1A-IS1B, involving IS1, IS21 and IS3)

(3) ISs classified as complete or partial. Partial elements:

- ISs fragments (> 20% length difference)

- ISs with internal insertion
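Step (3) can be sketched as a simple length check. This is a minimal illustration, assuming the 20% threshold is applied relative to the length of a reference element of the same family; the function name, its parameters and the example lengths are hypothetical and do not come from the slides.

```python
def classify_is_element(observed_len, reference_len,
                        has_internal_insertion=False, max_len_diff=0.20):
    """Label an IS element 'complete' or 'partial' (illustrative rule only)."""
    if observed_len <= 0 or reference_len <= 0:
        raise ValueError("lengths must be positive")
    # Assumption: the length difference is measured relative to a reference
    # element of the same IS family.
    rel_diff = abs(observed_len - reference_len) / reference_len
    if has_internal_insertion or rel_diff > max_len_diff:
        return "partial"
    return "complete"


# Hypothetical lengths in bp, for illustration only.
print(classify_is_element(1150, 1200))        # small length difference
print(classify_is_element(700, 1200))         # more than 20% shorter
print(classify_is_element(1200, 1200, True))  # interrupted by another element
```

Treating an internal insertion as a flag keeps the two kinds of partial element from the slide (fragments and interrupted elements) distinguishable by the caller.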

Page 6: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

ISs Reannotation - Reassessment

[Figure: annotated ISs CDS vs. detected ISs CDS: 8823 (89%), 2115 (22%), 1194 (11%)]

[Figure: number of detected ISs CDS (Y) against number of annotated ISs CDS (X), 262 genomes; Shigella flexneri labelled]

(1) Y = 0.77 (0.02) X + 5.86 (1.89)

R2 = 0.81 (P < 0.0001)

R = 0.95 (P < 0.0001)

(2) 8123 IS elements

(3) 83% are complete (may be active)

Only 20% (1994) of GenBank ISs had a consistent classification

Page 7: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

Distribution of ISs in 262 genomes

[Figure: axes labelled Number of genomes, Number of ISs and Number of ISs families. Labelled genomes: Shigella sonnei (proteobacteria), Bordetella pertussis (proteobacteria), Sulfolobus solfataricus (archaebacteria), Bacillus halodurans (firmicute), Nitrobacter winogradskyi (proteobacteria)]

The absence of ISs is not anecdotal :

- 24% of genomes lack ISs

- 48% of genomes have [0-10] ISs

High variability :

- of the number of ISs / Genome

- of the number of ISs families / Genome

Page 8: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

Association with phylogenetic inertia

Rapid dynamics of gain and loss

The number of ISs evolves so fast that there is no historical correlation

Page 9: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

The effect of IS family specificity

[Figure: phylogenetic trees; labels: 100%, 90%; Firmicute, Proteo, Proteo, Entero]

Incongruent phylogenetic trees

High diversity of ISs found within strains or closely related species

Page 10: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

The effect of IS family specificity : Examples

Pseudomonas syringae tomato: 10 IS3, 42 IS5, 23 IS21, 40 IS66, 10 IS1111, 13 ISNCY, 1 IS91 = 139 ISs

Pseudomonas syringae syringae: 14 IS3, 1 IS5, 1 IS66, 1 IS110, 1 IS630 = 18 ISs

Pseudomonas syringae pv. phaseolicola: 7 IS3, 43 IS5, 7 IS21, 2 IS66, 1 IS1111, 1 ISNCY, 3 IS91, 52 IS256 = 116 ISs

This effect is unlikely to explain the variability of ISs

Page 11: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

The effect of genome size

Wilcoxon test : p<0.0001

Spearman’s r=0.63, p<0.0001

Strong association between Genome size and IS number (and density)

The larger the genome, the more IS elements it contains

N= 64 198
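The rank correlation reported on this slide can be illustrated with a small, dependency-free sketch of Spearman's coefficient, computed as the Pearson correlation of average ranks. The genome sizes and IS counts below are made-up toy values, not data from the study, and the function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (hypothetical): genome size in Mb and number of ISs per genome.
genome_size_mb = [0.6, 1.8, 2.5, 4.6, 5.2, 6.3]
is_count = [0, 3, 1, 40, 25, 60]
print(round(spearman(genome_size_mb, is_count), 3))
```

A rank-based coefficient is a natural choice here because IS counts are strongly skewed (many genomes with zero or few ISs, a few with hundreds).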

Page 12: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

The effect of horizontal gene transfer

[Figure: strain-specific regions of strain A identified from lists of orthologs between strains A, B and C; example genome: E. coli O157:H7 Sakai]

Putative orthologs: reciprocal best hits, proteins with >90% similarity and <20% length difference.

Strain-specific region: a region exclusive to one strain, containing at least ten consecutive genes without an ortholog.

Prophage-Database (Nestle, Casjens, 2003)

HGT-Database (Garcia-Vallve, 2003)
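The two definitions on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: the reciprocal-best-hit step is taken as already done upstream, the length difference is normalised by the longer protein, and the chromosome is treated as linear. All function and parameter names are hypothetical.

```python
def is_putative_ortholog(similarity, len_a, len_b,
                         min_similarity=0.90, max_len_diff=0.20):
    """Apply the slide's thresholds to a reciprocal best hit.

    Assumption: length difference is relative to the longer protein.
    """
    len_diff = abs(len_a - len_b) / max(len_a, len_b)
    return similarity > min_similarity and len_diff < max_len_diff


def strain_specific_regions(has_ortholog, min_run=10):
    """Return (start, end) index pairs, inclusive, of runs of at least
    `min_run` consecutive genes without an ortholog.

    `has_ortholog` is a list of booleans in chromosomal gene order.
    Assumption: a linear gene order (no wrap-around at the origin).
    """
    regions = []
    start = None
    for i, flag in enumerate(has_ortholog):
        if not flag:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                regions.append((start, i - 1))
            start = None
    if start is not None and len(has_ortholog) - start >= min_run:
        regions.append((start, len(has_ortholog) - 1))
    return regions


# Toy gene order (hypothetical): 3 shared genes, 12 strain-only genes,
# 2 shared genes, then 5 strain-only genes (too short to count).
flags = [True] * 3 + [False] * 12 + [True] * 2 + [False] * 5
print(strain_specific_regions(flags))
```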

Page 13: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

The effect of horizontal gene transfer

Wilcoxon test : p<0.0001

5.2%

11.4%

t-test : p<0.001

ISs are ~4 times more concentrated in HGT regions

Genomes lacking ISs have fewer HGT

Spearman’s r= 0.31 p>0.1 (NS)

HGT may be a determinant of the presence of ISs, but not of their abundance
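A concentration ratio of this kind can be computed as a ratio of IS densities. The sketch below assumes the comparison is between HGT regions and the rest of the genome (the slide does not state the baseline); the counts and sizes are hypothetical, chosen only to show the arithmetic.

```python
def is_density_per_mb(n_is, region_bp):
    """Number of ISs per megabase of sequence."""
    if region_bp <= 0:
        raise ValueError("region size must be positive")
    return n_is / (region_bp / 1e6)


def hgt_fold_enrichment(n_is_in_hgt, hgt_bp, n_is_total, genome_bp):
    """IS density inside HGT regions divided by the density elsewhere.

    Assumption: the baseline is the non-HGT part of the same genome.
    """
    inside = is_density_per_mb(n_is_in_hgt, hgt_bp)
    outside = is_density_per_mb(n_is_total - n_is_in_hgt, genome_bp - hgt_bp)
    if outside == 0:
        return float("inf")
    return inside / outside


# Hypothetical genome: 5 Mb, of which 0.5 Mb is HGT; 40 ISs, 12 of them in HGT.
ratio = hgt_fold_enrichment(12, 500_000, 40, 5_000_000)
print(round(ratio, 2))
```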

Page 14: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

Spearman’s r=0.84, p<0.0001

The effect of horizontal gene transfer

HGT is a necessary but not sufficient condition for the presence of ISs

The intensity of HGT is not a significant determinant of IS abundance

IS family diversity in HGT regions is almost as high as in the entire genome

Page 15: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

The effect of pathogenicity

Yersinia pestis (plague)

Shigella flexneri, sonnei (dysentery)

Bordetella pertussis (whooping cough)

4.33.6

Wilcoxon test : p>0.5

N = 100 153

IS=0 8% 17% 55% 100%

Wilcoxon test : p<0.001

No association between the presence of IS and pathogenicity

Strong association between the frequency of IS and the facultative character of the ecological associations

Page 16: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

The effect of the type of ecological association

Stepwise multiple regression (response: Number of ISs)

Covariate                 Cumulative R2
Genome size               0.4
Ecological association    0.47
Frequency HGT             0.47

Genome size is the most important variable

Kruskal-Wallis test : p>0.5 (NS)

We removed genomes lacking IS (possibly under sexual isolation)

Lifestyle is a non-significant determinant

Page 17: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

The effect of human sedentarisation (Mira et al.,2006)

1) Genomes with many ISs are from prokaryotes associated with humans or domesticated animals and plants.

2) Large intra-genomic IS expansions are recent.

Kruskal-Wallis test : p>0.5 (NS)

[Figure categories: not, directly, indirectly]

No evidence that man-related prokaryotes have more ISs.

Page 18: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

Genome size explains ~40% of the variance in IS abundance

The smaller the genome, the lower both the number and the density of ISs

- Selection could favor small genomes : optimal use of resources, shorter replication time (an increase in genome size caused by ISs could be counter-selected)

- ISs are selected to generate genetic variation (such selection should be stronger in larger genomes)

Genomes with fewer ISs correspond to the slowest-growing prokaryotes

Wilcoxon test : p<0.05

[Figure: density of ISs (/Mb) for fast- vs. slow-growing prokaryotes]

Page 19: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

transposition inactivates genes with high probability

the total number of essential genes : ~300

+ 200-300 genes are nearly ubiquitous

500 nearly essential genes

The abundance of IS elements in genomes could be mostly a question of space for transposition events that are not highly deleterious

- Selection against transposition in genomes with higher density of deleterious transposition targets

One explanation fits the available data well
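The "question of space" argument can be made concrete with a back-of-the-envelope calculation. The sketch below takes the slide's figure of about 500 nearly essential genes and assumes, purely for illustration, an average gene length of 1,000 bp (this value is not given in the slides); it then reports the share of the genome occupied by such targets for several genome sizes.

```python
def nearly_essential_fraction(genome_bp, n_genes=500, mean_gene_bp=1000):
    """Fraction of the genome covered by nearly essential genes.

    n_genes follows the slide (about 500); mean_gene_bp is an assumption
    made for illustration only.
    """
    if genome_bp <= 0:
        raise ValueError("genome size must be positive")
    return min(1.0, n_genes * mean_gene_bp / genome_bp)


# Genome sizes in bp, chosen only to span small to large prokaryotic genomes.
for size in (1_000_000, 2_000_000, 5_000_000, 10_000_000):
    share = nearly_essential_fraction(size)
    print(f"{size / 1e6:.0f} Mb genome: {share:.0%} nearly essential targets")
```

Under these assumptions, a random insertion is several times more likely to hit a nearly essential gene in a 1 Mb genome than in a 5 Mb one, which is the direction of the effect the slide proposes.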

Page 20: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

Conclusions

High diversity of ISs found within strains or closely related species

The number of ISs evolves so fast that there is no historical correlation

HGT may be a determinant of the presence of ISs, but not of their abundance

Surprisingly, genome size alone is the best predictor of IS number and density

Selection against transposition in genomes with higher density of deleterious transposition targets

Page 21: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

Impacts of IS abundance?

IS expansion :

- increases the rate of genome rearrangements

- increases the number of pseudogenes

[Figure: genome comparison of Bordetella bronchiseptica and Bordetella parapertussis]

[Figure: % of breakpoints coinciding with IS (observed vs. expected) against number of ISs]

[Figure: O/E R gene/intergene against number of ISs]

Page 22: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

Acknowledgements

E.P.C Rocha

A. Danchin

Institut Pasteur

La Région Ile de France

Page 23: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

Examples

Nitrobacter winogradskyi: 37 IS3, 32 IS5, 27 IS630, 2 IS21, 14 IS481, 4 ISNCY = 117 ISs

Shigella sonnei: 107 IS3, 157 IS1, 16 IS630, 33 IS4, 25 IS21, 1 IS66, 1 IS91, 18 IS110, 3 IS605, 3 IS1111, 4 ISAs1, 2 ISNCY = 372 ISs

Pseudomonas syringae syringae: 14 IS3, 1 IS5, 1 IS630, 1 IS66, 1 IS110 = 18 ISs

Page 24: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

Association with stability ?

Large repeats decrease genome stability (Rocha, Trends Genetics, 03)

[Figure: stability against density of repeats]

Page 25: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

But not ISs elements ?

[Figure: stability against number of ISs]

Page 26: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

Association with phylogenetic inertia ?

The number of ISs evolves so fast that there is no historical correlation

Page 27: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

Two scenarios : genomic parasites, beneficial agents

[Diagram: IS dynamics along lineages: +IS acquisition, +IS expansion, -IS deletion, lineage loss]

Page 28: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

Association with lifestyle ?

Species                      Number of ISs   Lifestyle
Burkholderia pseudomallei    36              Facultative pathogen
Burkholderia mallei          152             Obligatory pathogen
Escherichia coli K12         52              Commensal
Shigella flexneri            298             Obligatory pathogen
Bordetella bronchiseptica    2               Facultative pathogen
Bordetella pertussis         247             Obligatory pathogen

Link with lifestyle : host restriction, niche change, ..

Page 29: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

Association with recent rearrangements ?

[Figure: genome comparisons of Yersinia pseudotuberculosis vs. Yersinia pestis and of Bordetella bronchiseptica vs. Bordetella parapertussis]

IS expansion promoted frequent genomic rearrangements

[Figure: % of breakpoints coinciding with IS (observed vs. expected) against number of ISs]

Page 30: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

Association with recent rearrangements ?

[Figure: pairwise genome comparisons involving B. bronchiseptica, B. pertussis, Bordetella parapertussis, E. coli K12, S. enterica typhimurium, S. enterica enterica serovar typhi and Shigella flexneri. Similarity labels: 99%, 99%, 90%, 99%, 99%. IS count labels: 32 ISs, 247 ISs]

IS expansion increases the rate of genome rearrangements

Page 31: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

Association with pseudogenes ?

[Diagram: genomes A and B with orthologs Or1/Or1' and Or2/Or2'; an IS lies either within a gene or in the intergenic region]

Number of ISs in genes

Number of ISs in intergenes

Page 32: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

Association with pseudogenes ?

IS expansion increases the number of pseudogenes

R pseudo = (Number of ISs in genes) / (Number of ISs in intergenes)

[Figure: O/E R pseudo against number of ISs]
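The ratio R pseudo and its observed/expected (O/E) value can be sketched as below. The slides do not say how the expected value was obtained; this sketch assumes insertions falling uniformly along the genome, so that the expected ratio equals coding length over intergenic length. Function names and the example numbers are hypothetical.

```python
def r_pseudo(n_is_in_genes, n_is_in_intergenes):
    """R pseudo = number of ISs in genes / number of ISs in intergenes."""
    if n_is_in_intergenes == 0:
        raise ValueError("no IS in intergenic regions: ratio undefined")
    return n_is_in_genes / n_is_in_intergenes


def expected_r_pseudo(coding_bp, intergenic_bp):
    """Expected ratio if insertions were uniform along the genome (assumption)."""
    if intergenic_bp <= 0:
        raise ValueError("intergenic length must be positive")
    return coding_bp / intergenic_bp


def observed_over_expected(n_is_in_genes, n_is_in_intergenes,
                           coding_bp, intergenic_bp):
    observed = r_pseudo(n_is_in_genes, n_is_in_intergenes)
    return observed / expected_r_pseudo(coding_bp, intergenic_bp)


# Hypothetical genome: 4.4 Mb coding, 0.6 Mb intergenic, 30 ISs in genes
# and 60 ISs in intergenic regions.
oe = observed_over_expected(30, 60, 4_400_000, 600_000)
print(round(oe, 3))
```

An O/E value below 1 would indicate fewer ISs inside genes than uniform insertion predicts, and a value rising with the number of ISs would match the slide's claim that IS expansion increases the number of pseudogenes.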

Page 33: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

[Diagram: IS dynamics along lineages: +IS acquisition, +IS expansion, -IS deletion, lineage loss]

High variability :

- of the number of ISs / Genome

- of the number of ISs families / Genome

- of the number of ISs copies / Family

ISs have been recently acquired (HGT)

IS expansion :

- is associated with lifestyle/niche change

- increases the rate of genome rearrangements

- increases the number of pseudogenes

Conclusions

Page 34: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

ISs are frequent but not ubiquitous

The number of ISs and of IS families varies a lot

No association between stability and the number of ISs

The presence of ISs is associated with lifestyle

beneficial agents

IS expansion increases the rate of genome rearrangements

IS expansion increases the number of pseudogenes

genomic parasites

Conclusions

Page 35: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

How many IS ?

High variability :

- of the number of ISs / Genome

- of the number of ISs families / Genome

[Figure: number of genomes against number of ISs and against number of ISs families]

Page 36: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

How many IS ?

High variability :

- of the number of ISs families / Genome

- of the number of ISs / Family

[Figure: log(number of ISs / genome) per ISs family; number of ISs against number of ISs families, with B. pertussis, S. sonnei and S. flexneri labelled]

Labelled counts :

112-108 : IS1; 126-124 : IS3; 34-22 : IS4

157 : IS1; 106 : IS3; 33 : IS4; 25 : IS21

16 : IS110; 229 : IS481

Page 37: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

Hypothesis I

ISs induce short spikes of instability which are averaged out in a deep phylogenetic analysis

Page 38: Causes of insertion sequences abundance  in prokaryotic genomes? A problem of size

Hypothesis II

Invasions of highly replicative IS lead to deleterious instability and lineage loss