Comparative Genomics

Virulence in E. coli

Diversity of Genomes

How Many Genomes are There?

Different Genome Perspectives

Virulence in E. coli

• 1997: Fred Blattner's lab at the University of Wisconsin sequenced E. coli strain K-12.

• 2001: The pathogenic strain O157:H7 was sequenced.

• O157:H7 causes hemorrhagic colitis, which affects 75,000 people each year.

• Its genome is 5.5 Mb, compared with 4.6 Mb for K-12.

• O157:H7 has 1.3 Mb of “O-islands” not found in K-12, and K-12 has 0.5 Mb of “K-islands” not found in O157:H7 (1,387 and 528 genes, respectively).

Island Genes

• Many of the genes unique to O157:H7 are predicted to be virulence genes, including toxins, metabolic pathways, transporters, and adhesion molecules.

• K-12, however, also has genes in these categories, yet the strain is not virulent.

• A striking feature of both O-islands and K-islands is their base composition, which differs from that of the backbone.

• Many of the island genes have orthologs in other species and viruses and may have resulted from horizontal transfer.
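Base composition differences of the kind seen between islands and backbone can be screened for by computing GC content in windows along a sequence. The following is a minimal sketch, not the method used in the genome papers; the function names, the window size, the threshold, and the toy sequence are all assumptions made for illustration.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq (case-insensitive); 0.0 if seq is empty."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_windows(seq, window, threshold):
    """Return (start, gc) for non-overlapping windows whose GC fraction
    differs from the whole-sequence GC fraction by more than threshold."""
    background = gc_fraction(seq)
    hits = []
    for start in range(0, len(seq) - window + 1, window):
        gc = gc_fraction(seq[start:start + window])
        if abs(gc - background) > threshold:
            hits.append((start, gc))
    return hits


# Toy data (made up): a balanced "backbone" interrupted by an AT-rich insert.
backbone = "ACGT" * 25       # 100 bp, 50% GC
island = "AATTATTAAT" * 4    # 40 bp, 0% GC
genome = backbone + island + backbone

hits = atypical_windows(genome, window=20, threshold=0.25)
for start, gc in hits:
    print(f"window at {start}: GC = {gc:.2f}")
```

On real genomes the window would be thousands of bases long, and a flagged window is only a candidate: whether its composition truly departs from the backbone is a statistical question.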

Chi-square Analysis

How can we tell whether base compositions, such as those associated with O- and K-islands, really are different from the norm?

Base     Seq 1    Seq 2    Total
A        1,000      600    1,600
C        1,000      800    1,800
G        1,000      700    1,700
T        1,000      900    1,900
Total    4,000    3,000    7,000

Hypothesis: the base composition of the two sequences is equal.

Each expected count is (row total × column total) / grand total, for example 1,600 × 4,000 / 7,000 = 914.3 for A in Seq 1.

Cell        Observed    Expected    (O - E)²    (O - E)²/E
Seq 1, A       1,000       914.3     7,346.9          8.04
Seq 1, C       1,000     1,028.6       816.3          0.79
Seq 1, G       1,000       971.4       816.3          0.84
Seq 1, T       1,000     1,085.7     7,346.9          6.77
Seq 2, A         600       685.7     7,346.9         10.71
Seq 2, C         800       771.4       816.3          1.06
Seq 2, G         700       728.6       816.3          1.12
Seq 2, T         900       814.3     7,346.9          9.02

χ² = 38.35

With (4 - 1) × (2 - 1) = 3 degrees of freedom, this is far above the 5% critical value of 7.81, so the hypothesis of equal base composition is rejected.
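The chi-square calculation can be reproduced in a few lines. This is a minimal sketch in plain Python; the function name `chi_square` and the row/column layout are choices made here for illustration. Expected counts are derived from the row and column totals and are not rounded before use, and recomputed this way from the counts in the base table the statistic comes to about 38.35.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows.
    Returns (statistic, degrees_of_freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if base composition were the same in every sequence.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Rows are bases A, C, G, T; columns are Seq 1 and Seq 2.
counts = [
    [1000, 600],
    [1000, 800],
    [1000, 700],
    [1000, 900],
]

stat, dof = chi_square(counts)
print(f"chi-square = {stat:.2f}, df = {dof}")  # chi-square = 38.35, df = 3
```

The statistic is then compared with the chi-square critical value for the given degrees of freedom at the chosen significance level.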

Differences Between Two Strains

• Virulence may be due to genes on the “O-islands” or to differences between shared genes.

• Although the strains share 75% of their DNA, only 25% of their genes are identical.

• The rest have at least 1 base difference.

• While this amount of difference is small, it can mean the difference between healthy individuals and those with sickle-cell anemia or cystic fibrosis.

460 Genomes, and counting…

• The more genomes we sequence, the more evident their wide diversity becomes.

• These genomes range in size from 0.5 to 10 Mb and in GC content from 25% to 75%. The two seem to correlate, since GTP and CTP take more energy to make.

• One trend is that stable niches tend to accommodate small genomes while volatile environments do not.

• One thing that remains fairly constant is coding capacity: prokaryotes all have about 1 gene/kb.
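The figure of roughly 1 gene/kb can be checked against the island sizes and gene counts quoted earlier for the two E. coli strains. A small sketch, with the helper name `genes_per_kb` chosen here purely for illustration:

```python
def genes_per_kb(gene_count, size_mb):
    """Coding density in genes per kilobase, given a DNA size in megabases."""
    return gene_count / (size_mb * 1000)


# Island figures quoted in the E. coli virulence notes.
o_islands = genes_per_kb(1387, 1.3)  # DNA found only in O157:H7
k_islands = genes_per_kb(528, 0.5)   # DNA found only in K-12

print(f"O-islands: {o_islands:.2f} genes/kb")  # O-islands: 1.07 genes/kb
print(f"K-islands: {k_islands:.2f} genes/kb")  # K-islands: 1.06 genes/kb
```

Both island sets come out close to 1 gene/kb, in line with the rule of thumb, even though their base composition differs from the backbone.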

Circular Prokaryotic Chromosomes

• Another thing we have learned is that not all prokaryotic chromosomes are circular.

• 3 distantly related groups of bacteria have linear chromosomes that seem to have evolved independently.

• With regard to chromosome number, there is some confusion over whether particular pieces of DNA are chromosomes or plasmids.

• Two criteria are used to define a chromosome:

1) Does it contain essential genes?

2) Does it contain ribosomal genes?

Genomes are Constantly Changing

• The size of a genome may change rapidly due to horizontal transfer or fusing of genomes.

• The cost of replicating additional DNA must be balanced against the benefit of having genes that may lend a selective advantage.

• If the cell evolves to fill a new niche, losing unused genes may be advantageous.

• Most bacteria in similar niches have similar-sized genomes. Gut bacteria, for instance, have genomes in the 4-5 Mb range.

How Many Genomes are There?

Experimental Procedures

• 1,500 liters of surface water was collected 7 times from 4 different sites around the sea.

• This was passed through filters which trapped particles between 0.1 and 3 µm.

• Collected cells were lysed and their DNA cut into <1 kb pieces, which were then cloned.

• Genomic DNA was extracted from the filters and subjected to shotgun sequencing.

Results:

• About 1 million separate sequences were obtained, totaling 1.6 billion base pairs of DNA.

• At least 1,412 different rRNA genes are represented in this sample, including 148 which are new to the database.

• Using 6 other genes for comparison, a range of 341-569 phylotypes (i.e., species) were sampled (including 12 complete genomes).

• As the cost of sequencing DNA continues to drop, this approach may become the “next wave” of research into biodiversity.

Sampling Problems

• One problem with this method is that it favors more abundant species. The coverage for a particular gene in an abundant species is better, and a greater number of genes per species is recovered.

• 53% of all DNA from sample #1 was from two genera: Shewanella and Burkholderia. This is a mystery, since the former prefers nutrient-rich water and the latter is usually terrestrial.

• Calculations to correct for lost species estimate that 1,800 different species may have been present.
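These notes do not say which correction produced the 1,800-species estimate. As an illustration of how such a correction can work, the sketch below uses the Chao1 richness estimator, a common choice that is swapped in here as an assumption and is not claimed to be the study's method. It infers the number of unseen species from how many species were observed exactly once (singletons) and exactly twice (doubletons). The counts in the example are invented.

```python
from collections import Counter


def chao1(abundances):
    """Chao1 lower-bound estimate of species richness.

    abundances: number of times each observed species was sampled.
    """
    observed = [n for n in abundances if n > 0]
    s_obs = len(observed)
    freq = Counter(observed)
    f1, f2 = freq[1], freq[2]  # singletons and doubletons

    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form, used when no species was seen exactly twice.
    return s_obs + f1 * (f1 - 1) / 2


# Invented counts: two dominant species plus a tail of rare ones.
toy_counts = [50, 30, 2, 2, 1, 1, 1, 1]
estimate = chao1(toy_counts)
print(f"observed: {len(toy_counts)}, estimated: {estimate:.1f}")
```

The more singletons there are relative to doubletons, the more species are assumed to have been missed, which is the situation expected when a few abundant genera dominate the reads.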

New Genes Discovered

• A total of 1.2 million genes were characterized in this study, including 70,000 novel ones.

• Bacteriorhodopsin was one popular gene family: previous sampling using PCR had uncovered 67 homologs, but this study found 782 new ones.

• 13 families of bacteriorhodopsin were characterized, from a wider range of bacteria than previously thought.

• One must keep in mind that these data were collected using 1.5 × 10³ L of water, while the ocean's estimated volume is 1.37 × 10²¹ L.

Families of Bacteriorhodopsin

Different Genome Perspectives

What you see using comparative genomics depends on what perspective you take.

Zooming out, from small to large, we get:

1) amino acids

2) genes

3) gene families

4) segments of chromosomes

5) whole chromosomes

Out with the Old, In with the New

• One group decided to look at proteomes at the amino acid level. Instead of worrying about the proteins encoded, the researchers identified amino acids that were identical in 2 distantly related species but different in 2 closely related species. This focuses on evolutionary drift.

• One pattern was seen: amino acids predicted to be among the first incorporated into the genetic code are decreasing in frequency, while those predicted to be newer are increasing. This is true across all 3 domains of life.

Figure 3.4

Gene Family Level

• A German group led by Svante Pääbo studied the evolution of olfactory receptor (OR) genes in 19 primates plus mouse.

• They plotted the number of OR pseudogenes in each species studied.

• New World monkeys clustered around 18% pseudogenes, while Old World monkeys had around 30%. Humans had >50% pseudogenes.

• The one exception is the howler monkey, a New World monkey that seems out of place. Interestingly, all New World monkeys see in 2 colors, with the exception of the howler monkey, which sees in 3 colors like Old World monkeys.

Whole Chromosome Level

• Evan Eichler at Case Western Reserve examined human chromosome 7, looking for recombination hot spots. There were 27 in total: 12 on the short arm (p) and 15 on the long arm (q).

• A team of researchers mapped the recombination events that have produced syntenic regions in human, mouse, rat, and dog.

• Canine tricuspid valve malformation (CTVM) is a genetic disease in dogs that leads to thickened heart valves. It has been mapped to canine chromosome 9, in a region that is syntenic with chromosome 17 in humans.

Dot Plots of Recombination
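A dot plot marks every position where a short word in one sequence also occurs in the other, so conserved blocks appear as diagonal runs and rearrangements as breaks or offsets between them. The sketch below is a toy version for short strings, not the software behind the figure; the function names, the word length `k`, and the example sequences are assumptions. It compares one strand only, so an inverted block would not show up unless the reverse complement were also searched.

```python
def dot_plot(seq_a, seq_b, k=4):
    """Return (i, j) pairs where the k-mer starting at seq_a[i]
    equals the k-mer starting at seq_b[j]."""
    positions = {}
    for j in range(len(seq_b) - k + 1):
        positions.setdefault(seq_b[j:j + k], []).append(j)

    dots = []
    for i in range(len(seq_a) - k + 1):
        for j in positions.get(seq_a[i:i + k], []):
            dots.append((i, j))
    return dots


def render(dots, n_rows, n_cols):
    """Draw the dots as text: rows index seq_a, columns index seq_b."""
    grid = [["." for _ in range(n_cols)] for _ in range(n_rows)]
    for i, j in dots:
        grid[i][j] = "*"
    return "\n".join("".join(row) for row in grid)


# Toy sequences (made up): seq_b is seq_a with its two halves swapped.
seq_a = "ACGTAGCTTGCA"
seq_b = seq_a[6:] + seq_a[:6]

dots = dot_plot(seq_a, seq_b, k=4)
n_words = len(seq_a) - 4 + 1
print(render(dots, n_words, n_words))
```

With the two halves swapped, the single main diagonal of an identical pair is replaced by two shorter, offset diagonals, which is how a block that has changed position shows up in such a plot.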

Comparing 4 Chromosomes

• When all 4 chromosomes (dog, human, mouse, and rat) are compared simultaneously, colored lines are used to highlight the recombinational hotspots, with shaded regions showing the 2 large human recombined areas.

• Crossing lines show inversions, while bent lines that do not cross show translocations.

• The site of recombination, as well as gene loss, is often conserved across species. Highly repetitive DNA is often involved in recombination.

• Most recent common ancestor chromosomes can be constructed using recombination data.