Page 1: Microbial genome analysis and comparisons Dave Baumler Genome Center of Wisconsin, UW-Madison dbaumler@wisc.edu

Microbial genome analysis and comparisons

Dave Baumler

Genome Center of Wisconsin, UW-Madison

dbaumler@wisc.edu

Today’s session overview:

Introduction

Module #1) Microbial genomes at NCBI (http://www.ncbi.nlm.nih.gov/Class/minicourses/)

-become familiar with the tools and options using the NCBI tutorial “Microbial genomes Quickstart”, and learn how to download genome (.gbk) files

Module #2) Conduct genome alignments of phage genomes

-using Mauve to conduct whole genome alignments, familiarize yourself with Mauve

Module #3) Compare genomes from 3 outbreaks of E. coli O157:H7

-identify genomic islands using Mauve & conservation of virulence factors

Module #4) Compare genomes from 5 strains of Yersinia pestis

-identify genomic islands and conservation of virulence factors, analyze mutations with phenotypic consequences due to insertion and/or deletion events and single nucleotide polymorphisms (SNPs), and explore paleomicrobiology

Conclusion

Choose one of the two Problems:

#1 Escherichia coli O157:H7 strain Sakai

#2 Rickettsia prowazekii strain Madrid E

Download full genome sequence (.gbk) files

Lists of all complete and in progress microbial genomes

#1) Look for the largest .gbk file, which is the main genome; smaller .gbk files are plasmids

#2) Double click on the file

#3) From the File pull-down choose “Save page as” and give the file a name ending in .gbk

Downloading Microbial Genome Files
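The rule of thumb in step #1 (the largest .gbk file is the main genome, the smaller ones are plasmids) can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name `split_genome_files` and the folder layout are assumptions, not part of the NCBI tutorial.

```python
from pathlib import Path


def split_genome_files(folder):
    """Return (main_genome, plasmids) for the .gbk files in a folder.

    Assumption taken from the slide: the largest file is the chromosome
    and every smaller .gbk file is a plasmid.
    """
    # Sort by file size so the largest file ends up last.
    gbk_files = sorted(Path(folder).glob("*.gbk"), key=lambda p: p.stat().st_size)
    if not gbk_files:
        raise FileNotFoundError(f"no .gbk files found in {folder}")
    return gbk_files[-1], gbk_files[:-1]
```

Size is only a heuristic; opening the file and reading its DEFINITION line is the safer way to confirm which record is the chromosome.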

Links to other E. coli databases and/or resources

Brief information about the organism

Overview with links to assorted tools

Search on page for words using the Edit>>Find in this page pulldown

Entrez protein view

COG link

Clusters of Orthologous Groups of proteins (COGs) were delineated by comparing protein sequences encoded in complete genomes, representing major phylogenetic lineages. Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain.

GenePlot

Entrez Genome offers a new pairwise comparison tool called GenePlot to visualize similarities among bacterial genomes. Support for fungal genomic comparisons is also planned. To construct a GenePlot, genes are numbered sequentially along the genomic sequences of two organisms and the two corresponding sets of predicted proteins are compared using BLAST. For every case in which a pair of proteins, one from each genome, are mutual best matches, a point is plotted using the indices of the equivalent genes in the two genomes as the X and Y coordinates. Use the GenePlot link from an organism’s genome record to see a GenePlot against the organism with which it shares the highest number of reciprocal best hits. Comparisons between other organisms can be made using pull-down menus.
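The plotting rule described above (one point per pair of mutual best matches, with the gene indices as coordinates) can be sketched as follows. This is not NCBI's code; the function name, the dictionaries of best hits, and the toy protein names are all hypothetical, and the BLAST step that would produce the best-hit tables is assumed to have been run already.

```python
def geneplot_points(best_a_to_b, best_b_to_a, index_a, index_b):
    """Return sorted (x, y) points for mutual best matches between two genomes.

    best_a_to_b maps each protein of genome A to its best hit in genome B,
    best_b_to_a is the reverse table, and index_a / index_b map each protein
    to its sequential gene number along the genome.
    """
    points = []
    for prot_a, prot_b in best_a_to_b.items():
        # Keep the pair only if the best hit is mutual (reciprocal).
        if best_b_to_a.get(prot_b) == prot_a:
            points.append((index_a[prot_a], index_b[prot_b]))
    return sorted(points)


# Toy example with made-up protein names.
best_a_to_b = {"a1": "b2", "a2": "b1", "a3": "b3"}
best_b_to_a = {"b1": "a2", "b2": "a1", "b3": "a1"}
index_a = {"a1": 1, "a2": 2, "a3": 3}
index_b = {"b1": 1, "b2": 2, "b3": 3}
points = geneplot_points(best_a_to_b, best_b_to_a, index_a, index_b)
```

In the toy data, a3 and b3 are not mutual best matches, so they contribute no point.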

TaxMap

Comparisons of COG groups between various organisms

Insect Pathogens/Endosymbionts

-Arsenophonus

-Buchnera

-Sodalis

-Wigglesworthia

-Xenorhabdus

Human Pathogens

-Calymmatobacterium

-Cedecea

-Citrobacter

-Edwardsiella

-Enterobacter

-Escherichia

-Ewingella

-Hafnia

-Klebsiella

-Kluyvera

-Leclercia

-Leminorella

-Moellerella

-Morganella

-Plesiomonas

-Proteus

-Providencia

-Rahnella

-Salmonella

-Serratia

-Shigella

-Tatumella

-Yersinia

-Yokenella

Environmental/Animals/Industrial

-Alterococcus

-Budvicia

-Buttiauxella

-Obesumbacterium

-Pragia

-Trabulsiella

Phytopathogens/Plant-associated

-Brenneria

-Dickeya

-Erwinia

-Pantoea

-Pectobacterium

-Phlomobacter

-Saccharobacter

-Samsonia

The ERIC database houses all of the available genomes of the members of family Enterobacteriaceae

Boxes represent organisms with at least one sequenced genome

Orthologs

If at least two of these criteria are met for the pair of genes in question, they are typically assigned as orthologs.

•Percentage identity and alignment percentage are in the typical range

•Local genome context, the conserved gene is part of an operon with other genes that are already considered orthologs.

•Larger scale conservation of genomic context, the conserved gene is in the same general genomic context as other orthologs.

•Functional conservation, the conserved gene is predicted or known to perform the same function as the potential ortholog in another genome.

[Figure: reciprocal best BLAST hits. BlastP of X against Y and BlastP of Y against X, each labeled >60%]
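The "at least two of four criteria" rule on this slide can be written down as a small decision function. This is a sketch, not the ERIC pipeline: the function and parameter names are hypothetical, and the 60% cutoff is an assumption borrowed from the ">60%" labels in the figure, applied here to both percent identity and percent of the alignment covered.

```python
def is_ortholog_candidate(identity_pct, aligned_pct, same_operon,
                          same_genomic_context, same_function,
                          min_pct=60.0):
    """Return True when at least two of the four ortholog criteria are met."""
    criteria = [
        # Criterion 1: identity and alignment percentage in the typical range
        # (assumed here to mean both above min_pct).
        identity_pct > min_pct and aligned_pct > min_pct,
        # Criterion 2: local genome context (same operon as known orthologs).
        same_operon,
        # Criterion 3: larger-scale conservation of genomic context.
        same_genomic_context,
        # Criterion 4: functional conservation.
        same_function,
    ]
    return sum(criteria) >= 2
```

For example, a pair at 85% identity over 95% of its length that also sits in a conserved operon would pass, while a pair that meets only the sequence criterion would not.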

Enterobacteria cont.

Generated from 180 orthologs

ERIC-Enteropathogen Resource Integration Center

Genomes

Tools & Annotations

Genome Views and Comparisons

Part of a genome sequenceTCAGCGAAGATGAGATAGTTTTTAAAGGTGGGATTTCCCCACCTTTAAAAAGCGAGAAGTCCCGGTTTTAAAGAGGAGTAAAATCCTCTTTTTCTAGCCCACTCAGGTGGTTTTTTTGGTTTTCGCTCCTTGCCGCATCTTCTGTGCCTTTGATGGCGGCTGGTTGGGGTGAAAGGCTGCATATTCCAGAATTTCAGACAGTAGATTGTTTTTGAAATCTTCCGTTTTATCGTTGACGAACTTAACCATCCTGTTGAAATCATCTTCCTTTGATACACCTTCAGGAAATGCCTTAGGAACTGATGTTTGGCTATCCAAGGCATCTTGCAATATCTGCACGATCTCCGAATTCATTGATCGCCCATTGGCCTTTGCTCTGGCGGCAACTGCGTCACGCATACCGTCAGGCATCCTAACTGTAAATCTCTCAATGAAAGCTGGATCTTCTTTTTCAGTCATCATCTTAAACCATAAAAATTTATACAAAACACACTAGCATCATATTGACATTACCCACAATGACATCATAATGGTGTCAGGCATCAAAATGATGTCATCATGACAAGGGGAAAGTAAATGCAAGATGTTCTCTATACAGGTCGTAAGAACGACAGCTTTCAGCTTCGTCTGCCTGAGCGAATGAAAGAAGAGATCCGTCGCATGGCAGAGATGGACGGCATTTCGATTAATTCTGCAATCGTGCAGCGCCTTGCTAAAAGCTTGCGTGAGGAAAGAGTTAATGGGCAGTAAAAACAGCGAAGCCCGGAAGTGTGGGGACACTAACCGGGCTTCTAATGTCAGTTACCTAGCGGGAAACCAACAATGACCAGTATAGCAATCTTTGAAGCAGTAAACACTATCTCTCTTCCATTCCACGGACAGAAGATCATAACTGCGATGGTGGCGGGTGTGGCGTATGTGGCAATGAAGCCCATCGTGGAAAACATCGGTTTAGACTGGAAGAGCCAGTATGCCAAGCTCGTTAGTCAGCGTGAAAAGTTCGGGTGTGGTGATATCACCATACCTACCAAAGGTGGTGTTCAGCAGATGCTTTGCATCCCTTTGAAGAAACTGAATGGATGGCTCTTCAGCATTAACCCAGCAAAAGTACGTGATGCAGTTCGTGAAGGTTTAATTCGCTATCAAGAAGAGTGTTTTACAGCTTTGCACGATTACTGGAGCAAAGGTGTTGCAACGAATCCCCGGACACCGAAGAAACAGGAAGACAAAAAGTCACGCTATCACGTTCGCGTTATTGTCTATGACAACCTGTTTGGTGGATGCGTTGAATTTCAGGGGCGTGCGGATACGTTTCGGGGGATTGCATCGGGTGTAGCAACCGATATGGGATTTAAGCCAACAGGATTTATCGAGCAGCCTTACGCTGTTGAAAAAATGAGGAAGGTCTACTGATTGGCGTATTGGAAGGCGCAAAAAGAAAAGCCAGCAGATGGGCTGCTGGCATTCATTGGGTATATGAACTTTCGGAGAACATATGAAGTCAATTATCAAGCATTTTGAGTTTAAGTCAAGTGAAGGGCATGTAGTGAGCCTTGAGGCTGCAAGCTTTAAAGGCAAGCCAGTTTTTTTAGCAATTGATTTGGCTAAGGCTCTCGGGTACTCAAATCCGTCA

Genome annotation is the process of attaching biological information to sequences. It consists of two main steps:

1. identifying elements on the genome, a process called “structural annotation” or “gene finding”

2. attaching information to these elements, such as their molecular and biological functions

What exactly are gene annotations?

Structural annotation consists of the identification of genomic elements (e.g. genes).

•Open Reading Frames (ORFs), also called coding sequences (CDSs), must have a start codon and a stop codon

•Location of regulatory motifs (such as promoters and ribosome binding sites)

•This step is typically automated using gene prediction software (automation finds only ~50-90% of the genes)

Annotation step #1: Structural Annotation

Example of a gene - the start codon is green and the stop codon is red
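The "start codon plus stop codon" definition of an ORF can be illustrated with a naive scanner. This is a teaching sketch under stated assumptions, not how gene prediction software works: it only looks at ATG starts on the forward strand, ignores alternative bacterial start codons and the reverse strand, and the name `find_orfs` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return a list of (start, end, frame) for ATG...stop ORFs.

    Coordinates are 0-based; end is exclusive and includes the stop codon.
    Only the three forward reading frames are scanned.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


orfs = find_orfs("CCATGAAATAGGG")
```

Real gene finders add statistical models of codon usage on top of this, which is part of why automated prediction misses some genes.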

Annotation step #1: Structural Annotation (cont.) using GeneMark.hmm, a statistical model

Functional annotation consists of attaching biological information to genomic elements:

•biochemical function

•regulation and interactions

•expression

•cellular location

Three examples of annotations for one gene:

•Name/synonym: a short “word” used to refer to the gene (Ex. ureC)

•Product: a descriptive protein name (Ex. Urease alpha subunit)

•Function : Describes what the protein does (Ex. Catalyzes the hydrolysis of urea to form ammonia and carbon dioxide)

Annotation step #2

Module #2 Conduct genome alignments of phage genomes

-this module was developed to teach Mauve using enterobacteria phage genomes

-phage genomes can be aligned using Mauve in a matter of minutes

-applicable as a teaching tool to decipher the mosaicism of phage genomes

-comparative studies of 30 mycobacteriophage genomes revealed new insights into their diverse architecture and gene exchange

(Hatfull et al. PLoS Genetics 2006)

-using Mauve, you could align EVERY mycobacteriophage genome available

-How diverse are enterobacteriophage?

(the following series of slides shows Mauve alignments of phage isolated from E. coli, Salmonella spp., Yersinia spp., and Shigella spp.; all alignments are also provided for further inquiry)

-we will run alignments with 3 phage genomes from E. coli O157:H7

Mauve: Multiple Genome Aligner

• Able to identify and align collinear regions of multiple genomes even in the presence of rearrangements

• Find and extend seed matches

• Group into locally collinear blocks

• Align intervening regions

(Darling et al. Genome Res. 2004 Jul;14(7):1394-403.)
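To make the "find seed matches" step concrete, here is a deliberately simplified sketch: it reports exact k-mers that occur exactly once in each of two sequences. Mauve itself is more sophisticated (see Darling et al. for the actual method), so treat this only as an illustration of the idea of unique anchors; the function names and the choice of k are assumptions.

```python
from collections import defaultdict


def unique_kmer_positions(seq, k):
    """Map each k-mer that occurs exactly once in seq to its position."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos[0] for kmer, pos in positions.items() if len(pos) == 1}


def seed_matches(seq_a, seq_b, k=5):
    """Return sorted (pos_a, pos_b) pairs for k-mers unique in both sequences."""
    a = unique_kmer_positions(seq_a, k)
    b = unique_kmer_positions(seq_b, k)
    return sorted((a[kmer], b[kmer]) for kmer in a.keys() & b.keys())


matches = seed_matches("ACGTACCGGT", "TTACCGGTAA", k=5)
```

In the toy example the matches fall on one diagonal (the offset between the two positions is constant), which is the kind of run that would be grouped into a single collinear block.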

Module #2 Understanding phage, the viruses that infect microorganisms, via genome alignments

56 enterobacterial phage genomes were recently aligned; phage genomes are an ideal training tool for teaching how to set up Mauve alignments

Why Phage? Genomics timeline

1977: Phage ΦX174, 10 genes

1982: Phage, 46 genes

1995: Haemophilus influenzae, 1,709

1996: Saccharomyces cerevisiae, 6,269

1997: E. coli MG1655, 4,200

1998: Caenorhabditis elegans, 19,000

2000: Drosophila melanogaster, 13,000

2001: Humans, ~30-40,000

2001: E. coli EDL933, 5,200

2008: 643 complete microbial genomes & 970 in progress

Step #1 copy the folder called 3 phage genomes for alignment excercise, and paste it on the hard drive of your computer (C: drive)

Step #2 from the start menu, in programs select Mauve 2.1.1

Step #3 under the File pull down select Align with progressive Mauve

This new window will appear

#4 click here to choose where to send the output file, find the folder (from Step#1), and double click on the folder

#5 Type in a file name, and click on Save

Next add the sequences to align

Click on Add sequence

Select the first phage genome and click on Open, then continue with the 2nd and 3rd phage genomes. Then click on Align to start the genome alignment
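The same alignment can also be launched without the graphical interface, since Mauve ships a command-line `progressiveMauve` program that takes an `--output=` option followed by the input genome files. The sketch below only builds and runs that command from Python; it assumes the executable is on your PATH, and the helper names and file names are hypothetical placeholders for the three phage genome files.

```python
import shutil
import subprocess


def progressive_mauve_command(output_file, genome_files):
    """Build the progressiveMauve argument list for the given genome files."""
    return ["progressiveMauve", f"--output={output_file}", *genome_files]


def run_alignment(output_file, genome_files):
    """Run the alignment; assumes progressiveMauve is installed and on PATH."""
    cmd = progressive_mauve_command(output_file, genome_files)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("progressiveMauve was not found on PATH")
    subprocess.run(cmd, check=True)


# Hypothetical file names; substitute the three files from the exercise folder.
command = progressive_mauve_command(
    "phage_alignment.xmfa", ["phage1.gbk", "phage2.gbk", "phage3.gbk"]
)
```

The resulting output file can then be opened in the Mauve viewer in the same way as the alignments provided with the modules.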

When viewing the LCBs, Mauve displays regions that are highly conserved/identical in full color.

Areas that are unique/variable to one genome appear in white and represent unique islands
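One way to think about the white regions is as the parts of a genome not covered by any conserved block. The sketch below computes those gaps from a list of block coordinates; the coordinates, the function name, and the 1-based inclusive convention are assumptions for illustration and are not read from a Mauve file.

```python
def unique_islands(genome_length, conserved_blocks, min_length=1):
    """Return 1-based inclusive (start, end) intervals not covered by any block."""
    islands = []
    next_start = 1  # first position not yet covered by a conserved block
    for start, end in sorted(conserved_blocks):
        if start - next_start >= min_length:
            islands.append((next_start, start - 1))
        next_start = max(next_start, end + 1)
    # Anything left after the last block is also an island.
    if genome_length - next_start + 1 >= min_length:
        islands.append((next_start, genome_length))
    return islands


islands = unique_islands(100, [(1, 20), (31, 60), (55, 80)])
```

Raising `min_length` filters out the many short gaps that are not meaningful islands.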

Your tool bar is at the top left; the tools you will use are in the View pull-down and also available as buttons

Search for features

Zoom in/out; you can also hold down the Ctrl key and use the arrow keys on the keyboard

Move left or right; you will find this useful for centering a region of interest in the middle of the screen prior to zooming in

Returns the viewer to the home view

Other useful commands in Mauve

Function                              Key

Zoom in                               Ctrl+Up

Zoom out                              Ctrl+Down

Scroll left                           Ctrl+Left

Scroll right                          Ctrl+Right

Export the current view as an image   Ctrl+E

Module #3) Dissecting virulence of E. coli O157:H7 using genome alignments

-determination of the complete E. coli sequence required almost 6 years

-E. coli is the preferred model in biochemical genetics, molecular biology, and biotechnology and its genomic characterization will undoubtedly further research toward a more complete understanding of this important experimental, medical, and industrial organism

(Blattner et al. Science 1997)

The first E. coli genome sequenced was the non-pathogenic E. coli K-12 genome MG1655

(Perna et al. Nature 2001)

-In 1982 Escherichia coli O157:H7 was recognized as a human pathogen

-The sequenced strain is known as EDL933, from the 1982 Michigan outbreak linked to ground beef

-Shiga toxin-producing E. coli (STEC)

The first pathogenic E. coli genome sequenced was enterohaemorrhagic (EHEC) Escherichia coli O157:H7 strain EDL933

(Hayashi et al. DNA Res 2001)

-In July 1996, an outbreak of Escherichia coli O157:H7 infection occurred among schoolchildren in Sakai City, Osaka, Japan.

-8,938 schoolchildren sickened, 3 deaths

-We are starting to ask: what genomic differences determine differences in virulence, epidemiology, and fatality?

The completion of the 2nd E. coli O157:H7 (EHEC) sequence strain Sakai

2006 E. coli O157:H7 outbreak from bagged spinach (from CDC)

-multistate outbreak

-205 people sickened, 3 deaths

Currently there are 13 sequenced E. coli O157:H7 genomes; you will focus on three that are all in the Enteropathogen Resource Integration Center (ERIC) database (www.ericbrc.org)

The three strains you will focus on are:

Escherichia coli EDL933 (EHEC)

Escherichia coli Sakai (EHEC) also called RIMD

Escherichia coli EC4042 (EHEC)

In your Start menu under Programs, go to Mauve 2.1.1 and start Mauve. Notice that there is a user's guide in PDF form in this folder; it contains useful information and commands for navigating alignments

Note: your computer may need to update Java, since Mauve runs on the Java platform.

You should see a window for Mauve appear

Next, double click on the uncompressed 3 O157H7 folder; it should contain the following 19 files. Take the first one (3 O157 alignment) and drag and drop it into the Mauve window

A “reading sequences” message should appear here, and in a few seconds the alignment will appear. Note that computers with less than 512 MB of RAM may not be able to open the file

Your alignment should look like this

Organism name: notice that the first is EDL933, the second is RIMD (Sakai), and the third is EC4042 (spinach)

Using the up or down arrows, you can switch the position of the genomes

The colored blocks are called locally collinear blocks (LCBs) and represent regions of the genome that Mauve has identified as conserved. The lines connect corresponding LCBs. Notice that some are in different positions in the other genomes, and some are inverted and appear on the bottom strand of the double-stranded genome

Top strand

Bottom strand
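An LCB drawn on the bottom strand matches the other genome as a reverse complement: the sequence is read in the opposite direction with each base swapped for its partner. A minimal Python sketch of that operation (the function name is an assumption, and only the four standard bases are handled):

```python
# Translation table pairing each base with its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


inverted = reverse_complement("ATGC")
```

Applying the function twice returns the original sequence, which is a quick sanity check.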

When you move your mouse over a region of one genome, Mauve shows a black box there and also marks the corresponding regions (boxes) in the other two genomes. Try scrolling left to right on one genome

Notice that when you scroll (slowly) over a white region (island), the black boxes pause in the other genomes, then start moving again once you have passed over the island and back into conserved regions

If you would like to look at all three LCBs, even though one is in a different position, move the mouse over one LCB and click the mouse button

Let's use the zoom function. Press the home button to restore the alignment to its original view

Now click on the white island in the top genome and, using the right button, bring it to the center of the screen. Then zoom in several times

You will start to see the genes. Move the mouse over one and pause, and a window will pop up with the product annotation. Here you can view which genes are present in this EDL933 island and not in the other two genomes

Now place your mouse over one of the genes; in my example I have iha (irgA homolog adhesin)

Click your mouse once on the gene and a window will pop up; scroll down and select View CDS iha in ERICdb

This will open the page in the ERIC database for that gene, containing all of its annotations, where you can check whether it is involved in virulence

Let's use the search feature

#1) Click on the search feature

#2) Choose a genome (EDL933)

#3) Type in a gene name (stx2A)

#4) Click on search

Notice that Mauve has found the stx2A gene (highlighted in blue) in EDL933 and also in the RIMD strain. Just because the gene isn't aligned in the EC4042 strain does not mean it isn't there; if you look to the right in the EC4042 genome, you will find it

Stx2A

One last feature you can use in Mauve: to find an island that is in 2 out of 3 strains, use the backbone view

Press the home button first

Then go to the View pull-down, select Color Scheme, then Backbone Color

Your alignment should look like this in backbone color. Regions present in all three genomes appear in light purple; regions in other colors correspond to segments shared by 2 out of 3 genomes (you may have to zoom in a bit to see these regions)

Regions in only EDL933 and RIMD appear olive green

Regions in only EDL933 and EC4042 appear maroon

Regions in only RIMD and EC4042 appear tan/brown

This is how you identify islands unique to 2/3 strains
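The backbone coloring boils down to asking, for each aligned segment, which genomes contain it. The sketch below applies that logic to a small table of per-genome coordinates. It assumes a layout like Mauve's backbone output, where a segment missing from a genome is recorded with zero coordinates; that layout, the function names, and the toy numbers are assumptions to check against your own files.

```python
def genomes_sharing_segment(backbone_row, genome_names):
    """List the genomes that contain a segment; (0, 0) is taken to mean absent."""
    return [
        name
        for name, (left, right) in zip(genome_names, backbone_row)
        if (left, right) != (0, 0)
    ]


def segments_shared_by(backbone_rows, genome_names, wanted):
    """Return the rows present in exactly the genomes named in wanted."""
    wanted = set(wanted)
    return [
        row for row in backbone_rows
        if set(genomes_sharing_segment(row, genome_names)) == wanted
    ]


names = ["EDL933", "RIMD", "EC4042"]
rows = [
    ((1, 100), (1, 100), (1, 100)),    # present in all three
    ((101, 200), (101, 200), (0, 0)),  # only EDL933 and RIMD
    ((201, 300), (0, 0), (150, 249)),  # only EDL933 and EC4042
]
two_of_three = segments_shared_by(rows, names, {"EDL933", "RIMD"})
```

Each distinct set of genomes corresponds to one backbone color in the viewer.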

Using genomics to track the dissemination of

Yersinia pestis strains

Courtesy of www.cdc.gov

Deng et al. 2002 J. Bacteriol. 184:16 4601-4611

Transmission cycle of Plague

The 3 historic pandemics of plague

-pandemic: an epidemic that spreads throughout the human population across a large region, such as a continent, or worldwide

-1st pandemic: ~550 A.D., confined mainly to Africa and some parts of the Middle East

-2nd pandemic: originated in Central Asia and spread via trading routes into Europe (killed ~30% of Europe's population)

-3rd pandemic: started in the 1850s in China’s Yunnan province, confined mainly to Asia

Courtesy of edsitement.neh.gov

The first two genomes of Yersinia pestis CO92 & KIM

Parkhill et al. 2001 Nature 413, 523-527 Deng et al. 2002 J. Bacteriol. 184:16 4601-4611

Comparison of 2 genomes was not interactive initially

As of 04/2008 there are 7 complete and 14 Y. pestis draft genomes

Traditionally the strains are classified as serovars (Antiqua, Mediaevalis, Orientalis, and other) based on the following phenotypic characteristics:

-Antiqua = East Africa: (glycerol positive, arabinose positive, and nitrate positive)

-Mediaevalis = Central Asia: (glycerol positive, arabinose positive, and nitrate negative)

-Orientalis = Central Asia: (glycerol negative, arabinose positive, and nitrate positive)

-other (e.g. Microtus, Pestoides): not consistent for these phenotypes
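The phenotype table above translates directly into a lookup. The sketch below encodes only the three combinations listed on this slide and returns "other" for anything else; the function and variable names are hypothetical.

```python
# Keys are (glycerol, arabinose, nitrate) test results; True means positive.
SEROVAR_PHENOTYPES = {
    (True, True, True): "Antiqua",
    (True, True, False): "Mediaevalis",
    (False, True, True): "Orientalis",
}


def classify_serovar(glycerol, arabinose, nitrate):
    """Classify a strain from the three phenotypes listed on the slide."""
    return SEROVAR_PHENOTYPES.get((glycerol, arabinose, nitrate), "other")


example = classify_serovar(glycerol=True, arabinose=True, nitrate=False)
```

These three phenotypes are the ones tied to glpD, araC, and napA, the genes examined later in this module.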

Partial view of the grave in Dreux investigated in this work, which illustrates anthropologic features of a mass grave suitable for paleomicrobiology research. (courtesy of www.cdc.gov)

Paleomicrobiology

-the prefix paleo comes from the Greek word palaios, meaning “ancient”

-bacterial colonization of dental pulp can occur during bacteremia

-Bacteremia (also known as plague septicaemia with Y. pestis) is the presence of bacteria in the blood

Courtesy of www.nidcr.nih.gov

Figure 1   The original protocol developed in our study allows recovering the dental pulp and minimizes the risk of laboratory-acquired contamination of the specimen. The tooth was encasted into sterile resin (1a) ; the apex was sterily sectioned (1b) to give access to the canal system (1c) ; solutions were injected (1d) ; after incubation, the tooth was put upside down into sterile tube (1e) and centrifuged (1f).

Tran-Hung et al. PLoS ONE v.2(10); 2007

Extraction of bacterial DNA from Dental pulp

-Some historians believed that a flu-like virus and not Y. pestis was responsible for the 1st and 2nd pandemics

-DNA detected in dental pulp confirms that Y. pestis was the cause

-Which serovar(s) are most similar to the Y. pestis strain(s) from the dental pulp from the corpses?

Use of genomic tools to study Y. pestis

Concepts in this module that you will address:

#1) mutations that prevent production of a fully functional gene product and so have phenotypic consequences (insertions, deletions, single nucleotide polymorphisms [SNPs]), studied in the genes glpD, napA, and araC

#2) paleomicrobiology investigation: determine which serovar(s) have the genes most similar to the sequences amplified from the dental pulp of 3 corpses

#3) use of genome alignments: find an island that is present in the 4 genomes of strains that infect humans and absent in Y. pestis strain 91001

#4) determine the conservation of a virulence factor in the 5 strains in the genome alignment, and determine whether it is a fully functional product in strain 91001

Next, double click on the uncompressed Yersinia pestis alignment 5 genome folder; it should contain the following 29 files. Take the one named yersinia_pestis_alignment_5genomes and drag and drop it into the Mauve window

A “reading sequences” message should appear here, and in a few seconds the alignment will appear. Note that computers with less than 512 MB of RAM may not be able to open the file

Your alignment should look like this

Organism name: notice that the first is CO92, the second is KIM, the third is 91001, the fourth is Antiqua, and the fifth is Nepal516

Using the up or down arrows, you can switch the position of the genomes

You may find it easier to view the 5 genome alignment without the connecting lines:

on your keyboard press Shift+L (pressing this again makes them reappear)

Now place your mouse over one of the genes

Click your mouse once on a gene and a window will pop up; scroll down and select View CDS in ERICdb

This will open the page in the ERIC database for that gene, containing all of its annotations, where you can see what is known about it and/or whether it is involved in virulence (note: you may be taken to a log-in screen; click on the button that says “Enter ASAP”)

Let's use the search feature to find the genes glpD, napA, and araC

#1) Click on the search feature

#2) Choose a genome or search all of the genomes

#3) Type in a gene name (glpD)

#4) Click on search

Notice that Mauve has found the glpD gene (highlighted in blue) and also a corresponding gene in each genome. You need to determine which of the five CDSs produce the full-length functional protein

Method #1: click on each gene and go to View CDS in ERICdb; look at the length and check whether any are labeled as pseudogenes. If so, look for a note that describes why it is thought to be a pseudogene

Identifying mutations in glpD, napA, and araC cont.

Method #2: from the feature page in ERIC

Scroll down to the feature context part of the page

This is a list of all features that are neighboring your gene in the genome, notice some are upstream, downstream, or contained within

Notice that your glpD gene contains polymorphic sites (otherwise known as SNPs)

For SNP analysis, you will use a new tool called “Snippy”

In a new tab or web browser window go to http://asap.ahabs.wisc.edu/~cabot/aep/snippy.php

It should look like this:

Highlight and copy all feature IDs for polymorphic sites from glpD, paste them into here, and click submit

feature IDs

In your SNP analysis, look for SNPs that change the encoded amino acid. In some cases the change results in a premature stop codon, which may generate a truncated, non-functional protein

#1) Note that Snippy shows you whether the SNP variation results in an amino acid change, in this case A (alanine) to T (threonine)

#2) In this second SNP, the change resulted in a stop codon
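The two outcomes described above (an amino acid change versus a premature stop codon) can be reproduced by translating a codon before and after the substitution. This is a sketch of the reasoning, not of how Snippy is implemented; the function name and the example codons are assumptions, and the codon table is the standard genetic code written out in TCAG order.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def snp_effect(codon, position, new_base):
    """Classify a single-base change at a 0-based position within a codon."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"  # premature stop codon
    return f"missense {before}->{after}"


ala_to_thr = snp_effect("GCA", 0, "A")      # GCA (Ala) becomes ACA (Thr)
premature_stop = snp_effect("CAG", 0, "T")  # CAG (Gln) becomes TAG (stop)
```

A change in the third codon position is often synonymous, which is why not every polymorphic site has a phenotypic consequence.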

In the middle of each region you will see the polymorphic site (in this case capital G’s) and the corresponding base in each genome. Note that you are interested in variations in YPKIM, YPCO92, YP91001, YPNepal, and YpAntiqua.

-in this case there is no difference among these 5 genomes. Scroll down and check the remaining polymorphic sites for differences among the 5 genomes; if there are none, the mutation is probably a larger deletion or insertion event

Using the DNA sequences obtained from the dental pulp of three corpses (found in the file called Ypestis corpse and CA88-4125YPE genes.doc), conduct a BlastN search within the ERIC database with each sequence against the 91001, Nepal, KIM, Antiqua, and CO92 genomes. For each of the three corpses, which serovar is most similar to the strains that caused the 1st and 2nd pandemics?

From the ERIC home page you can select to run a Blast search here

(http://www.ericbrc.org/)

Paste the first nucleotide sequence from corpse #1

Select entire genomes

Select the genomes to query: hold down the Ctrl key and select Y. pestis genomes 91001, Antiqua, CO92, KIM, and Nepal

Finally, click on the Submit Query button, then repeat with the sequences from the other two corpses
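Once the BlastN results are back, deciding which genome is "most similar" comes down to comparing a similarity score per genome. The sketch below uses plain percent identity over equal-length, already aligned sequences as a stand-in for the BLAST score; it is not BLAST, and the function names and toy sequences are made up for illustration.

```python
def percent_identity(seq_a, seq_b):
    """Percent of matching positions between two equal-length aligned sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def most_similar(query, candidates):
    """Return (best_name, scores) for a dict mapping genome name to sequence."""
    scores = {name: percent_identity(query, seq) for name, seq in candidates.items()}
    return max(scores, key=scores.get), scores


# Toy sequences only; real comparisons use the sequences in the exercise file.
best, scores = most_similar(
    "ACGTACGTAC",
    {"genome_1": "ACGTACGTAC", "genome_2": "ACGTACGTAA", "genome_3": "TCGTACGTAA"},
)
```

With real BLAST output you would compare the reported identities and alignment lengths for each genome instead of recomputing them.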

Next repeat the BlastN process using the gene sequences from a known North American ancestor (Y. pestis CA88-4125/YPE) for glpD, napA, and araC. Of the 5 genomes (91001, Antiqua, CO92, KIM, and Nepal) representing the three serovars, which is most similar to the known North American ancestor?

Based on your analysis, did Y. pestis arrive in North America via shipping routes over the Atlantic or the Pacific?

Atlantic?

(Serovar Antiqua, of African origin)

Pacific?

(Serovar Orientalis or Mediaevalis, of Asian origin)

Courtesy of education.usgs.gov

Your alignment should look like this in backbone color. Regions present in all five genomes appear in light purple; regions in other colors correspond to segments shared by 2, 3, or 4 out of 5 genomes (you may have to zoom in a bit to see these regions)

Look for a region in the lightest blue color that is present in CO92, KIM, Antiqua, and Nepal, but absent in the 91001 strain. Analyze the contents and determine if any of the genes may contribute to human infection of Y. pestis.

Conclusion

If you are interested in using some or all of these modules in your class, please sign up and provide your email, institution, and course(s)

-In the last two weeks of August 2008 I will be leading multiple WebEx training sessions to refresh the material and field Q&A; you will need a telephone and an internet-ready computer

Thanks for your time

Collaborators:

Dr. Kai F. (Billy) Hung (UW-Madison/Assistant Prof. at Eastern Illinois University, Fall 2008)

Dr. Amy C. Wong (UW-Madison)

Dr. Lois Banta (Williams College)

Mentors:

Dr. Nicole Perna (UW-Madison)

Dr. Charles Kaspar (UW-Madison)

Dr. Jeffrey Byrd (St. Mary’s College)

Dr. Bob Kadner and the ASM Summer Institute

Thank you: everyone on the ERIC database team (especially Guy Plunkett III for setting up module #1 & Eric Cabot for making Snippy) and all of the members of the Perna Genome Evolution Laboratory

Funding: This project has been funded with Federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under contract No. HHSN266200400040C