Eukaryotic Genomes: Fungi Monday, November 14, 2011 Genomics 260.605.01 J. Pevsner [email protected]

Copyright notice

Many of the images in this PowerPoint presentation are from Bioinformatics and Functional Genomics by J. Pevsner, copyright © 2009 by Wiley-Blackwell.

These images and materials may not be used without permission.

Visit http://www.bioinfbook.org

Schedule

Monday (today): Fungi (Chapter 17)
Wednesday 11/16: Next-generation sequencing (Sarah Wheelan)
Friday 11/18: Protozoans (David Sullivan)

Monday 11/21: Eukaryotic genomes (Chapter 18)
Wednesday 11/23: no class
Friday 11/25: Thanksgiving

Outline of today’s lecture

Description and classification of fungi
The Saccharomyces cerevisiae genome
    Sequencing
    Features of the genome
    Yeast chromosomes
Duplication of the yeast genome
Functional genomics in yeast
Comparative genomics of fungi

Summary of key points

[1] S. cerevisiae has 16 chromosomes and ~6,000 genes

[2] Its genome underwent a whole genome duplication followed by massive gene loss

[3] Comparative genomics is a powerful approach
-- to identify genes
-- to identify regulatory regions
-- to infer evolutionary history of genome duplications and gene gain or loss

[4] SGD (Saccharomyces Genome Database) is the major web resource for yeast

Introduction to fungi: phylogeny

Fungi are eukaryotic organisms that can be filamentous (e.g. molds) or unicellular (e.g. the yeast Saccharomyces cerevisiae).

Most fungi are aerobic (but S. cerevisiae can grow anaerobically). Fungi have major roles in the ecosystem in degrading organic waste. They have important roles in fermentation, including the manufacture of steroids and penicillin.

Several hundred fungal species are known to cause disease in humans.

Page 698

Eukaryotes (Baldauf et al., 2000)

Fungi and metazoa are sister groups

Fig. 17.1, page 698

Classification of fungi

About 70,000 fungal species have been described (as of 1995), but 1.5 million species may exist.

Four phyla:
Ascomycota: yeasts, truffles, lichens
    Hemiascomycetae: S. cerevisiae
    Euascomycetae: Neurospora
    Loculoascomycetae
    Laboulbeniomycetae: parasites of insects
Basidiomycota: rusts, smuts, mushrooms
Chytridiomycota: Allomyces
Zygomycota: feed on decaying vegetation

Box 17-1, page 699

Alternate classification of fungi

Introduction to Saccharomyces cerevisiae

First species domesticated by humans

Called baker’s yeast (or brewer’s yeast)

Ferments glucose to ethanol and carbon dioxide

Model organism for studies of biochemistry, genetics, molecular and cell biology

…rapid growth rate
…easy to modify genetically
…features typical of eukaryotes
…relatively simple (unicellular)
…relatively small genome

Page 700

Sequencing the S. cerevisiae genome

The genome was sequenced chromosome by chromosome by a highly cooperative consortium in the early 1990s (the whole genome shotgun approach was not used).

This involved 600 researchers in > 100 laboratories.

-- Physical map created for all XVI chromosomes
-- Library of 10 kb inserts constructed in phage
-- The inserts were assembled into contigs

The sequence was released in 1996 and published in 1997 (Goffeau et al., 1996; Mewes et al., 1997).

Page 701

Features of the S. cerevisiae genome

Sequenced length: 12,068 kb (12,068,000 base pairs)
Length of repeats: 1,321 kb
Total length: 13,389 kb (~13 Mb)

Open reading frames (ORFs): 6,275 (see updates below)
    Questionable ORFs (qORFs): 390
    Hypothetical proteins: 5,885

Introns in ORFs: 220
Introns in UTRs: 15
Intact Ty elements: 52
tRNA genes: 275
snRNA genes: 40

Page 702

Features of the S. cerevisiae genome

A notable feature of the genome is its high gene density (about one gene every 2 kilobases). Most bacteria have about one gene per kb, but most eukaryotes have a much sparser gene density.

Also, only 4% of S. cerevisiae genes are interrupted by introns. By contrast, 40% of genes from the fungus Schizosaccharomyces pombe have introns.
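As a rough check on the numbers quoted above, the short Python sketch below recomputes the total genome length, the count of hypothetical proteins, and the average spacing between ORFs from the values on the "Features" slide. The variable names are illustrative only.

```python
# Values quoted on the "Features of the S. cerevisiae genome" slide.
sequenced_kb = 12_068      # sequenced length (kb)
repeats_kb = 1_321         # length of repeats (kb)
orfs = 6_275               # ORFs in the initial annotation
questionable_orfs = 390    # ORFs flagged as questionable

total_kb = sequenced_kb + repeats_kb
hypothetical_proteins = orfs - questionable_orfs
kb_per_orf = sequenced_kb / orfs  # average spacing between ORFs

print(f"Total length: {total_kb:,} kb")
print(f"Hypothetical proteins: {hypothetical_proteins:,}")
print(f"Average spacing: {kb_per_orf:.2f} kb per ORF")
```

The spacing comes out just under 2 kb per ORF, consistent with "about one gene every 2 kilobases".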

What are the most common protein families and protein domains? You can see the answer at EBI’s website: http://www.ebi.ac.uk/proteome/

Page 701

Fig. 17.3, page 703
http://www.ebi.ac.uk/proteome/

The EBI website offers a variety of proteome analysis tools, such as this summary of protein length distribution in S. cerevisiae.

ORFs in the S. cerevisiae genome

How are ORFs defined? In the initial genome analysis, an ORF was defined as >100 codons (thus specifying a protein of ~11 kilodaltons).

390 ORFs were listed as “questionable” because they were considered unlikely to be authentic genes. For example, they were short or exhibited unlikely codon usage preferences.

How many ORFs are there in the yeast genome? There are 40,000 ORFs > 20 amino acids; how many of these are authentic?
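To see how strongly the ORF count depends on the length cutoff, here is a minimal Python sketch of an ORF scanner. It is not the procedure used in the original genome annotation: it simply looks for ATG...stop stretches in all six reading frames and keeps those with at least `min_codons` codons (stop codon excluded). The function name, the coordinate convention, and the toy sequence are assumptions made for illustration. At roughly 110 Da per amino acid, 100 codons corresponds to the ~11 kDa protein mentioned above.

```python
def find_orfs(dna, min_codons=100):
    """Return (start, end, strand) for ATG...stop ORFs of at least min_codons codons.

    Coordinates are 0-based, end-exclusive, and refer to the strand that was
    scanned. The codon count includes the ATG but excludes the stop codon.
    """
    stops = {"TAA", "TAG", "TGA"}
    complement = str.maketrans("ACGT", "TGCA")
    forward = dna.upper()
    strands = {"+": forward, "-": forward.translate(complement)[::-1]}
    orfs = []
    for strand, seq in strands.items():
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in stops:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((start, i + 3, strand))
                    start = None  # look for the next ATG in this frame
    return orfs


# Toy sequence: one ORF of 6 codons (ATG + 5 x GCT) followed by a stop codon.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(find_orfs(toy, min_codons=5))
print(find_orfs(toy, min_codons=7))
```

Lowering `min_codons` toward 20 on a real chromosome sequence would be expected to return many more candidates, which is the point of the question above: most short ORFs arise by chance.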

Page 703

ORFs in the S. cerevisiae genome

Several criteria may be applied to decide if ORFs are authentic protein-coding genes:
[1] evidence of conservation in other organisms
[2] experimental evidence of gene expression (microarrays, SAGE, functional genomics)

The groups of Elizabeth Winzeler and Michael Snyder each described hundreds of previously unannotated genes that are transcribed and translated.

Page 704

ORFs in the S. cerevisiae genome

The MIPS Comprehensive Yeast Genome Database lists criteria for assigning ORFs, based on FASTA search scores:

                                       Number of proteins
Category                               2003    2006    2010
Known protein                          3400    3289    5311
Strong similarity to known protein      230     228      37
Weak similarity to known protein        825     818      84
Similarity to unknown protein          1007    1003     343
No similarity                           516     516     227
Questionable ORF                        472     463      --

Total                                  6450    6317    6006

Page 704

http://mips.helmholtz-muenchen.de/genre/proj/yeast/

Revising the S. cerevisiae gene count through comparative genomics

By sequencing three additional yeast species (Saccharomyces paradoxus, S. bayanus, S. mikatae), Kellis et al. (Nature 423:241, 2003) showed that 503 genes should be deleted from the set of yeast genes (leaving 5,726 including 43 newly discovered genes).

Comparing the DNA sequences from several species makes it possible to find regulatory regions — short sequences that turn genes on and off — and eliminate spurious gene predictions. Red boxes highlight areas of sequence similarity between at least two species. Functional sequences — genes and regulatory elements — tend to be conserved across all species. The figure shows how one true regulatory element and one correctly identified gene might emerge from a comparison of four yeast species.

Salzberg SL (2003) Nature 423:233

Kellis et al. (2003) Nature 423:241.

The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer.

(N50 contig length = 26,260 kb for human reference assembly)

http://genome.jgi-psf.org/help/scaffolds.html

A scaffold is a portion of the genome sequence reconstructed from end-sequenced whole-genome shotgun clones. Scaffolds are composed of contigs and gaps.
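The N50 definition above can be made concrete with a few lines of Python. This is a minimal sketch; the function name and the toy contig lengths are made up for illustration and do not come from any real assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running total reaches half of the
        # assembly is the N50.
        if running >= half:
            return length
    raise ValueError("lengths must not be empty")


contigs_kb = [80, 70, 50, 40, 30, 20, 10]  # toy assembly, 300 kb in total
print(n50(contigs_kb))
```

In this toy assembly the two longest contigs (80 + 70 kb) already cover half of the 300 kb total, so the N50 is 70 kb.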

Predicted ORFs are shown as arrows pointing in the direction of transcription. Orthologous ORFs are connected by dotted lines and are coloured by the type of correspondence: red for 1-to-1 matches, blue for 1-to-2 matches and white for unmatched ORFs. Sequence gaps are indicated by vertical lines at the ends of contigs, with the estimated size of each gap shown by the length of the hook. See Supplementary Information for 250 such figures tiling the complete S. cerevisiae genome. Kellis et al. (2003) Nature 423:241.

Exploring a typical S. cerevisiae chromosome

We will next familiarize ourselves with the S. cerevisiae genome by exploring a typical chromosome, XII.

This chromosome features
• 38% GC content
• very little repetitive DNA
• few introns
• six Ty elements (transposable elements)
• a high ORF density: 534 ORFs > 100 aa, and 72% of the chromosome has protein-coding genes

Page 704

Key S. cerevisiae databases

Web resources include:

NCBI (Entrez Genome Eukaryotic genome projects)

EBI
http://www.ebi.ac.uk/proteome/

SGD: Saccharomyces Genome Database (most important!)
http://www.yeastgenome.org

MIPS Comprehensive Yeast Genome Database (MIPS = Munich Information Center for Protein Sequences)
http://mips.gsf.de/genre/proj/yeast/

Page 705

NCBI: Entrez genomes for yeast resources

Fig. 17.4, page 704 (updated 11/09)

http://www.ncbi.nlm.nih.gov/genome/guide/saccharomyces/

Saccharomyces Genome Database (SGD): primary web resource for yeast genomics

Fig. 17.6, page 705
http://www.yeastgenome.org/

Vast set of resources

S. cerevisiae gene nomenclature

YKL159c

Y = yeast
K = 11th chromosome
L = left (or R = right) arm, relative to the centromere
159 = 159th ORF
c = Crick (bottom) strand (or w = Watson, top strand)

RCN1 = wildtype gene
Rcn1p = protein
rcn1 = mutant allele

Box 15-3, page 707
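The systematic naming scheme is regular enough to parse mechanically. The Python sketch below decodes a name such as YKL159c into its parts. It assumes a three-digit ORF number and chromosome letters A through P (for chromosomes I to XVI), and it does not handle names with extra suffixes; the function name and the returned field names are illustrative, not part of any SGD tool.

```python
import re

# Y + chromosome letter (A-P) + arm (L/R) + 3-digit ORF number + strand (W/C).
ORF_NAME = re.compile(r"^Y([A-P])([LR])(\d{3})([WC])$", re.IGNORECASE)


def parse_orf_name(name):
    """Decode a systematic S. cerevisiae ORF name such as 'YKL159c'."""
    match = ORF_NAME.match(name)
    if match is None:
        raise ValueError(f"not a systematic ORF name: {name!r}")
    chrom_letter, arm, number, strand = match.groups()
    return {
        # A = chromosome 1, B = 2, ..., K = 11, ..., P = 16
        "chromosome": ord(chrom_letter.upper()) - ord("A") + 1,
        "arm": "left" if arm.upper() == "L" else "right",
        "orf_number": int(number),
        "strand": "Watson" if strand.upper() == "W" else "Crick",
    }


print(parse_orf_name("YKL159c"))
```

For YKL159c this reports chromosome 11, left arm, ORF number 159, Crick strand, matching the breakdown on the slide.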

Duplication of the S. cerevisiae genome

Analysis of the S. cerevisiae genome revealed that many regions are duplicated, both intrachromosomally and interchromosomally (within and between chromosomes). These duplicated regions include both genes and nongenic regions.

Such duplications reflect a fundamental aspect of genome evolution.

What are the mechanisms by which regions of the genome duplicate?

Page 708

Duplication of the S. cerevisiae genome

Mechanisms of gene duplication

-- Tandem repeat (slippage during recombination)
-- Gene conversion
-- Lateral gene transfer
-- Segmental duplication
-- Polyploidy (e.g. genome tetraploidy)

Fig. 17.8, page 708

Duplication of the S. cerevisiae genome

Fate of duplicated genes

-- Both copies persist
-- One copy is deleted
-- One copy becomes a pseudogene
-- One copy functionally diverges

Fig. 17.8, page 708

Duplication of the S. cerevisiae genome

What is the fate of duplicated genes? (see YGOB, below)

A duplicated gene (overall in eukaryotes) has a half-life of just several million years (Lynch and Conery, 2000).

50% to 92% of duplicated genes are lost (Wagner, 2001).

Consider four possible fates of a duplicated gene:
[1] Both copies persist (gene dosage effect)
[2] One copy is deleted (a common fate)
[3] One copy accumulates mutations and becomes a pseudogene (no functional protein product)
[4] One copy (or both) diverges functionally. The organism can perform a novel function.

Page 711

Duplication of the S. cerevisiae genome

In 1970, Susumu Ohno published the book Evolution by Gene Duplication.

He hypothesized that vertebrate genomes evolved by two rounds of whole genome duplication. This provided genomes with the “raw materials” (new genes) with which to introduce various innovations.

Page 709

Duplication of the S. cerevisiae genome

Ohno (1970):

“Had evolution been entirely dependent upon natural selection, from a bacterium only numerous forms of bacteria would have emerged. The creation of metazoans, vertebrates, and finally mammals from unicellular organisms would have been quite impossible, for such big leaps in evolution required the creation of new gene loci with previously nonexistent function. Only the cistron that became redundant was able to escape from the relentless pressure of natural selection. By escaping, it accumulated formerly forbidden mutations to emerge as a new gene locus.”

Page 709

Duplication of the S. cerevisiae genome

Wolfe and Shields (1997, Nature) provided support for Ohno’s paradigm. They hypothesized that the yeast genome duplicated about 100 million years ago.

Originally there was a diploid yeast genome with about 5,000 genes (on 8 chromosomes).

It doubled to a tetraploid number of 10,000 genes (on 16 chromosomes). Then there was massive gene loss and chromosomal rearrangement to yield the present day 6,000 genes.

Page 709

Fig. 17.9, page 710

[Dot plot: distance along chromosome X (kb) versus distance along chromosome XI (kb)]

Wolfe and Shields (1997) performed blastp and found 55 blocks of duplicated regions. They proposed that the entire S. cerevisiae genome underwent a duplication.

Matches with scores >200 are shown. These are arranged in blocks of genes.

Duplication of the S. cerevisiae genome

Evidence of genome duplication in yeast
-- Systematic BLAST searches show 55 blocks of duplicated sequences.
-- There are 376 pairs of homologous genes.

You can see the results of chromosomal comparisons on Ken Wolfe’s web site and at the SGD web site.
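To illustrate how protein matches between two chromosomes can be grouped into duplicated blocks, here is a minimal Python sketch. It is not the procedure of Wolfe and Shields: only the score cutoff of 200 is taken from the slides, while the input format (gene positions along two chromosomes plus a match score), the gap tolerance, the minimum block size, and the toy hits are all assumptions.

```python
def find_blocks(hits, min_score=200, max_gap=5, min_pairs=3):
    """Group high-scoring hits into blocks of neighboring gene pairs.

    hits: iterable of (pos_a, pos_b, score), where pos_a and pos_b are gene
    indices along two chromosomes (hypothetical input format).
    """
    kept = sorted((a, b) for a, b, score in hits if score > min_score)
    blocks, current = [], []
    for a, b in kept:
        if current:
            prev_a, prev_b = current[-1]
            # Start a new block when the next pair is too far away on
            # either chromosome.
            if a - prev_a > max_gap or abs(b - prev_b) > max_gap:
                if len(current) >= min_pairs:
                    blocks.append(current)
                current = []
        current.append((a, b))
    if len(current) >= min_pairs:
        blocks.append(current)
    return blocks


# Toy data: two blocks (the second in inverted orientation), one isolated
# hit, and one hit below the score cutoff.
hits = [
    (1, 40, 350), (3, 42, 410), (4, 45, 220),
    (9, 12, 800),
    (20, 7, 260), (22, 5, 300), (23, 4, 150), (25, 2, 510),
]
blocks = find_blocks(hits)
for block in blocks:
    print(block)
```

On a dot plot like Fig. 17.9, each returned block would appear as a short diagonal run of points.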

Page 710

Duplication of the S. cerevisiae genome

Two models for the presence of duplication blocks

[1] Whole genome duplication (tetraploidy) followed by gene loss and rearrangements

[2] Successive, independent duplication events

Page 711

Duplication of the S. cerevisiae genome

Model [1] is favored for several reasons:

-- For 50 of 55 duplicated regions, the orientation of the entire block is preserved with respect to the centromere. The orientation is not random.

-- For model [2] we would expect 7 triplicated regions. We observe only 0 or 1.

-- Gene order is maintained in 14 hemiascomycetes (the Génolevures project)

Page 711

Duplication of the S. cerevisiae genome

Why are duplicated genes commonly lost? It might seem highly advantageous to have a second copy of a gene, thus permitting functional divergence.

Ohno suggested two reasons:

[1] After duplication, a deleterious mutation in one of the two genes might now persist. Without duplication, the individual would have been selected against by such a mutation.

[2] The presence of a new paralogous sequence could lead to unequal crossing over of homologous chromosomes during meiosis.

Page 711

Duplication of the S. cerevisiae genome

To consider the fate of duplicated genes, consider the example of genes involved in vesicle transport.

Vesicles carry cargo from one destination to another. Proteins on vesicles (e.g. vesicle-associated membrane protein, VAMP; Snc1p in yeast) bind to proteins on target membranes (e.g. syntaxin in mammalian and other eukaryotic systems, or Sso1p in yeast).

In S. cerevisiae, genome duplication appears to be responsible for the presence of two syntaxins (SSO1 and SSO2) and two VAMPs (SNC1 and SNC2).

Page 711

Duplication of the S. cerevisiae genome

Sso1p Sso2p

Snc1p Snc2p

Fig. 12.4, page 469

Search for information on SSO1 (or any yeast gene) at the SGD website

The SGD record for SSO1 provides information on function

Duplication of the S. cerevisiae genome

The SGD website reveals that the SSO1 gene is nonessential (i.e. the null mutant is viable), but the double knockout of SSO1 and SSO2 is lethal. Thus, these paralogs may offer functional redundancy to the organism.

Also, these proteins could participate in distinct (but complementary) intracellular trafficking steps.

Page 711

Comparative analyses of hemiascomycetes: Whole genome duplication

You can explore duplicated genome regions using Kenneth Wolfe’s Yeast Gene Order Browser (YGOB) at: http://wolfe.gen.tcd.ie/ygob/

Fig. 17.11, page 713

Yeast Gene Order Browser

Fig. 17.12, page 714

Yeast Gene Order Browser: patterns of gene loss after WGD

Duplication of the S. cerevisiae genome

The Génolevures project:

-- Sequencing of 13 hemiascomycetes
-- Gene order can be compared in 14 fungi
-- 70% of the S. cerevisiae genome maps to sister regions with only minimal overlap
-- Proposal that the 16 centromeres form 8 pairs

Phylogenetic analyses place the divergence of S. cerevisiae and Kluyveromyces lactis prior to the whole genome duplication (~100 million years ago). Perhaps the genome duplication enabled S. cerevisiae to acquire new properties such as the capacity for anaerobic growth.

Page 712

It had long been suspected that the genome of the yeast Saccharomyces cerevisiae arose through the duplication of the genome of an ancestral yeast. Three new papers confirm this suspicion. The confirmation involved sequencing the genomes of yeasts such as Kluyveromyces waltii, Ashbya gossypii, K. lactis and Candida glabrata (which served as reference species) and comparing their gene sequences, and their order and orientation, with S. cerevisiae. The genes in grey are in the same order and orientation in the reference species and S. cerevisiae. Some of these genes, those in yellow, have relatives that are found in two different chromosomal locations (copies 1 and 2) in the S. cerevisiae genome. Some, however, have relatives in the same order and orientation only on copy 1 (red), or only on copy 2 (blue). The findings indicate that genome duplication occurred in the lineage that led to S. cerevisiae; some genes were then maintained, while many others were lost or diversified.

Detecting whole-genome duplications.

André Goffeau (2004). Evolutionary genomics: Seeing double. Nature 430, 25-26

[See figure on next slide.] Model of WGD followed by massive gene loss predicts gene interleaving in sister regions. a, After divergence from K. waltii, the Saccharomyces lineage underwent a genome duplication event, creating two copies of every gene and chromosome. b, The vast majority of duplicated genes underwent mutation and gene loss. c, Sister segments retained different subsets of the original gene set, keeping two copies for only a small minority of duplicated genes, which were retained for functional purposes. d, Within S. cerevisiae, the only evidence comes from the conserved order of duplicated genes (numbered 3 and 13) across different chromosomal segments; the intervening genes are unrelated. e, Comparison with K. waltii reveals the duplicated nature of the S. cerevisiae genome, interleaving genes from sister segments on the basis of the ancestral gene order.

M. Kellis, B.W. Birren and E.S. Lander (2004) Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428, 617-624
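The interleaving prediction in the caption above can be illustrated with a toy example in Python. The gene numbers and the two sister segments below are invented for illustration (only the idea that genes 3 and 13 are retained in both copies follows the caption), and the helper names are hypothetical. The sketch checks that each sister segment preserves the ancestral order and that together they account for every ancestral gene.

```python
def is_subsequence(segment, reference):
    """True if the genes in segment appear in reference in the same order."""
    remaining = iter(reference)
    # "gene in remaining" advances the iterator until the gene is found.
    return all(gene in remaining for gene in segment)


def interleaving_summary(ancestral, sister_1, sister_2):
    """Summarize how two sister segments relate to the ancestral gene order."""
    kept_1, kept_2 = set(sister_1), set(sister_2)
    return {
        "order_conserved": is_subsequence(sister_1, ancestral)
        and is_subsequence(sister_2, ancestral),
        "retained_in_both": [g for g in ancestral if g in kept_1 and g in kept_2],
        "lost_from_both": [g for g in ancestral if g not in kept_1 and g not in kept_2],
    }


ancestral = list(range(1, 17))  # genes 1..16 in the pre-duplication order
sister_1 = [1, 3, 4, 6, 9, 10, 13, 15]
sister_2 = [2, 3, 5, 7, 8, 11, 12, 13, 14, 16]

summary = interleaving_summary(ancestral, sister_1, sister_2)
print(summary)
```

Within S. cerevisiae alone, only genes 3 and 13 link the two segments; the relationship among the other genes becomes visible only when both segments are laid against the ancestral order, here standing in for K. waltii.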

common ancestor

Saccharomyces lineage Kluyveromyces lineage

Kellis M et al. (2004) Nature 428, 617-624

Branches show number of substitutions per thousand amino acids. Evolutionary trees are rooted at the divergence of three species. a, Average protein divergence of all 457 gene pairs that arose by WGD. The faster-evolving paralogue is arbitrarily designated as copy 2 for each pair.

Kellis M et al. (2004) Nature 428, 617-624

Branches show number of substitutions per thousand amino acids. Evolutionary trees are rooted at the divergence of three species. b, Example of a gene showing accelerated protein divergence (1 of 72 cases). Ancestral and derived gene function can be inferred by comparison to K. waltii. In this case, the origin-of-replication recognition complex protein Orc1 is inferred to be ancestral and the silencing protein Sir3 is inferred to be derived.

Kellis M et al. (2004) Nature 428, 617-624

c, Example of duplicated gene pairs that have undergone recent gene conversion (1 of 60 cases). Comparison with S. bayanus shows that recent gene conversion events occurred both in S. cerevisiae and in S. bayanus lineages. Dotted lines connect orthologous genes in S. cerevisiae and in S. bayanus.

Kellis M et al. (2004) Nature 428, 617-624

After Kellis M et al. (2004) Nature 428, 617-624

d, Phylogeny and relative time of WGD. Estimated tree lengths are as reported.

Fig. 17.10, page 712

Comparative analyses of hemiascomycetes: Identification of functional elements

Kellis et al. (2003) compared S. paradoxus, S. mikatae, and S. bayanus to S. cerevisiae (divergence dates: 5 to 20 MYA). There were clear orthologous matches, except at the telomeres.

For the Gal4 transcription factor and other functional elements, comparative analyses have helped delineate regulatory regions.

Page 714

Fig. 17.13, page 715

Yeast Gal4 transcription factor binding site: note the conserved regions between the genes GAL10 and GAL1

Outline of today’s lecture

Description and classification of fungi
The Saccharomyces cerevisiae genome
    Sequencing
    Features of the genome
    Yeast chromosomes
Duplication of the yeast genome
Functional genomics in yeast
    Genetic footprinting
    Exogenous transposons
    Molecular barcodes
Comparative genomics of fungi

Functional genomics in yeast

Functional genomics refers to the assignment of function to genes based on genome-wide screens and analyses.

We can consider functional genomics in yeast in terms of high-throughput approaches at the levels of genes, transcripts, and proteins.

Functional genomics in yeast

Protein level
-- Two-hybrid screens
-- Affinity purification and mass spectrometry
-- Pathways

RNA level
-- Microarrays
-- SAGE
-- Transposon tagging

Gene level
-- Genetic footprinting
-- Transposon insertion: random mutagenesis
-- Gene deletion: targeted deletion of all ORFs!!!

Today’s final topic: comparative analysis of fungal genomes

The fungi offer unprecedented opportunities for comparative genomic analyses

-- relatively small genome sizes
-- they are eukaryotes
-- they exhibit significant differences in biology
-- opportunities to apply functional genomics approaches in a comprehensive, genome-wide manner

Page 715

Fungal and metazoan phylogeny

Baldauf et al., 2000; page 698

Fungal genome projects

There are >250 fungal genome projects (17 complete, 127 in assembly, 127 in progress).

Most of these are ascomycetes (193)
60 basidiomycetes
18 “other” including:

Antonospora locustae              2.9 Mb
Batrachochytrium dendrobatidis    20 Mb     20 chrom.
Encephalitozoon cuniculi          2.5 Mb    11 chrom.
Rhizopus oryzae                   40 Mb

updated Nov. 2010

Fungal genome projects

Ascomycetes include:

Ajellomyces capsulatus (four strains)
Aspergillus (7 species including Aspergillus nidulans)
Botryotinia fuckeliana (2 strains)
Candida (3 strains of C. albicans; total of 6 species)
Coccidioides immitis (15 Coccidioides genomes)
Kluyveromyces lactis (4 Kluyveromyces genomes)
Neurospora crassa
Pichia (5 genomes)
Saccharomyces (14 genomes including S. cerevisiae)
Schizosaccharomyces pombe (3 Schizosaccharomyces genomes)
Yarrowia lipolytica

updated 11/09

Fungal genome projects

Basidiomycetes; size in megabases (Mb), number of chromosomes in parentheses

Coprinopsis cinerea okayama: 37.5 Mb (13)
Cryptococcus neoformans: 20 Mb (14)
Cryptococcus neoformans: 18 Mb (14)
Cryptococcus neoformans var. grubii H99: 20 Mb (14)
Cryptococcus neoformans var. neoformans B-3501A: 18.52 Mb (14)
Cryptococcus neoformans var. neoformans JEC21: 19.05 Mb (14)
Lentinula edodes L-54: 8 Mb
Phakopsora meibomiae
Phakopsora pachyrhizi: 50 Mb
Phanerochaete chrysosporium RP-78: 30 Mb (10)
Ustilago maydis 521: 20 Mb (23)

updated 11/09

Fungal Genomes Central at NCBI
http://www.ncbi.nlm.nih.gov/genome/guide/fungi/

Fungal pathogen: Aspergillus nidulans

--Of 185 Aspergillus species, 20 are human pathogens

--A. nidulans has a sexual life cycle (in contrast to A. fumigatus and A. oryzae [sake, miso, soy]).

--A. nidulans has animal-like peroxisomal enzymes

Page 715

Use TaxPlot to identify evolving Aspergillus proteins

Fungal pathogen: Candida albicans

--Diploid sexually reproducing fungus
--Causes opportunistic infections in humans
--Genome: 14.8 Mb with 8 chromosome pairs. Seven of these are constant, and the 8th varies from 3 to 4 Mb.
--No known haploid state; the heterozygous diploid state was sequenced.
--Over 7600 open reading frames
--CUG is translated as serine (rather than leucine)
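The CUG reassignment matters in practice: translating a C. albicans ORF with the standard genetic code puts leucine wherever the organism inserts serine. The Python sketch below builds the standard code, derives a variant in which CTG (the DNA form of CUG) encodes serine, and translates a made-up four-codon sequence both ways. The table names, the `translate` helper, and the example sequence are illustrative assumptions.

```python
import itertools

BASES = "TCAG"
# Standard genetic code in the conventional TCAG codon order (TTT, TTC, ...).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD = {
    b1 + b2 + b3: aa
    for (b1, b2, b3), aa in zip(itertools.product(BASES, repeat=3), STANDARD_AA)
}

# Same table, except that CTG (CUG in mRNA) encodes serine instead of leucine.
CANDIDA = dict(STANDARD, CTG="S")


def translate(dna, table):
    """Translate a DNA coding sequence codon by codon using the given table."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(table[dna[i:i + 3]] for i in range(0, usable, 3))


orf = "ATGCTGAAATAA"  # made-up sequence: ATG CTG AAA TAA
print(translate(orf, STANDARD))  # standard code
print(translate(orf, CANDIDA))   # CUG read as serine
```

The two translations differ only at the CTG codon, leucine under the standard code and serine under the Candida variant.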

Page 718

An atypical fungus: Encephalitozoon cuniculi

Microsporidia are single-celled eukaryotes that lack mitochondria and peroxisomes. Consistent with their roles as parasites, the E. cuniculi genome is severely reduced in size (2000 proteins, only 2.9 Mb). They were thought to represent deep-branching protozoans, but recent phylogenetic studies place them as an outgroup to fungi.

Page 719

Fig. 17.15, page 720

Encephalitozoon cuniculi as a fungal outgroup

Orange bread mold: Neurospora crassa

Beadle and Tatum chose N. crassa as a model organism to study gene-protein relationships. The genome sequence was reported: 39 Mb, 7 chromosomes, 10,082 ORFs (Galagan et al., 2003).

N. crassa has only 10% repetitive DNA, and incredibly, only 8 pairs of duplicated genes that encode proteins >100 amino acids. This is because Neurospora uses “repeat-induced point mutation” (RIP), a mechanism by which the genome is scanned for duplicated (repeated) sequences, which are then inactivated by point mutations. This appears to serve as a genomic defense system, inactivating potentially harmful transposons.

Page 719

Schizosaccharomyces pombe

The S. pombe genome is 13.8 Mb and encodes ~4900 predicted proteins. Some bacterial genomes encode more proteins (e.g. Mesorhizobium loti with 6752, and Streptomyces coelicolor with 7825 genes).

Chromosome    Size       Genes    Coding
1             5.6 Mb     2,255    59%
2             4.4 Mb     1,790    58%
3             2.5 Mb       884    55%

Total         12.5 Mb    4,929    58%
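The totals in this table can be rechecked with a short Python sketch, which also gives the average gene spacing for comparison with S. cerevisiae. The dictionary layout and variable names are illustrative; the numbers are those in the table.

```python
# Per-chromosome values from the S. pombe table.
chromosomes = {
    1: {"size_mb": 5.6, "genes": 2255, "coding": 0.59},
    2: {"size_mb": 4.4, "genes": 1790, "coding": 0.58},
    3: {"size_mb": 2.5, "genes": 884, "coding": 0.55},
}

total_mb = sum(c["size_mb"] for c in chromosomes.values())
total_genes = sum(c["genes"] for c in chromosomes.values())

# Coding fraction weighted by chromosome size.
coding_fraction = sum(c["size_mb"] * c["coding"] for c in chromosomes.values()) / total_mb

kb_per_gene = total_mb * 1000 / total_genes

print(f"Total size: {total_mb:.1f} Mb, genes: {total_genes:,}")
print(f"Coding fraction: {coding_fraction:.0%}")
print(f"Average spacing: {kb_per_gene:.2f} kb per gene")
```

The spacing comes out at roughly 2.5 kb per gene, somewhat sparser than the ~2 kb per gene quoted for S. cerevisiae.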

See: EBI www.sanger.ac.uk/Projects/S_pombe

Page 721

Schizosaccharomyces pombe

S. pombe diverged from S. cerevisiae about 330 to 420 million years ago.

Many genes are as divergent between these two fungi as either is from humans.

To see this, try TaxPlot at NCBI.

Page 721

Perspective and pitfalls

The budding yeast S. cerevisiae is one of the most significant organisms in biology:
• Its genome is the first of a eukaryote to be sequenced
• Its biology is simple relative to metazoans
• Through yeast genetics, powerful functional genomics approaches have been applied to study all yeast genes

It is important to note that even for yeast, our knowledge of basic biological questions is highly incomplete. We still understand little about how the genotype of an organism leads to its characteristic phenotype.

Page 721
