46
The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner [email protected]

The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner [email protected]

Embed Size (px)

Citation preview

Page 1: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

The Human Genome(part 1 of 2)

Wednesday, November 5, 2003

Introduction to BioinformaticsME:440.714J. Pevsner

[email protected]

Page 2: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Many of the images in this powerpoint presentationare from Bioinformatics and Functional Genomicsby J Pevsner (ISBN 0-471-21004-8). Copyright © 2003 by Wiley.

These images and materials may not be usedwithout permission from the publisher.

Visit http://www.bioinfbook.org

Copyright notice

Page 3: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Today: Human genome

Friday Nov. 7: computer lab

Monday Nov. 10: Human disease (West Lecture Hall)

Wednesday Nov. 12: Final exam (in class);find-a-gene project due

Announcements

Page 4: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Final exam on November 12

Format: -- closed book-- one hour, in-class (ok to take longer)-- to practice, do the self-test quizzes at the ends of chapters 15-18.

Some of the questions will be based on the recentarticle on human chromosome 6:

Mungall AJ et al., The DNA sequence and analysisof human chromosome 6. Nature 425, 805-811,23 October 2003.

See also the accompanying News & Views:Grimwood J and Schmutz J, Six is seventh, Nature 425,775-776, 23 October 2003.

Page 5: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Outline of today’s lecture

1. Summary of major findings of Human Genome Project

2. Web resources for the human genome

3. We will follow the outline of the February 2001 Nature paper describing the human genome.

Page 607

Page 6: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Main conclusions of thehuman genome project

Page 608

Page 7: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Main web sites for the human genome

Genome Hub National Human Genome Research Institute (NHGRI)http://www.genome.gov/

NCBI Genome Centralwww.ncbi.nlm.nih.gov/genome/central

Ensemblhttp://www.ensembl.org/genome/central/

Page 608

Page 8: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

1. There are about 30,000 to 40,000 human genes.This number is far smaller than earlierestimates.

Page 608

Main conclusions of human genome project

Page 9: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

1. There are about 30,000 to 40,000 human genes.This number is far smaller than earlierestimates.

The public consortium estimated 31,000, whileCelera estimated 38,500.

But note:• Many predicted genes are unique to each group• There are many transcripts of unknown function• Current estimates (2003) are ~30,000 genes.

Page 608

Main conclusions of human genome project

Page 10: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Page 608

Main conclusions of human genome project

1. We have about the same number of genes asfish and plants, and not that many more genesthan worms and flies.

Page 11: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

1. We have about the same number of genes asfish and plants, and not that many more genesthan worms and flies.

Fugu rubripes (pufferfish): 31,000 to 38,000Arabidopsis thaliana (thale cress): 26,000Caenorhabditis elegans (worm): 19,000Drosophila melanogaster (fly): 13,000

Page 608

Main conclusions of human genome project

Page 12: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

2. The human proteome is far more complex than the set of proteins encoded by invertebrate genomes.

Page 608

Main conclusions of human genome project

Page 13: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

2. The human proteome is far more complex than the set of proteins encoded by invertebrate genomes.

Vertebrates have a more complex mixture of protein domain architectures. Additionally, the human genome displays greater complexity in its processing of mRNA transcripts by alternative splicing.

Page 608

Main conclusions of human genome project

Page 14: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Page 608

Main conclusions of human genome project

3. Hundreds of human genes were acquired from bacteria by lateral gene transfer, according to the initial report.

Page 15: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

3. Hundreds of human genes were acquired from bacteria by lateral gene transfer, according to the initial report.

Evidence: compare the proteomes of human, fly, worm, yeast, Arabidopsis, eukaryotic parasites, and all completedprokaryotic genomes. Find some genes shared exclusivelyby humans and bacteria—but according to TIGR, only about 40 of these genes (or fewer?) were acquired by LGT.(See Salzberg et al., Science 292:1903, 2001).

Reasons for artifactually high estimates include: -- gene loss-- small sample size of species

Page 608

Main conclusions of human genome project

Page 16: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

4. 98% of the genome does not code for genes

Page 608

Main conclusions of human genome project

Page 17: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

4. 98% of the genome does not code for genes

>50% of the genome consists of repetitive DNAderived from transposable elements (also called interspersed repeats):

LINEs (20%)SINEs (13%)LTR retrotransposons (8%)DNA transposons (3%)

Page 608

Main conclusions of human genome project

Page 18: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

4. 98% of the genome does not code for genes

>50% of the genome consists of repetitive DNAderived from transposable elements:

LINEs (20%)SINEs (13%)LTR retrotransposons (8%)DNA transposons (3%)

There has been a decline in activity of some of these elements in the human lineage.

Page 608

Main conclusions of human genome project

Page 19: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

5. Segmental duplication is a frequent occurrence in the human genome.

-- tandem duplications (rare)-- retrotransposition (intronless paralogs)-- segmental duplications (common)

Page 608

Main conclusions of human genome project

Page 20: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

6. There are 300,000 Alu repeats in the human genome.

These are about 300 base pairs and contain an AluIrestriction enzyme site. They occupy 3% of the genome.We saw an example of an Alu repeat in Chapter 16.

Their distribution is non-random: they are retained inGC-rich regions and may confer some benefit.

Page 608

Main conclusions of human genome project

Page 21: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

7. The mutation rate is about twice as high in male meiosis than female meiosis. Most mutation probably occurs in males.

Page 609

Main conclusions of human genome project

Page 22: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

8. More than 1.4 million single nucleotide polymorphisms (SNPs; single base pair changes) were identified.

Celera initially identified 2.1 million SNPs.

Currently, dbSNP at NCBI (build 118) has about 5.8 million human SNPs (2.4 million validated). A SNP occurs every 100 to 300 base pairs.

A random pair of haploid genomes differs at a rate of 1 base pair every 1250, on average (Celera). Fewer than 1% of SNPs alter protein sequence.

Page 609

Main conclusions of human genome project

Page 23: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Three gateways to access the human genome

Page 608

Page 24: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Three gateways to access the human genome

NCBI map viewerwww.ncbi.nlm.nih.gov

Ensembl Project (EBI/Sanger Institute)www.ensembl.org

UCSC (Golden Path)www.genome.ucsc.edu

Page 609

Page 25: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Three gateways to access the human genome

NCBI map viewerwww.ncbi.nlm.nih.gov

Ensembl Project (EBI/Sanger Institute)www.ensembl.org

UCSC (Golden Path)www.genome.ucsc.edu

Each of these three sites providesessential resources to study the

human genome (and other genomes)

Page 26: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Fig. 17.1Page 610

NCBI offers a human map viewer

Page 27: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Fig. 17.2Page 611

Map viewer: RBP4 on chromosome 10

Click tocustomizethe trackson thismap

Page 28: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu
Page 29: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

LocusLink

DNA (contig)

OMIM

Sequence viewer

protein

evidence viewer

Model maker

HomoloGene

Confirmed gene model

orientation

Page 30: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Fig. 17.3Page 613

NCBI’s evidence viewerprovides data on gene models (e.g. mapping ESTs to genomic DNA)

Page 31: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Fig. 17.3Page 613

NCBI evidence viewer: gene structures

Page 32: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Fig. 17.3Page 613

NCBI evidence viewer: gene structures

Evidence for a discrepancy(e.g. sequencing error or

polymorphism)

Page 33: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

The Ensembl project currently includes genome browsers for nine organisms:

Human mouse zebrafishFugu mosquito fruitflyC. elegans C. briggsae rat

Visit http://www.ensembl.org

Ensembl

Page 610

Page 34: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Fig. 17.4Page 614

Ensembl human genome browser

Page 35: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Fig. 17.5Page 615

Ensembl: GeneView for RBP4

Page 36: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Fig. 17.6Page 616

Ensembl: GeneView for RBP4

Page 37: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Fig. 17.7Page 617

Ensembl human genome browser: ContigView

Page 38: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Fig. 17.7Page 617

Ensembl human genome browser: ContigView

Page 39: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Fig. 17.8Page 618

Ensembl human genome browser: TransView

Page 40: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Fig. 17.9Page 619

Ensembl: ProteinView for RBP4

Page 41: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Fig. 17.10Page 620

Ensembl: MapView for chromosome 10

Page 42: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Fig. 17.11Page 621

Ensembl: SyntenyView for chromosome 10

Page 43: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

The University of California at Santa Cruz (UCSC) offersa genome browser with the “golden path” annotation of thehuman genome.

The browser features searches by keyword, gene name,or other text searches. UCSC offers the lightning fastBLAT BLAST-like tool (see Chapter 5).

A key feature of this browser is its customizable annotationtracks. About half of these tracks are offered by users of the site throughout the world.

Visit http://genome.ucsc.edu

The UCSC human genome browser

Page 614

Page 44: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Fig. 17.12Page 622

Page 45: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Fig. 17.13Page 623

Page 46: The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

This lecture continueswith part 2 of 2…