Upload
godwin-smith
View
224
Download
2
Embed Size (px)
Citation preview
The Human Genome(part 1 of 2)
Wednesday, November 5, 2003
Introduction to BioinformaticsME:440.714J. Pevsner
Many of the images in this powerpoint presentationare from Bioinformatics and Functional Genomicsby J Pevsner (ISBN 0-471-21004-8). Copyright © 2003 by Wiley.
These images and materials may not be usedwithout permission from the publisher.
Visit http://www.bioinfbook.org
Copyright notice
Today: Human genome
Friday Nov. 7: computer lab
Monday Nov. 10: Human disease (West Lecture Hall)
Wednesday Nov. 12: Final exam (in class);find-a-gene project due
Announcements
Final exam on November 12
Format: -- closed book-- one hour, in-class (ok to take longer)-- to practice, do the self-test quizzes at the ends of chapters 15-18.
Some of the questions will be based on the recentarticle on human chromosome 6:
Mungall AJ et al., The DNA sequence and analysisof human chromosome 6. Nature 425, 805-811,23 October 2003.
See also the accompanying News & Views:Grimwood J and Schmutz J, Six is seventh, Nature 425,775-776, 23 October 2003.
Outline of today’s lecture
1. Summary of major findings of Human Genome Project
2. Web resources for the human genome
3. We will follow the outline of the February 2001 Nature paper describing the human genome.
Page 607
Main conclusions of thehuman genome project
Page 608
Main web sites for the human genome
Genome Hub National Human Genome Research Institute (NHGRI)http://www.genome.gov/
NCBI Genome Centralwww.ncbi.nlm.nih.gov/genome/central
Ensemblhttp://www.ensembl.org/genome/central/
Page 608
1. There are about 30,000 to 40,000 human genes.This number is far smaller than earlierestimates.
Page 608
Main conclusions of human genome project
1. There are about 30,000 to 40,000 human genes.This number is far smaller than earlierestimates.
The public consortium estimated 31,000, whileCelera estimated 38,500.
But note:• Many predicted genes are unique to each group• There are many transcripts of unknown function• Current estimates (2003) are ~30,000 genes.
Page 608
Main conclusions of human genome project
Page 608
Main conclusions of human genome project
1. We have about the same number of genes asfish and plants, and not that many more genesthan worms and flies.
1. We have about the same number of genes asfish and plants, and not that many more genesthan worms and flies.
Fugu rubripes (pufferfish): 31,000 to 38,000Arabidopsis thaliana (thale cress): 26,000Caenorhabditis elegans (worm): 19,000Drosophila melanogaster (fly): 13,000
Page 608
Main conclusions of human genome project
2. The human proteome is far more complex than the set of proteins encoded by invertebrate genomes.
Page 608
Main conclusions of human genome project
2. The human proteome is far more complex than the set of proteins encoded by invertebrate genomes.
Vertebrates have a more complex mixture of protein domain architectures. Additionally, the human genome displays greater complexity in its processing of mRNA transcripts by alternative splicing.
Page 608
Main conclusions of human genome project
Page 608
Main conclusions of human genome project
3. Hundreds of human genes were acquired from bacteria by lateral gene transfer, according to the initial report.
3. Hundreds of human genes were acquired from bacteria by lateral gene transfer, according to the initial report.
Evidence: compare the proteomes of human, fly, worm, yeast, Arabidopsis, eukaryotic parasites, and all completedprokaryotic genomes. Find some genes shared exclusivelyby humans and bacteria—but according to TIGR, only about 40 of these genes (or fewer?) were acquired by LGT.(See Salzberg et al., Science 292:1903, 2001).
Reasons for artifactually high estimates include: -- gene loss-- small sample size of species
Page 608
Main conclusions of human genome project
4. 98% of the genome does not code for genes
Page 608
Main conclusions of human genome project
4. 98% of the genome does not code for genes
>50% of the genome consists of repetitive DNAderived from transposable elements (also called interspersed repeats):
LINEs (20%)SINEs (13%)LTR retrotransposons (8%)DNA transposons (3%)
Page 608
Main conclusions of human genome project
4. 98% of the genome does not code for genes
>50% of the genome consists of repetitive DNAderived from transposable elements:
LINEs (20%)SINEs (13%)LTR retrotransposons (8%)DNA transposons (3%)
There has been a decline in activity of some of these elements in the human lineage.
Page 608
Main conclusions of human genome project
5. Segmental duplication is a frequent occurrence in the human genome.
-- tandem duplications (rare)-- retrotransposition (intronless paralogs)-- segmental duplications (common)
Page 608
Main conclusions of human genome project
6. There are 300,000 Alu repeats in the human genome.
These are about 300 base pairs and contain an AluIrestriction enzyme site. They occupy 3% of the genome.We saw an example of an Alu repeat in Chapter 16.
Their distribution is non-random: they are retained inGC-rich regions and may confer some benefit.
Page 608
Main conclusions of human genome project
7. The mutation rate is about twice as high in male meiosis than female meiosis. Most mutation probably occurs in males.
Page 609
Main conclusions of human genome project
8. More than 1.4 million single nucleotide polymorphisms (SNPs; single base pair changes) were identified.
Celera initially identified 2.1 million SNPs.
Currently, dbSNP at NCBI (build 118) has about 5.8 million human SNPs (2.4 million validated). A SNP occurs every 100 to 300 base pairs.
A random pair of haploid genomes differs at a rate of 1 base pair every 1250, on average (Celera). Fewer than 1% of SNPs alter protein sequence.
Page 609
Main conclusions of human genome project
Three gateways to access the human genome
Page 608
Three gateways to access the human genome
NCBI map viewerwww.ncbi.nlm.nih.gov
Ensembl Project (EBI/Sanger Institute)www.ensembl.org
UCSC (Golden Path)www.genome.ucsc.edu
Page 609
Three gateways to access the human genome
NCBI map viewerwww.ncbi.nlm.nih.gov
Ensembl Project (EBI/Sanger Institute)www.ensembl.org
UCSC (Golden Path)www.genome.ucsc.edu
Each of these three sites providesessential resources to study the
human genome (and other genomes)
Fig. 17.1Page 610
NCBI offers a human map viewer
Fig. 17.2Page 611
Map viewer: RBP4 on chromosome 10
Click tocustomizethe trackson thismap
LocusLink
DNA (contig)
OMIM
Sequence viewer
protein
evidence viewer
Model maker
HomoloGene
Confirmed gene model
orientation
Fig. 17.3Page 613
NCBI’s evidence viewerprovides data on gene models (e.g. mapping ESTs to genomic DNA)
Fig. 17.3Page 613
NCBI evidence viewer: gene structures
Fig. 17.3Page 613
NCBI evidence viewer: gene structures
Evidence for a discrepancy(e.g. sequencing error or
polymorphism)
The Ensembl project currently includes genome browsers for nine organisms:
Human mouse zebrafishFugu mosquito fruitflyC. elegans C. briggsae rat
Visit http://www.ensembl.org
Ensembl
Page 610
Fig. 17.4Page 614
Ensembl human genome browser
Fig. 17.5Page 615
Ensembl: GeneView for RBP4
Fig. 17.6Page 616
Ensembl: GeneView for RBP4
Fig. 17.7Page 617
Ensembl human genome browser: ContigView
Fig. 17.7Page 617
Ensembl human genome browser: ContigView
Fig. 17.8Page 618
Ensembl human genome browser: TransView
Fig. 17.9Page 619
Ensembl: ProteinView for RBP4
Fig. 17.10Page 620
Ensembl: MapView for chromosome 10
Fig. 17.11Page 621
Ensembl: SyntenyView for chromosome 10
The University of California at Santa Cruz (UCSC) offersa genome browser with the “golden path” annotation of thehuman genome.
The browser features searches by keyword, gene name,or other text searches. UCSC offers the lightning fastBLAT BLAST-like tool (see Chapter 5).
A key feature of this browser is its customizable annotationtracks. About half of these tracks are offered by users of the site throughout the world.
Visit http://genome.ucsc.edu
The UCSC human genome browser
Page 614
Fig. 17.12Page 622
Fig. 17.13Page 623
This lecture continueswith part 2 of 2…