The Sequences of Complete Genomes - KOCWcontents.kocw.net/KOCW/document/2014/gacheon/parktaesik/2.pdf · The Sequences of Complete Genomes The Drosophila genome, sequenced in 2000,

The Sequences of Complete Genomes

Genes that are common to C. elegans and yeast may function in basic cellular processes such as metabolism, DNA replication, transcription, translation, and protein sorting.

It is likely that these genes will be shared by all eukaryotic cells.

But most genes in C. elegans are not found in yeast, and may function in the regulatory activities required for development of multicellular organisms.

Many C. elegans genes involved in development and differentiation are related to those in mammalian cells, validating use of C. elegans as a model for more complex animals.

Drosophila has giant polytene chromosomes in some tissues.

They arise in nondividing cells after repeated replication of DNA strands that fail to separate.

Each one contains hundreds of identical DNA molecules aligned in parallel.
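Because the replicated strands stay together, the number of parallel copies doubles with each round, so only a handful of rounds are needed to reach hundreds of copies. A minimal sketch of that arithmetic, assuming idealized, complete doubling in every round (the function names are hypothetical):

```python
def strands_after_rounds(rounds: int) -> int:
    """Parallel DNA copies after `rounds` of replication without separation,
    assuming every existing copy is duplicated in every round."""
    return 2 ** rounds


def rounds_needed(target_copies: int) -> int:
    """Smallest number of rounds giving at least `target_copies` copies."""
    rounds = 0
    while strands_after_rounds(rounds) < target_copies:
        rounds += 1
    return rounds


for n in (5, 8, 10):
    print(n, "rounds ->", strands_after_rounds(n), "parallel copies")
print("rounds for >= 500 copies:", rounds_needed(500))
```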

Polytene chromosomes have distinct banding patterns visible in the light microscope, which provides a physical map of the genome.

Cloned DNAs can be mapped by in situ hybridization.

Figure 5.27 In situ hybridization to a Drosophila polytene chromosome

The Drosophila genome, sequenced in 2000, is about 180 × 10⁶ base pairs (about 14,000 genes). About one-third is heterochromatin.

The euchromatin was sequenced using a combination of bacterial artificial chromosome (BAC) clones and a shotgun approach, in which small fragments of DNA were randomly cloned and sequenced in plasmid vectors.

Protein-coding sequences account for about 13% of the Drosophila genome.

Drosophila has far fewer genes than C. elegans, even though Drosophila is a more complex organism.

Increased complexity may arise from larger proteins that contain more functional domains.

The genome of Arabidopsis thaliana was sequenced in 2000 (125 × 10⁶ base pairs; 26,000 genes).

Sequencing relied mostly on BAC vectors, which can accommodate large DNA inserts.

The large number of genes is partly due to duplications; there are about 16,000 distinct protein-coding genes.

Comparative analysis has revealed similarities and differences between genes of plants and animals.

Arabidopsis genes for fundamental cellular processes are similar to those in yeast, C. elegans, and Drosophila, reflecting the common evolutionary origins of all eukaryotic cells.

Genes encoding proteins involved in processes such as cell signaling and membrane transport, however, differ considerably between plants and animals.

About 1/3 of all Arabidopsis genes appear unique to plants, including genes involved in photosynthesis and plant defense.

The complete rice genome sequence was reported in 2005 (390 × 10⁶ base pairs; about 41,000 genes).

The black cottonwood tree has over 45,000 genes.

Plants appear to have far more genes than animals; many are duplications.

The human genome has about 3 × 10⁹ base pairs.

It is distributed among 24 distinct chromosomes (22 autosomes plus the X and Y), each containing between 45 and 280 Mb of DNA.

Figure 5.29 The human chromosomes (Part 2)

Several thousand human genes had already been identified and mapped by methods such as in situ hybridization of probes labeled with fluorescent dyes (fluorescence in situ hybridization, or FISH).

Genetic linkage analysis and physical mapping of cloned genomic and cDNA sequences established maps of the human genome, which provided a background for genomic sequencing.

Draft sequences of the human genome published in 2001 were produced by two independent teams of researchers, using different approaches.

The International Human Genome Sequencing Consortium used BAC clones.

Figure 5.31 Sequence of human chromosome 1

A team led by Craig Venter of Celera Genomics used a shotgun approach: small fragments of genomic DNA were cloned and sequenced, and overlaps between the sequences were then used to assemble the sequence of the genome.
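The overlap-and-assemble idea can be illustrated with a toy greedy assembler that repeatedly joins the two fragments with the longest overlap. This is a minimal sketch, not Celera's actual assembler: it assumes error-free reads from one strand, a short hypothetical sequence with no repeats, and an arbitrary minimum overlap of 3 bases. Real assemblers must also cope with sequencing errors, repeats, and reads from both DNA strands.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`
    (at least min_len), or 0 if there is none."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair of fragments until none overlap.

    Returns the remaining contigs; more than one means gaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that lie entirely inside another read.
    contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: report separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical "genome" and overlapping reads sampled from it.
genome = "ATGGCGTACGTTAGCCTAGGCTTAA"
reads = ["ATGGCGTACG", "GTACGTTAGCC", "TAGCCTAGGC", "CTAGGCTTAA"]
print(greedy_assemble(reads))
```

With these four reads the overlaps chain together and the original sequence is recovered as a single contig.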

A high-quality human genome sequence was published in 2004.

A major surprise from the human genome sequence was the unexpectedly low number of genes: only 20,000 to 25,000.

But alternative splicing in human genes allows a single gene to specify more than one protein.
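How one gene can specify several proteins can be shown by enumerating exon-skipping choices: with n independently optional ("cassette") exons, up to 2ⁿ different mRNAs are possible. A minimal sketch assuming only this simplest pattern (real genes also use alternative splice sites, mutually exclusive exons, and other mechanisms, and not every combination is actually made); the gene and exon names are hypothetical:

```python
from itertools import product

# Hypothetical gene: exons in order, flagged True if always included
# (constitutive) or False if it may be skipped (cassette exon).
EXONS = [("E1", True), ("E2", False), ("E3", True), ("E4", False), ("E5", True)]


def splice_isoforms(exons):
    """Enumerate mRNAs made by including or skipping each cassette exon.

    Exon order is always preserved; only inclusion varies."""
    optional = [name for name, constitutive in exons if not constitutive]
    isoforms = []
    for choices in product([True, False], repeat=len(optional)):
        keep = dict(zip(optional, choices))
        isoforms.append("-".join(name for name, constitutive in exons
                                 if constitutive or keep[name]))
    return isoforms


for iso in splice_isoforms(EXONS):
    print(iso)
```

Two cassette exons give four possible mRNAs here, each of which could encode a different protein.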

Human genes are spread over much larger distances and contain more intron sequences than genes in Drosophila or C. elegans.

About 90% of an average human gene consists of introns.

Only about 1.2% of the human genome corresponds to protein-coding sequences.
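Multiplying genome size by coding fraction, using only the approximate figures quoted in this section, shows that the human genome carries only about 1.5 times as much protein-coding DNA as Drosophila, although it is roughly 17 times larger. A minimal sketch of that arithmetic (the dictionary and function names are illustrative):

```python
# Genome sizes and protein-coding fractions as quoted in this section.
GENOMES = {
    "Drosophila": (180e6, 0.13),  # ~180 Mb, ~13% protein-coding
    "human": (3e9, 0.012),        # ~3,000 Mb, ~1.2% protein-coding
}


def coding_mb(size_bp: float, coding_fraction: float) -> float:
    """Approximate amount of protein-coding DNA, in megabases."""
    return size_bp * coding_fraction / 1e6


for name, (size_bp, fraction) in GENOMES.items():
    print(f"{name}: ~{coding_mb(size_bp, fraction):.1f} Mb coding "
          f"out of ~{size_bp / 1e6:,.0f} Mb")
```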

Genes for basic cellular processes are conserved.

Most proteins that are unique to humans are made up of domains that are also found in other organisms, but are arranged in novel combinations.

The genomes of many other vertebrates have now been sequenced.

The pufferfish genome is unusually compact. It has far less repetitive sequence and smaller introns than the human genome.

It provides a model of a vertebrate genome in which genes and regulatory sequences are concentrated.

Figure 5.32 Evolution of sequenced vertebrates

The mouse is the key model system for experimental studies of mammalian genetics and development.

Availability of the mouse genome sequence provides an essential database for research in these areas.

Analyses of dog genomes have identified genes responsible for white coat color and for the small body size of some breeds.

Some types of cancer are common in certain breeds of dogs as well as humans. Understanding the genetic basis of these diseases can impact both human health and veterinary medicine.

The chimpanzee genome is expected to help pinpoint unique features that distinguish humans from other primates.

The nucleotide sequences of the chimpanzee and human genomes are nearly 99% identical.

However, these sequence differences frequently alter the coding sequences of genes, so most of the proteins encoded by chimpanzees and humans differ in amino acid sequence.
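A rough calculation suggests why about 1% divergence is enough to touch many genes. The sketch assumes, unrealistically, that differences are scattered uniformly and independently at a fixed per-base rate (a Poisson approximation) in a hypothetical 1,200-bp coding sequence. Because coding sequences are under selection and are more conserved than the genome average, and many changes are synonymous, this model exaggerates protein differences; it only shows the scale involved.

```python
import math


def prob_identical(coding_length_bp: int, divergence: float) -> float:
    """Chance that a coding sequence has zero nucleotide differences,
    assuming independent, uniformly placed differences at the given
    per-base rate (Poisson approximation). Ignores selection."""
    expected_diffs = coding_length_bp * divergence
    return math.exp(-expected_diffs)


# 1% divergence corresponds to "nearly 99% identical"; 0.5% for comparison.
for d in (0.01, 0.005):
    print(f"divergence {d:.1%}: expected differences = {1200 * d:.0f}, "
          f"P(no difference) = {prob_identical(1200, d):.2e}")
```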

Bioinformatics and Systems Biology

Genome sequencing has introduced new large-scale experimental approaches that generate vast amounts of data.

The new field of bioinformatics, at the interface between biology and computer science, is focused on computational methods to analyze and extract biological information from all this data.

Large-scale experimental approaches form the basis of systems biology, which seeks a quantitative understanding of the integrated dynamic behavior of complex biological systems and processes.

Proteomics, the global analysis of cell proteins, is one example.

One approach to studying gene function is to inactivate (knock out) each gene.

This has produced a collection of yeast strains with mutations in all known genes, and efforts to do the same with mice are underway.

Other large-scale screening projects are based on RNA interference (RNAi).

With complete genome sequences available, libraries of double-stranded RNAs can be designed and used in genome-wide screens to identify the genes involved in a particular biological process.
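Once the effect of each dsRNA has been measured, finding candidate genes amounts to flagging outliers. A minimal sketch assuming one viability readout per gene and a robust z-score (median and median absolute deviation, MAD) with a cutoff of -3; this is one common hit-calling choice, not the only one, and all gene names and values are hypothetical:

```python
from statistics import median

# Hypothetical screen readout: relative cell viability after treatment with
# the dsRNA targeting each gene (1.0 = untreated). Names and values are made up.
viability = {
    "geneA": 1.02, "geneB": 0.97, "geneC": 0.21, "geneD": 1.05,
    "geneE": 0.99, "geneF": 0.94, "geneG": 0.35, "geneH": 1.01,
}


def robust_z_scores(readout: dict[str, float]) -> dict[str, float]:
    """Score each gene against the screen-wide median, scaled by the MAD,
    so that strong hits do not inflate the estimated spread."""
    values = list(readout.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    scale = 1.4826 * mad  # makes MAD comparable to an SD for normal data
    return {gene: (v - med) / scale for gene, v in readout.items()}


def call_hits(readout: dict[str, float], cutoff: float = -3.0) -> list[str]:
    """Genes whose knockdown reduces viability beyond the cutoff."""
    scores = robust_z_scores(readout)
    return sorted(gene for gene, z in scores.items() if z <= cutoff)


print(call_hits(viability))  # ['geneC', 'geneG']
```

A plain mean and standard deviation would let the two strong hits widen the spread enough to hide one of them in a screen this small, which is why the sketch uses median-based statistics.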

Figure 5.33 Genome-wide RNAi screen for cell growth and viability

Understanding the mechanisms that control gene expression is a central undertaking in cell and molecular biology.

It is far more difficult to identify gene regulatory sequences than protein-coding sequences.

Most regulatory elements are short sequences, typically only about ten base pairs.

Consequently, sequences resembling regulatory elements occur frequently by chance in genomic DNA.
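The arithmetic behind this point: in random sequence with equal base frequencies, a specific 10-bp site is expected once every 4¹⁰ (about 1.05 million) positions, so a 3 × 10⁹ bp genome read on both strands contains several thousand chance matches. A minimal sketch under that idealized assumption; real genomes have unequal base composition, and real regulatory elements tolerate some mismatches, which raises the count further. The 6-bp motif used in the empirical check is arbitrary.

```python
import random


def expected_matches(motif_length: int, genome_bp: float, both_strands: bool = True) -> float:
    """Expected chance occurrences of one exact motif in random sequence,
    assuming equal base frequencies (1/4 each) and independent positions."""
    per_position = 0.25 ** motif_length
    strands = 2 if both_strands else 1
    return genome_bp * strands * per_position


def count_occurrences(seq: str, motif: str) -> int:
    """Count exact (possibly overlapping) matches of `motif` on one strand."""
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


# A 10-bp site in a 3 x 10^9 bp genome, both strands: 6e9 / 4**10 ≈ 5,722
print(f"{expected_matches(10, 3e9):,.0f}")

# Empirical check on 1 Mb of random sequence with an arbitrary 6-bp motif:
# expect roughly 1e6 / 4**6 ≈ 244 matches on one strand.
random.seed(0)
seq = "".join(random.choice("ACGT") for _ in range(1_000_000))
print(count_occurrences(seq, "GATTAC"))
```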

The expression of all genes in a cell can be assayed simultaneously using DNA microarrays.

This approach has revealed global changes in gene regulation associated with discrete cell behaviors, such as cell differentiation.

Analyzing changes in expression of multiple genes can help to pinpoint shared regulatory elements.
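Microarray results are often summarized as log2 expression ratios, and genes whose ratios change in parallel can be grouped as candidates for shared regulation. A minimal sketch with made-up two-channel intensities for four hypothetical genes at three time points, using Pearson correlation with an arbitrary 0.95 cutoff as the grouping rule; real analyses normalize the arrays and apply clustering methods to thousands of genes.

```python
import math

# Hypothetical (treated, control) intensities for each gene at three
# time points during differentiation. All values are illustrative.
intensities = {
    "gene1": [(200, 100), (400, 100), (800, 100)],
    "gene2": [(150, 100), (300, 100), (650, 100)],
    "gene3": [(100, 200), (100, 400), (100, 800)],
    "gene4": [(100, 100), (110, 100), (95, 100)],
}


def log2_profile(pairs):
    """log2(treated / control) at each time point; 0 means no change."""
    return [math.log2(t / c) for t, c in pairs]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


profiles = {g: log2_profile(p) for g, p in intensities.items()}
# Gene pairs whose expression changes move together (r > 0.95) are
# candidates for sharing regulatory elements.
genes = sorted(profiles)
coexpressed = [(a, b) for i, a in enumerate(genes) for b in genes[i + 1:]
               if pearson(profiles[a], profiles[b]) > 0.95]
print(coexpressed)
```

Strongly anticorrelated genes, such as gene3 here, might also share regulators; this simple cutoff only picks out genes that change in the same direction.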

Computational approaches include comparative analysis of genome sequences of related organisms.

This assumes that functionally important sequences are conserved in evolution, and nonfunctional segments diverge more rapidly.

Computational analysis to identify noncoding sequences that are conserved between the mouse, rat, dog, and human genomes has helped delineate sequences that control gene transcription.
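The comparative approach can be sketched as a sliding-window scan of an alignment: windows where identity stays high across species stand out against a more divergent background. A minimal sketch assuming the two sequences are already aligned (alignment itself is normally done with dedicated tools); the sequences, window size, step, and 90% cutoff are all hypothetical and do not come from real genomes.

```python
def percent_identity_windows(seq_a: str, seq_b: str, window: int, step: int = 1):
    """Yield (start, percent identity) for each window of two pre-aligned,
    equal-length sequences. Gap characters ('-') count as mismatches."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    for start in range(0, len(seq_a) - window + 1, step):
        a, b = seq_a[start:start + window], seq_b[start:start + window]
        matches = sum(x == y and x != "-" for x, y in zip(a, b))
        yield start, 100.0 * matches / window


# Invented aligned noncoding segment from two hypothetical species.
species_1 = "ACGTTGCATGACGTCATTTAGGCA"
species_2 = "TCGATGCATGACGTCAGTCAGGTA"
for start, pid in percent_identity_windows(species_1, species_2, window=8, step=4):
    flag = "conserved" if pid >= 90 else ""
    print(f"{start:>3}  {pid:5.1f}%  {flag}")
```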

The genome sequences of individuals can also be compared.

One application of the human genome sequence will be to help uncover new genes involved in many diseases.

Understanding the unique genetic makeup of individuals may lead to development of tailor-made strategies for disease prevention and treatment.

Personal genome sequencing may become part of future medical practice as continuing improvements in technology make it available at reasonable cost.

FOR WHAT? – personalized medicine

The genomes of two unrelated people differ at about 1 in every 1,000 bases, mostly in the form of single base changes, known as single nucleotide polymorphisms (SNPs).

Over a million commonly occurring SNPs have been mapped in the human genome.
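At the quoted rate of about one difference per 1,000 bases, two copies of a 3 × 10⁹ bp genome differ at roughly 3 million positions. A minimal sketch of that estimate together with a helper that lists single-base differences between two aligned segments; it compares one haploid sequence with another, ignores insertions and deletions, and the sequences and function name are hypothetical.

```python
def snp_positions(seq_a: str, seq_b: str) -> list[int]:
    """0-based positions where two aligned, equal-length sequences differ
    by a single base (gaps are not handled in this sketch)."""
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Genome-wide expectation from the rate quoted above: ~1 difference per 1,000 bases.
GENOME_BP = 3e9
print(f"expected differences: {GENOME_BP / 1000:,.0f}")  # 3,000,000

# Toy aligned segments from two hypothetical individuals.
person_1 = "ACGTACGTTAGC"
person_2 = "ACGTACATTAGC"
print(snp_positions(person_1, person_2))  # [6]
```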

Genome-wide association scans (GWAS) have used SNPs to identify genes associated with inherited differences in susceptibility to several common diseases.

The DNAs of thousands of patients and normal controls are hybridized to microarrays containing both alleles of up to 500,000 common SNPs.
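Each SNP in such a scan is tested for a difference in allele frequency between patients and controls. A minimal sketch of one common per-SNP test, a 2 × 2 allelic chi-square test, combined with a Bonferroni correction for the 500,000 SNPs mentioned above. The allele counts are hypothetical, and real scans add quality control and adjustments for population structure.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 table of
    allele counts: (allele A, allele B) in cases vs. controls.
    Returns (chi-square statistic, p-value)."""
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(r) for r in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts at one SNP (two alleles per person):
# 2,000 cases -> 4,000 alleles; 2,000 controls -> 4,000 alleles.
chi2, p = allelic_chi_square(case_counts=(1_400, 2_600), control_counts=(1_200, 2_800))
# Bonferroni threshold when testing 500,000 SNPs at an overall alpha of 0.05.
threshold = 0.05 / 500_000
print(f"chi2 = {chi2:.1f}, p = {p:.1e}, significant: {p < threshold}")
```

With these made-up counts the p-value is on the order of 10⁻⁶: small for a single test, but still above the 10⁻⁷ Bonferroni threshold, illustrating why genome-wide scans need thousands of patients and controls.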

Figure 5.35 Genome-wide association scan (Part 1)