Chapter 24 topics: Genomics, Proteomics, Bioinformatics

Student learning outcomes:
• Describe tools to obtain DNA sequences of genomes
• Explain how microarrays analyze the transcriptome
• Describe how proteomics studies proteins of cells
• Define how bioinformatics manages vast stores of DNA data

Figures: 1, 3-13, 16, 17, 19, 20, 23, 24, 27, 28, 30; Tables 1, 2, 3

Problems: 1, 2*, 3-7, 9, 12*, 15, 17, 18, 20*, 22, 23*, 24, AQ3*, 4

24.1 Positional Cloning

Positional cloning: discover genes for genetic traits
• Mapping studies roughly locate the gene of interest to a relatively small region of DNA on a chromosome
• Physical landmarks relate to gene position:
  – Restriction fragment length polymorphisms (RFLPs): lengths of restriction fragments from a specific enzyme vary among individuals
  – CpG islands: DNA with unmethylated CpG is often actively expressed; find with methylation-sensitive restriction enzymes (HpaII vs. MspI for CCGG)

Southern blots detect RFLPs

Fig. 1 People differ in presence of particular HindIII site
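
As a rough illustration of why a missing HindIII site changes the band pattern, the sketch below digests two made-up alleles in silico and lists the fragment lengths. The sequences and the helper name `digest` are hypothetical; only the HindIII recognition site (AAGCTT, cut after the first A) is standard.

```python
# Toy sequences only; real RFLP analysis uses genomic DNA and a Southern blot probe.
HINDIII_SITE = "AAGCTT"   # HindIII recognition site
CUT_OFFSET = 1            # the enzyme cuts after the first A (A^AGCTT)

def digest(seq, site=HINDIII_SITE, offset=CUT_OFFSET):
    """Return fragment lengths after cutting a linear sequence at every site."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# Allele B has lost the middle HindIII site (AAGCTT -> AAGCTA)
allele_a = "G" * 20 + "AAGCTT" + "C" * 30 + "AAGCTT" + "T" * 40 + "AAGCTT" + "G" * 10
allele_b = "G" * 20 + "AAGCTT" + "C" * 30 + "AAGCTA" + "T" * 40 + "AAGCTT" + "G" * 10

fragments_a = digest(allele_a)   # four fragments
fragments_b = digest(allele_b)   # three fragments, one of them longer
print(fragments_a, fragments_b)
```

In allele B the two middle fragments fuse into one longer fragment, which is the kind of length difference a Southern blot would reveal.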

Classic example: Identifying Gene Mutated in Human Huntington’s Disease (HD)

• Dominant disease, late onset, degenerative
• Wexler and Gusella used RFLPs with huge family groups having the disease to map the HD gene near the end of chromosome 4
• Mutation causing disease is expansion of a CAG repeat from the normal range of 11-34 copies to the abnormal range of > 38 copies (triplet expansion)
• Extra repeats -> extra Gln inserted into huntingtin, product of HD gene
• Huntingtin has normal role in brain: interferes with transcription factor SP1 binding TAF130
• Mouse knockout: heterozygotes have neurological problems; null mutants do not survive
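
The repeat ranges above can be turned into a small check. This is only a sketch: the flanking sequences are invented, the function names are hypothetical, and the thresholds are simply the ones quoted on this slide (11-34 normal, > 38 abnormal).

```python
import re

def longest_cag_run(seq):
    """Length, in repeat units, of the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(r) // 3 for r in runs), default=0)

def classify(n_repeats):
    # Thresholds are the ones given on the slide, not a clinical standard
    if n_repeats > 38:
        return "expanded"
    if 11 <= n_repeats <= 34:
        return "normal"
    return "outside the ranges given on the slide"

# Made-up flanking sequence around the repeat
normal_allele = "ATGGCG" + "CAG" * 20 + "CCGCCA"
expanded_allele = "ATGGCG" + "CAG" * 45 + "CCGCCA"

for name, allele in [("normal", normal_allele), ("expanded", expanded_allele)]:
    n = longest_cag_run(allele)
    print(name, n, classify(n))
```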

RFLPs helped locate Huntington’s disease gene

Fig. 3 Combinations of RFLP distinguish 4 possible haplotypes

Fig. 4 Southern blot defines haplotype genotypes of members

HD gene identified from studies of large families

Fig. 24.5 Haplotype C is associated with disease - predictive

Pedigree studies, molecular studies of haplotypes, and correlation with disease led to cloning of the gene and prediction of disease (variable age of onset)

24.2 Sequencing Genomes

• Information from genome sequences:
  – Location of exact coding regions for all genes
  – Spatial relationships among genes, exact distances between them in bp
  – Sanger dideoxy sequencing 1977 (φX174 phage)
• How is coding region recognized?
  – Contains an ORF long enough to code for protein
  – ORF (open reading frame) must
    • Start with ATG triplet
    • End with stop codon
  – Phage or bacterial ORF same as coding region
  – Eukaryotic ORF definition is more difficult: introns
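
The ORF rules above (start with ATG, end with an in-frame stop codon, long enough to code for a protein) can be sketched as a simple scan of the three forward reading frames. The function name, the toy sequence, and the tiny `min_codons` default are assumptions for illustration; a real search would use a much larger length cutoff, also scan the reverse strand, and would not handle introns this way.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for ORFs on the given strand.

    An ORF here starts at ATG and runs to the first in-frame stop codon;
    min_codons counts the ATG but not the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

toy = "CCATGAAACCCGGGTAACC"   # made-up sequence with one short ORF in frame 2
orfs = find_orfs(toy)
for start, end, frame in orfs:
    print(frame, toy[start:end])
```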

Genome Results (Table 1 examples)

Numerous RNA or DNA sequences of genomes of viruses and organisms have been obtained:
– Phages, viruses
– Bacteria
– Animals
– Plants
– Human, Neanderthal

Comparison of related genomes (close or distant) sheds light on evolution of species: phylogeny from combination of traditional and molecular data

Human Genome Project (3 x 10^9 bp haploid)

A. Original plan systematic and conservative (1990):
  – Funded by NIH, Dept. of Energy
  – Prepare genetic, physical maps with markers; then piece DNA sequences together in proper order
  – Plan most sequencing after mapping complete
  – [Also many model organisms sequenced to compare]
• Celera, a private, for-profit company (J.C. Venter), vowed to complete rough draft of genome by 2000

B. Celera method was shotgun sequencing:
  – Whole genome chopped up and cloned
  – Clones sequenced randomly
  – Sequences pieced together by computer programs
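
To make "sequences pieced together by computer programs" concrete, here is a toy greedy overlap merger. It is a minimal sketch under strong assumptions (error-free reads, one strand, no repeats) and is not the assembler Celera actually used; the reads and function names are made up.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (>= min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps; leave separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads

# Three overlapping reads from a made-up 18-bp "genome"
reads = ["AGCTTGAC", "GACCGTAA", "TAATGCCA"]
contigs = greedy_assemble(reads)
print(contigs)
```

Real genomes are full of repeats, which is one reason greedy merging alone is not enough and why mapped landmarks remain useful.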

Vectors for Large-Scale Genome Projects

• Two high-capacity vectors for Human Genome Project
  – Mapping mostly used yeast artificial chromosomes (YACs), which accept about a million base pairs
  – Sequencing used bacterial artificial chromosomes (BACs), which accept about 300,000 bp
• BACs are more stable, easier to work with than YACs

Figs. 7, 8 (panel labels: BAC, YAC)

A. Clone-by-Clone Strategy

• Mapping requires a set of physical landmarks to relate positions of cloned genes; then sequence
• Some markers are genes; many are nameless stretches of DNA (must organize it all)
  – RFLPs: want polymorphic regions
    • Ideally a different pattern for people with disease vs. normal people locates disease genes (like HD)
  – VNTRs: variable number tandem repeats of a small sequence
    • Minisatellites, highly polymorphic, useful for forensics
  – STSs (sequence-tagged sites, unique), expressed-sequence tags, and microsatellites

Sequence-Tagged Sites - Physical Maps

Fig. 9

• STSs are unique sequences
  – 60-1000 bp long
  – Detectable by PCR
    • Need sequence information for primers
    • Need not be in a gene
• Design short primers
  – Hybridize a few hundred bp apart
  – Amplify predictable length of DNA; see on gel
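
The "predictable length" of an STS product follows directly from where the two primers anneal. The sketch below assumes exact primer matches on a linear template; the template, the primers, and the function names are invented for illustration.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def amplicon_length(template, fwd_primer, rev_primer):
    """Predicted PCR product length for exact primer matches, else None."""
    start = template.find(fwd_primer)
    if start == -1:
        return None
    # The reverse primer anneals to the bottom strand, so on the top strand
    # we look for its reverse complement downstream of the forward primer.
    rc = revcomp(rev_primer)
    end = template.find(rc, start + len(fwd_primer))
    if end == -1:
        return None
    return end + len(rc) - start

# Made-up template: 8-nt primer sites separated by 50 nt
template = "GGCC" + "ATGCCGTA" + "T" * 50 + "CGGATTAC" + "AACC"
length = amplicon_length(template, "ATGCCGTA", "GTAATCCG")
print(length)
```

A clone that yields a band of this size on a gel is scored as containing the STS; a clone with no product is scored as lacking it.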

Sequence-Tagged Sites - Physical Maps

Fig. 10

Align cloned sequences to form contigs (contiguous overlapping DNA sequences)

Shotgun-Sequencing Method used by Celera

Fig. 11: Connect overlapping BAC clones by identification of STCs, sequence-tagged connectors

Human Genome Project

• Working draft (2001) reported by Venter (Celera) and NIH/DOE consortium
• Estimated genome contained fewer genes than anticipated: 25,000 to 30,000
• 2007 completed version
• About half of genome from action of transposons
• Bacteria also donated dozens of genes
• Provides information about human evolution: chimpanzee, Neanderthal, many other genomes

Findings from Chromosome 22 (the first human chromosome sequenced)

679 annotated genes:
  – 274 Known genes, previously identified
  – 150 Related genes, homologous to known genes
  – 134 Pseudogenes, sequences homologous to known genes, but defects preclude proper expression

Coding regions of genes only a tiny fraction
• Annotated genes 39% of total length
• Exons only 3%
• Repeat sequences (Alu, LINEs, etc.) are 41%

Large chunks of human chromosome 22q conserved in several different mouse chromosomes

Homologs

• Orthologs: homologous genes in different species that evolved from a common ancestor
  – 8 regions of human 22q map to 7 mouse chromosomes
• Paralogs: homologous genes that evolved by gene duplication within a species
• Homologs: any kind of homologous genes, both orthologs and paralogs

Fig. 13 Large chunks of human chromosome 22q conserved in several different mouse chromosomes (113 genes)

Chromosome 21

• Relatively few genes
  – 225 genes
  – 59 pseudogenes
• All 24 genes shared with mouse chromosome 10 are in same order in both chromosomes
• Disease genes associated with chromosome 21:
  – Down syndrome is extra chromosome
  – Alzheimer’s, ALS (Lou Gehrig’s disease) genes

The X Chromosome

• Sequence of 151 Mb of human X chromosome:
  – 1098 protein-encoding genes
  – 168 genes governing X-linked phenotypes
  – Genes for 173 noncoding RNAs
  – Many genes identified for human disease (sex-linked)
• Chromosome rich in LINE1 repetitive elements
  – Involved in X inactivation mechanism in female cells
• XIST RNA (X-inactive specific transcript)
  – 32-kb RNA responsible for X-inactivation, heterochromatin

X (and partner Y) evolved from ancestral autosomes

Other Vertebrate Genomes

Fig. 14 Mouse, human

• Comparing human genome with other vertebrates:
  – helped identify many human genes
  – helps identify defective genes for human genetic diseases
• Closely related species (mouse) identify when and where genes are expressed; predict when and where human genes are likely expressed

The Minimal Genome (J. Craig Venter)

• Define essential gene set of simple organism
  – Mutate one gene at a time; see which are required for life
• In theory, could define minimal genome: set of genes required for life
  – Minimal genome likely larger than essential gene set
• Sequence a small genome, then delete genes
• Mycoplasma genitalium, 580 kb (480 protein-coding genes)
  – No cell wall, intracellular parasite, only glycolysis
• 2010: placed synthetic minimal genome (1 x 10^6 bp) into Mycoplasma cell lacking genes:
  – new life form that can live and reproduce under lab conditions
  – controversial approaches

The Barcode of Life

• CBOL (Consortium for the Barcode of Life): plan to create barcode to identify any species of life on earth
• First such barcode: sequence of 648-bp piece of mitochondrial COI gene from each organism
  – Cytochrome c oxidase
  – Isolate mitochondrial DNA, sequence
• Sequence can uniquely identify most organisms
• Other sequences needed for plants and bacteria, since less variation among their COI genes

24.3 Applications: Functional Genomics

• Functional genomics deals with function or expression of genomes

• Transcriptome: all transcripts an organism makes at any given time

• Genomic functional profiling: use of genomic information to block expression systematically

• Proteomics: study structures and functions of protein products of genomes

Transcriptomics

• Study all transcripts an organism makes
• Create DNA microarrays (microchips) that hold 1000s of cDNAs or oligos
  – Hybridize labeled RNAs (cDNAs) from cells to chips
  – Intensity of hybridization at each spot reveals the extent of expression of corresponding gene
• Arrays measure expression of many genes at once
• Clustered expression of genes in time and space suggests products of these genes collaborate in some process -> function
• Affymetrix makes chips, 25-mer unique sequences

DNA chips: Oligo-nucleotides on a Glass Substrate

Fig. 16

Fig. 17

Serum-starved human cells cDNA (labeled green); serum-fed cells cDNA (red)
Equal expression of mRNA = yellow
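
The color of each spot reflects the ratio of the two labels. A minimal sketch of that readout, assuming background-corrected intensities and an arbitrary two-fold cutoff (the cutoff, gene names, and intensity values are all made up):

```python
import math

def spot_call(red, green, fold=2.0):
    """Classify one spot from background-corrected red and green intensities."""
    log_ratio = math.log2(red / green)
    threshold = math.log2(fold)   # two-fold cutoff is an assumption, not from the slide
    if log_ratio >= threshold:
        return "red (higher in serum-fed)", log_ratio
    if log_ratio <= -threshold:
        return "green (higher in serum-starved)", log_ratio
    return "yellow (similar expression)", log_ratio

# Hypothetical (red, green) intensities for three spots
spots = {"geneA": (8000, 1000), "geneB": (1200, 1100), "geneC": (500, 4000)}
calls = {gene: spot_call(r, g)[0] for gene, (r, g) in spots.items()}
for gene, call in calls.items():
    print(gene, call)
```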

Genomic Functional Profiling

– Deletion analysis - mutants created by replacing genes with antibiotic resistance gene flanked by oligomers serving as barcode for that mutant

– Functional profile can be obtained by growing whole group of mutants together under various conditions to see which mutants disappear most rapidly

Fig. 21 Growth of yeast mutants on galactose C source
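
One way to picture "which mutants disappear most rapidly" is to compare barcode abundance before and after pooled growth. The sketch below ranks barcodes by the log2 change in their share of the pool; the barcode names, counts, and pseudocount are hypothetical, and real analyses normalize and replicate far more carefully.

```python
import math

def rank_by_depletion(counts_start, counts_end, pseudocount=1):
    """Return barcodes sorted from most depleted to least depleted."""
    total_start = sum(counts_start.values())
    total_end = sum(counts_end.values())
    scores = {}
    for barcode in counts_start:
        # Pseudocount avoids log of zero when a mutant vanishes completely
        f0 = (counts_start[barcode] + pseudocount) / total_start
        f1 = (counts_end.get(barcode, 0) + pseudocount) / total_end
        scores[barcode] = math.log2(f1 / f0)
    return sorted(scores, key=scores.get)

# Made-up barcode counts before and after growth on one carbon source
before = {"del_A": 1000, "del_B": 1000, "del_C": 1000}
after = {"del_A": 10, "del_B": 900, "del_C": 1100}
ranking = rank_by_depletion(before, after)
print(ranking)
```

The deleted gene in the most depleted strain is a candidate for being required under that growth condition.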

RNAi Analysis

• Genomic functional analysis: RNAi inactivates genes
• Ex. genes involved in early embryogenesis in C. elegans:
  – 661 important genes (early embryo defect)
  – 326 involved in embryogenesis

Fig. 22: Initial screen showed which genes were inactivated by RNAi; then see which stage of embryogenesis is affected

* Locating Target Sites for Transcription Factors (ChIP-chip)

• Chromatin immunoprecipitation (ChIP) followed by DNA microarray analysis can identify DNA-binding sites for activators and other proteins
• Small-genome organisms: all intergenic regions can be included in microarray
• If genome is large, not practical
• To narrow areas of interest, can use CpG islands
  – Non-methylated CpG associated with gene control region
  – If timing/conditions of activator’s activity are known, control regions of genes known to be activated at those times, or under those conditions, can be used

ChIP-chip assays locate target sites for specific transcription factors

Fig. 24

• ChIP with specific antibody
• PCR adding generic primer to all
• Fluorescent label
• Microarray
• See Fig. 25 Yeast Gal4 protein binding sites

In Situ Expression Analysis

‘Mouse blots’

• Mouse as human surrogate in large-scale expression studies (ethically impossible in humans)

• Studied expression of almost all mouse orthologs of genes on human chromosome 21
  – Followed stages of embryonic development (E)
  – Catalogued embryonic tissues in which genes are expressed

Fig. 26

Single-Nucleotide Polymorphisms; pharmacogenomics

• Single-nucleotide polymorphisms (SNPs) are single bp differences between people; account for many genetic conditions caused by single genes, even multiple genes
• Might be able to predict response to a drug
• New focus for therapeutics
• Haplotype map with > 1 million SNPs: sort out important SNPs from those with no effect

24.4 Proteomics

• Proteome: all proteins produced by an organism
• Proteomics: study of all proteins, or subsets
• More accurate picture of gene expression than transcriptomics studies:
  – Sometimes mRNA is degraded, not translated
• First separate proteins, often on massive scale
  – 2-D gel electrophoresis is good tool
• After separation, identify proteins
  – Digest proteins with proteases
  – Identify peptides by mass spectrometry

MALDI-TOF Mass Spectrometry

Fig. 27

Matrix-assisted laser desorption/ionization, time of flight
Peptides ionized; time to reach detector accurately reflects mass
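
Why flight time reflects mass: in an idealized linear instrument, an ion of charge z accelerated through voltage V gains kinetic energy zeV = (1/2)mv^2, so the time to cross a drift tube of length L is t = L * sqrt(m / (2zeV)). The sketch below just evaluates that relation; the 20 kV voltage and 1 m tube are assumed values, and real instruments need calibration and corrections this ignores.

```python
import math

E_CHARGE = 1.602176634e-19    # elementary charge, coulombs
DALTON = 1.66053906660e-27    # one dalton, kilograms

def flight_time(mass_da, charge=1, voltage=20_000.0, tube_length=1.0):
    """Idealized drift time in seconds: t = L * sqrt(m / (2 z e V))."""
    m = mass_da * DALTON
    return tube_length * math.sqrt(m / (2 * charge * E_CHARGE * voltage))

def mass_from_time(t, charge=1, voltage=20_000.0, tube_length=1.0):
    """Invert the same relation to recover mass in daltons from flight time."""
    m = 2 * charge * E_CHARGE * voltage * (t / tube_length) ** 2
    return m / DALTON

# Heavier peptides arrive later: four times the mass takes twice as long
t_1000 = flight_time(1000.0)
t_4000 = flight_time(4000.0)
print(t_1000, t_4000, t_4000 / t_1000)
```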

Detecting Protein-Protein Interactions

Fig. 28

Epitope tag on one protein (from gene level) permits isolation of complex containing that protein using affinity resins

Common epitope tags: His6-tag, HA-tag, Flag-tag, TAP-tag

** In future, microchips with antibodies may allow analysis of proteins in complex mixtures without separation

Identifying Protein Interactions, Networks

• Most proteins function with other proteins
• Yeast two-hybrid analysis
• Protein microarrays
• Immunoaffinity chromatography with mass spectrometry

Fig. 29. Identifying proteins binding kinases using Flag-tagged Kss1 or Cdc28

24.5 Bioinformatics

• Bioinformatics: building and using biological databases
  – DNA sequences of genomes
  – Mining massive amounts of biological data for meaningful knowledge about gene structure and expression
• National Center for Biotechnology Information (NCBI) website: vast store of biological information (genomic and proteomic)
• Start with DNA sequence, discover gene, then compare that sequence with that of similar genes or organisms
• View 3D protein structures on computer

Review questions

2. What kind of mutation gave rise to Huntington disease?

12. Compare and contrast the clone-by-clone sequencing strategy with the shotgun sequencing strategy for large genomes.

15. The pufferfish genome is nine times smaller than the human genome, but contains as many genes. How can that be?

20. Describe a hypothetical experiment using a DNA microarray to measure transcription from SV40 viral genes at two stages of infection of cells by the virus. Show example results.