Introduction to Bioinformatics Lecture 20: Sequencing genomes

  • Slide 1
  • Introduction to Bioinformatics Lecture 20: Sequencing genomes
  • Slide 2
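  • Nucleic Acid Basics. Nucleic acids are polymers. Each monomer (a nucleotide) consists of three moieties: a base + a ribose sugar (deoxyribose in DNA) + a phosphate. The base plus the sugar, without the phosphate, is a nucleoside. A base can be one of five rings: adenine, guanine, cytosine, thymine, or uracil.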
  • Slide 3
  • Pyrimidines and Purines. Pyrimidines and purines can base-pair (Watson-Crick pairs).
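Watson-Crick pairing can be made concrete with a short sketch. The function name `reverse_complement` and the `PAIRS` table are illustrative assumptions, not part of the lecture; the table encodes the standard DNA pairs A-T and G-C.

```python
# Standard Watson-Crick pairs for DNA (a purine always pairs with a pyrimidine).
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Return the strand that base-pairs with `seq`, written 5' to 3'."""
    # The partner strand runs antiparallel, so the input is read in reverse.
    return "".join(PAIRS[base] for base in reversed(seq.upper()))

print(reverse_complement("ATTCG"))  # CGAAT
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.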
  • Slide 4
  • Slide 5
  • Unlike the three-dimensional structures of proteins, DNA molecules assume simple double-helical structures largely independent of their sequences. Three kinds of double helix have been observed in DNA: type A, type B, and type Z, which differ in their geometries. The double-helical structure is essential to the coding function of DNA. Watson (biologist) and Crick (physicist) first proposed the double-helix structure in 1953, building on X-ray diffraction data. RNA, on the other hand, can adopt structures as diverse as those of proteins, as well as a simple double helix of type A. This ability to be both informational and structurally diverse suggests that RNA may have been the prebiotic molecule that functioned in both replication and catalysis (the RNA World hypothesis). In fact, some viruses, such as retroviruses, encode their genetic material in RNA.
  • Slide 6
  • Forces That Stabilize the Nucleic Acid Double Helix. Two major forces contribute to the stability of helix formation: hydrogen bonding in base pairing, and hydrophobic interactions in base stacking (same-strand stacking and cross-strand stacking). Figure: strand ends labeled 5' and 3'.
  • Slide 7
  • Types of DNA Double Helix. Type A: major conformation of RNA, minor conformation of DNA. Type B: major conformation of DNA. Type Z: minor conformation of DNA. Figure labels, in the order A, B, Z: narrow and tight; wide and less tight; left-handed and least tight. Strand ends are labeled 5' and 3'.
  • Slide 8
  • Three-Dimensional Structures of Double Helices. Figure panels: A-DNA, A-RNA, A-DNA, with the major groove and minor groove marked.
  • Slide 9
  • Secondary Structures of Nucleic Acids. DNA is primarily in duplex form. RNA is normally single-stranded and can adopt diverse secondary structures other than the duplex.
  • Slide 10
  • More Secondary Structures of Nucleic Acids: Pseudoknots. Source: Cornelis W. A. Pleij in Gesteland, R. F. and Atkins, J. F. (1993) The RNA World. Cold Spring Harbor Laboratory Press.
  • Slide 11
  • 3D Structures of RNA: Transfer RNA Structures. Secondary structure of tRNA (figure labels: anticodon stem, anticodon loop, D loop, TΨC loop, variable loop) and tertiary structure of tRNA.
  • Slide 12
  • 3D Structures of RNA: Ribosomal RNA Structures. rRNA secondary structure based on phylogenetic data. Figure panels: secondary structure of large ribosomal RNA; tertiary structure of the large ribosomal subunit. Source: Ban et al., Science 289 (905-920), 2000.
  • Slide 13
  • DNA Sequencing: Chain Termination Method (Sanger, 1977). Input: single-stranded DNA, ~800 bases. Method: electrophoresis can separate DNA molecules that differ by 1 bp in length; dideoxynucleotides (ddNTPs), which stop replication, are used.
  • Slide 14
  • ddNucleotides: ddA, ddT, ddC, ddG. Each type is marked with a fluorescent dye. When incorporated into a DNA chain, a ddNucleotide stops replication.
  • Slide 15
  • Chain Termination Method, An Outline. Replication: obtain ssDNA; add a (universal) primer; start replication in a soup of A, T, C, G; continuously add tiny amounts of ddA, ddT, ddC, ddG, gradually stopping all the processes.
  • Slide 16
  • Chain Termination Method, Reading the Sequence. Run the products through an electrophoresis gel. The four types of ddNTP carry four different fluorescent labels. Reading is automated. See: http://www.dnalc.org/Shockwave/cycseq.html
  • Slide 17
  • Chain Termination Method, Results. Electrophoresis and laser beam scanning produce an electropherogram (figure axes: time / fragment size, and signal).
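The read-out step of the chain termination method can be sketched as follows. This is an idealized toy model, not the lecture's own code: it assumes exactly one terminated fragment per position of the newly synthesized strand, and the names `terminated_fragments` and `read_sequence` are hypothetical.

```python
import random

def terminated_fragments(strand: str) -> list[tuple[int, str]]:
    """Idealized reaction: one fragment ends at every position of the new strand.
    Each fragment is recorded as (length, label of its terminating ddNTP)."""
    return [(n, strand[n - 1]) for n in range(1, len(strand) + 1)]

def read_sequence(fragments: list[tuple[int, str]]) -> str:
    """Electrophoresis orders fragments by length; reading the dye labels
    from shortest to longest spells out the synthesized strand."""
    return "".join(label for _, label in sorted(fragments))

fragments = terminated_fragments("AGTATCA")
random.shuffle(fragments)        # the reaction products are unordered
print(read_sequence(fragments))  # AGTATCA
```

Sorting by length stands in for the gel, and the stored label stands in for the fluorescent dye detected by the laser.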
  • Slide 18
  • Shotgun Method - Overview. Cut the genome into short fragments; sequence the DNA fragments; create contigs. Contig: a continuous set of overlapping sequences. Regions covered by no fragment remain as gaps.
  • Slide 19
  • Shotgun Method. The shotgun approach to sequence assembly: the DNA molecule is broken into small fragments, each of which is sequenced. The master sequence is assembled by searching for overlaps between the sequences of individual fragments. In practice, an overlap of several tens of base pairs would be needed to establish that two sequences should be linked together.
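A minimal sketch of overlap-based assembly, assuming error-free reads and exact suffix-prefix matches. The greedy merge strategy, the function names, and the tiny `min_len=3` threshold are illustrative assumptions chosen for a toy example; as the slide notes, real data would need overlaps of several tens of base pairs.

```python
def overlap_len(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`
    (0 if the longest such overlap is shorter than `min_len`)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair with the longest overlap (min_len >= 1)."""
    contigs = list(fragments)
    while True:
        best = (0, None, None)
        # All-against-all comparison of the current contigs.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap_len(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs  # nothing left to merge; remaining breaks are gaps
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]

print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"]))
# ['ATGCGTACGTTAGC']
```

When no fragment bridges two regions, the function returns more than one contig, which corresponds to a gap in the assembly.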
  • Slide 20
  • Shotgun Method: Contig Construction. Two DNA sequences: X = CTATCA, Y = AGTAT. How do they overlap? Try to apply dynamic programming. (Figure: X and Y shown in alternative overlapping arrangements.)
  • Slide 21
  • Shotgun Method: Contig Construction by Dynamic Programming. (Figure not preserved; surviving labels: 2, 1.)
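One way to apply dynamic programming to the overlap question is a semi-global ("overlap") alignment, in which skipping the start of the first sequence and the end of the second costs nothing. The sketch below is an assumption about how the slide's matrix might be set up, not a reconstruction of it; the scoring values (match +1, mismatch -1, gap -1) and the name `overlap_score` are illustrative.

```python
def overlap_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best score for aligning some suffix of `a` with some prefix of `b`."""
    rows, cols = len(a) + 1, len(b) + 1
    # D[i][j]: best score of aligning a suffix of a[:i] with all of b[:j].
    # Column 0 stays 0, so the overlap may start anywhere in `a` for free.
    D = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        D[0][j] = j * gap  # prefix of b placed before any base of a
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            D[i][j] = max(D[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          D[i - 1][j] + gap,    # gap in b
                          D[i][j - 1] + gap)    # gap in a
    # `a` must be consumed to its end; the rest of `b` may hang over for free.
    return max(D[-1])

X, Y = "CTATCA", "AGTAT"
print(overlap_score(Y, X))  # 2  (end of Y against start of X)
print(overlap_score(X, Y))  # 1  (end of X against start of Y)
```

Under this scoring, placing Y before X gives the better overlap: the suffix GTAT of Y aligns with the prefix CTAT of X with three matches and one mismatch.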
  • Slide 22
  • Shotgun Method: Haemophilus influenzae Sequencing. Extract DNA; sonicate; electrophoresis (1.5-2 kb fragments); DNA library; sequence; construct contigs.
  • Slide 23
  • Shotgun Method - Filling in gaps. Probe libraries. Scaffold: a series of sequence contigs separated by sequence gaps (contig, gap, contig, gap, ...).
  • Slide 24
  • Shotgun Method - Pros and Cons. Pros: human labour reduced to a minimum. Cons: computationally demanding, with O(n^2) comparisons; high error rate in contig construction; repeats as the main problem.
  • Slide 25
  • Shotgun Method: Repeats as the main problem
  • Slide 26
  • Shotgun vs. Hierarchical Method (Celera vs. Human Genome Project). Hierarchical (top-down) assembly: the genome is carefully mapped and shotgunned into large chunks of 150 kb; the exact location of each chunk is known; each piece is again shotgunned into 2 kb fragments and sequenced.
  • Slide 27
  • Shotgun vs. Hierarchical Method. Shotgun: bottom-up. Hierarchical: top-down.
  • Slide 28
  • New Sequencing Methods: Sequencing By Hybridization. Check which of all possible fragments of length k (k-tuples) hybridize to the sequence. Figure labels: ATTCG, TAAAAGAGC, TAA, AAG, AGC.
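The set of k-tuples that a hybridization experiment would report can be sketched as the k-mer spectrum of the target. This is a simplification under stated assumptions: a probe is counted as a hit when it occurs as a substring of the sequence, which ignores strand complementarity and hybridization errors, and the name `spectrum` is hypothetical.

```python
def spectrum(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k (k-tuples) that occur in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

print(sorted(spectrum("TAAAAGAGC", 3)))
# ['AAA', 'AAG', 'AGA', 'AGC', 'GAG', 'TAA']
```

Because the spectrum is a set, the repeated 3-tuple AAA is reported only once; such repeats are one reason reconstructing a sequence from its spectrum can be ambiguous.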
  • Slide 29
  • Wrapping up. Nucleotide, DNA, RNA basics (sequence, structure). DNA sequencing: Sanger method, shotgun sequencing, hierarchical assembly; contigs, scaffolds, dynamic programming.
