
Introduction to Bioinformatics

Lecture 20: Sequencing genomes

Nucleic Acid Basics

• Nucleic Acids Are Polymers

• Each monomer, called a nucleotide, consists of three moieties:

A base + a ribose sugar (deoxyribose in DNA) + a phosphate

A base plus the sugar, without the phosphate, is called a nucleoside.

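• A base can be one of five ring structures:

– Pyrimidines (single ring): cytosine (C), thymine (T, in DNA) and uracil (U, in RNA)
– Purines (double ring): adenine (A) and guanine (G)

• Pyrimidines pair with purines (Watson-Crick pairs: A-T in DNA or A-U in RNA, and G-C)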

• Unlike the three-dimensional structures of proteins, DNA molecules assume simple double-helical structures that are largely independent of their sequences. Three kinds of double helix have been observed in DNA, type A, type B and type Z, which differ in their geometries. The double-helical structure is essential to the coding function of DNA. Watson (a biologist) and Crick (a physicist) proposed the double-helix structure in 1953, building on X-ray diffraction data (notably from Rosalind Franklin and Maurice Wilkins).

• RNA, on the other hand, can adopt structures nearly as diverse as those of proteins, as well as the simple A-type double helix. Because RNA can both carry information and fold into diverse structures, it has been proposed as the prebiotic molecule that could function in both replication and catalysis (the RNA World Hypothesis). Indeed, some viruses, such as retroviruses, use RNA as their genetic material.
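Because of Watson-Crick pairing (A with T, G with C in DNA) and the antiparallel 5’/3’ orientation of the two strands, one strand fully determines the other. A minimal Python sketch of that rule; the name `reverse_complement` is ours, chosen for illustration:

```python
# Watson-Crick partner for each DNA base
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand: str) -> str:
    """Return the partner strand of a DNA strand, written 5'->3'.

    The partner strand is antiparallel, so each base is complemented
    and the order is reversed."""
    return "".join(DNA_PAIR[base] for base in reversed(strand.upper()))

print(reverse_complement("ATTCGTA"))  # prints TACGAAT
```

Applying the function twice returns the original strand, which mirrors the fact that each strand of the duplex is the template for the other.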

Forces That Stabilize Nucleic Acid Double Helix

• Two major forces contribute to the stability of the double helix:
– Hydrogen bonding in base pairing
– Hydrophobic interactions in base stacking

[Figure: base stacking on two antiparallel strands labeled 5’ and 3’, showing same-strand stacking and cross-strand stacking]

Types of DNA Double Helix

• Type A: major conformation of RNA, minor conformation of DNA
• Type B: major conformation of DNA
• Type Z: minor conformation of DNA

[Figure: A-, B- and Z-form double helices, strands labeled 5’ and 3’, annotated in order "Narrow, tight", "Wide, less tight" and "Left-handed, least tight"]

Three-Dimensional Structures of Double Helices

[Figure: 3D structures of A-DNA and A-RNA double helices, with the major and minor grooves labeled]

Secondary Structures of Nucleic Acids

• DNA is primarily in duplex form.

• RNA is normally single-stranded and can fold into diverse secondary structures other than the duplex (e.g., hairpins, bulges and internal loops).

More Secondary Structures of Nucleic Acids

Pseudoknots:

Source: Cornelis W. A. Pleij in Gesteland, R. F. and Atkins, J. F. (1993) The RNA World. Cold Spring Harbor Laboratory Press.

3D Structures of RNA: Transfer RNA Structures

[Figure: secondary (cloverleaf) structure of tRNA with the anticodon stem, anticodon loop, D loop, TψC loop and variable loop labeled; tertiary (L-shaped) structure of tRNA]

3D Structures of RNA: Ribosomal RNA Structures

[Figure: secondary structure of the large ribosomal RNA and tertiary structure of the large ribosomal subunit. Source: Ban et al., Science 289 (905-920), 2000]

rRNA Secondary Structure Based on Phylogenetic Data

DNA Sequencing

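Chain Termination Method
– Sanger, 1977
– Works on single-stranded DNA; reads of ~800 bases
– Method:
• Electrophoresis can separate DNA molecules that differ in length by a single nucleotide
• Dideoxynucleotides (ddNTPs) are used; they stop replication because they lack the 3’-OH group to which the next nucleotide would be attached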

ddNucleotides

ddA, ddT, ddC, ddG: each type is marked with a different fluorescent dye. When incorporated into the DNA chain, a ddNucleotide stops replication.

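Chain Termination Method, An Outline

Replication
– Obtain ssDNA
– Add a (universal) primer
– Start replication in a mix of the four normal nucleotides (dATP, dTTP, dCTP, dGTP)
– Include tiny amounts of ddA, ddT, ddC, ddG: each growing chain stops at a random position when a ddNucleotide is incorporated, so eventually all chains are terminated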
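The replication step can be simulated in a few lines. This is a toy sketch under our own assumptions, not a model of any real protocol: the function `chain_termination` and its parameters are hypothetical, the primer is assumed to sit at the template's 3’ end (so the new strand is the reverse complement of the template), and ddNTP incorporation is modelled as a fixed small probability `dd_fraction` at every position.

```python
import random

# Base the polymerase adds opposite each template base
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def chain_termination(template, dd_fraction=0.05, n_copies=5000, seed=1):
    """Toy Sanger reaction (hypothetical helper, assumptions as stated above).

    Returns a list of (length, terminal_base) pairs; the terminal base is
    the ddNTP that stopped the chain, i.e. the dye the fragment carries."""
    rng = random.Random(seed)
    # new strand is complementary and antiparallel to the template
    new_strand = "".join(PAIR[b] for b in reversed(template))
    fragments = []
    for _ in range(n_copies):
        for pos, base in enumerate(new_strand, start=1):
            if rng.random() < dd_fraction:  # ddNTP incorporated: chain stops here
                fragments.append((pos, base))
                break
        # copies that never incorporate a ddNTP carry no dye and are ignored
    return fragments
```

With enough copies every position ends up as the stopping point of some fragment, which is why the mixture contains one labelled fragment length per base of the new strand.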

Chain Termination Method, Reading the Sequence
– Run the products through an electrophoresis gel
– The four types of ddNTP carry four different fluorescent labels
– Automated reading

See: http://www.dnalc.org/Shockwave/cycseq.html
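Reading the gel can be sketched the same way: order the fragments by size and report the dye seen at each size. The function `call_bases` is a hypothetical illustration that assumes perfectly resolved, error-free fragment lengths; real base callers work from the noisy electropherogram signal.

```python
from collections import Counter, defaultdict

def call_bases(fragments):
    """Read a sequence from (length, dye_base) pairs.

    Electrophoresis orders fragments by size, so reading the dye at each
    length from shortest to longest gives the synthesized strand 5'->3'.
    If several dyes appear at one length, the most frequent one is called."""
    dyes_by_length = defaultdict(Counter)
    for length, dye in fragments:
        dyes_by_length[length][dye] += 1
    return "".join(dyes_by_length[n].most_common(1)[0][0]
                   for n in sorted(dyes_by_length))

print(call_bases([(3, "C"), (1, "A"), (2, "G"), (1, "A"), (2, "T"), (2, "G")]))  # prints AGC
```

A length with no fragments is silently skipped here, so a real caller would also need to check that the lengths form an unbroken run.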

Chain Termination Method, Results

[Figure: electrophoresis and laser beam scanning; the resulting electropherogram plots signal against time (fragment size)]

Shotgun Method - Overview
• Cut the genome into short fragments
• Sequence the DNA fragments
• Create contigs

Contig: a continuous set of overlapping sequences

[Figure: contigs separated by a gap]

Shotgun Method

The shotgun approach to sequence assembly. The DNA molecule is broken into small fragments, each of which is sequenced. The master sequence is assembled by searching for overlaps between the sequences of individual fragments. In practice, an overlap of several tens of base pairs would be needed to establish that two sequences should be linked together.
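The overlap search can be illustrated with a simple greedy strategy: repeatedly merge the two sequences with the longest exact suffix-prefix overlap. This is a sketch of one possible strategy, not the method of any particular assembler; the helper names and the `min_overlap` threshold are ours, and the toy default of 3 is far below the several tens of base pairs needed in practice.

```python
def overlap_length(a, b, min_overlap):
    """Length of the longest suffix of `a` equal to a prefix of `b`,
    or 0 if that overlap is shorter than `min_overlap`."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_overlap=3):
    """Greedily merge the pair with the longest overlap until none remains.
    Returns the contigs; more than one contig means gaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # drop reads contained in another read: they add no new sequence
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap_length(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps enough: the remaining contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

print(greedy_assemble(["ATTCGTA", "CGTAAAAG", "AAAGAGC"]))  # prints ['ATTCGTAAAAGAGC']
```

Note the nested loop over all pairs of sequences in every round; greedy merging is also easily misled when the same overlap occurs in several places, for example inside repeats.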

Shotgun Method – Contig Construction

Two DNA sequences: X = CTATCA, Y = AGTAT. How do they overlap?

Try to apply dynamic programming.

[Figure: the two possible orderings, X followed by Y or Y followed by X]

Shotgun Method – Contig Construction by Dynamic Programming

[Figure: dynamic programming matrices for the two orderings, annotated "2" and "1"]
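One way to apply dynamic programming here is an "overlap" alignment: a suffix of the first sequence is aligned with a prefix of the second, so leading characters of the first and trailing characters of the second are not penalized. The slide does not give a scoring scheme; the sketch below assumes match +1, mismatch -1, gap -2, and the function name `overlap_alignment` is hypothetical.

```python
def overlap_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Best score for overlapping a suffix of `a` with a prefix of `b`.

    Returns (score, j), where j is how many characters of `b` take part
    in the overlap."""
    n, m = len(a), len(b)
    # S[i][j]: best score aligning some suffix of a[:i] with all of b[:j]
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + gap  # b's prefix cannot be skipped for free
    # S[i][0] stays 0: skipping a prefix of a costs nothing
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          S[i - 1][j] + gap,    # gap in b
                          S[i][j - 1] + gap)    # gap in a
    # a must be aligned to its very end; the rest of b hangs off for free
    best_j = max(range(m + 1), key=lambda j: S[n][j])
    return S[n][best_j], best_j

X, Y = "CTATCA", "AGTAT"
print("X then Y:", overlap_alignment(X, Y))  # prints X then Y: (1, 1)
print("Y then X:", overlap_alignment(Y, X))  # prints Y then X: (2, 4)
```

Under these assumed scores, Y followed by X wins: Y's suffix GTAT lines up with X's prefix CTAT with one mismatch (score 2), giving AGTATCA if Y's base is kept at the mismatch, while X followed by Y only achieves a one-base overlap (score 1).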

Shotgun Method – Haemophilus influenzae Sequencing

[Figure: workflow: Extract DNA → Sonicate → Electrophoresis (1.5-2 kb fragments) → DNA library → Sequence → Construct contigs; further labels: "Sequenced", "Probe libraries"]

Shotgun Method - Filling in gaps

[Figure: contigs separated by gaps]

Scaffold: a series of sequence contigs separated by sequence gaps.

Shotgun Method - Pros and Cons

Pros
– Human labour reduced to a minimum

Cons
– Computationally demanding: O(n²) pairwise comparisons for n fragments
– High error rate in contig construction
• Repeats as the main problem
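The O(n²) cost comes from comparing every pair of fragments. Worked out, with n = 10^6 reads as an illustrative number of our own (not a figure from the lecture):

```latex
\binom{n}{2} = \frac{n(n-1)}{2} = O(n^2),
\qquad
n = 10^6:\quad \frac{10^6\,(10^6 - 1)}{2} = 499{,}999{,}500{,}000 \approx 5 \times 10^{11}
```

Each of these comparisons is itself an overlap alignment, so the total work grows faster still.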

Shotgun Method - Repeats as the main problem
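A toy experiment makes the repeat problem concrete. Under our simplifying assumptions (error-free reads of length k starting at every position; the segment strings below are made up), two different genomes in which the segments between three copies of a repeat R are swapped yield exactly the same collection of reads whenever k ≤ |R| + 1. Only reads long enough to span a whole repeat plus one flanking base on each side can tell them apart.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Multiset of all length-k substrings: idealized reads of length k
    starting at every position of `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

# Hypothetical toy segments; R is a repeat that occurs three times.
A, R, B, C, D = "AA", "GATC", "TTT", "CCG", "AC"
genome1 = A + R + B + R + C + R + D
genome2 = A + R + C + R + B + R + D  # B and C swapped between repeat copies

for k in range(2, len(R) + 3):
    same = kmer_counts(genome1, k) == kmer_counts(genome2, k)
    print(k, "indistinguishable" if same else "distinguishable")
# prints 2..5 indistinguishable, then 6 distinguishable
```

No assembler, however clever, can recover the right genome from reads that both genomes produce identically.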

Shotgun vs. Hierarchical Method

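Celera vs. Human Genome Project

Celera used whole-genome shotgun (bottom-up) assembly; the public Human Genome Project used hierarchical (top-down) assembly:
– The genome is carefully mapped
– It is “shotgunned” into large chunks of ~150 kb
• The exact location of each chunk is known
– Each chunk is again “shotgunned” into ~2 kb fragments, which are sequenced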

Shotgun vs. Hierarchical Method

[Figure: shotgun (bottom-up) vs. hierarchical (top-down) assembly]

New Sequencing Methods

Sequencing By Hybridization
– Check which of all possible fragments of length k (k-tuples) hybridize to the sequence

[Figure: target sequence ATTCGTAAAAGAGC with hybridizing probes such as TAAAAG and AGC]
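The idealized readout of a hybridization experiment is the set of k-tuples present in the target. The sketch below literally tries all 4^k probes; the function name is hypothetical, and treating "probe occurs in the target string" as "probe hybridizes" is our simplification (physically the probe binds the complementary strand, and real experiments have errors).

```python
from itertools import product

def hybridization_spectrum(target, k):
    """Return the k-tuples, out of all 4**k possible probes, that occur in
    `target` (idealized: only presence is observed, not how often)."""
    probes = ("".join(p) for p in product("ACGT", repeat=k))
    return {probe for probe in probes if probe in target}

spectrum = hybridization_spectrum("ATTCGTAAAAGAGC", 3)
print(len(spectrum))  # prints 11
```

The 14-base target has 12 windows of length 3, but AAA occurs twice and lights up only one probe, so the spectrum has 11 entries. This loss of multiplicity is one reason reconstructing a sequence from its spectrum can be ambiguous.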

Wrapping up

• Nucleotide, DNA, RNA basics (sequence, structure)
• DNA sequencing
– Sanger method
– Shotgun sequencing
– Hierarchical assembly
– Contigs, scaffolds, dynamic programming

