Theory and Application of Multiple Sequence Alignments
Brett Pickett, PhD
a.k.a. What is a Multiple Sequence Alignment, How to Make One, and What to Do With It
Slide 2
History
- Structure of DNA discovered (1953)
- First (phage) genome determined in 1977
- Human Genome Project begun in 1990
- First living organism (H.i., Haemophilus influenzae) sequenced in 1995
- Human rough draft completed in 2000
  - NHGRI (public) vs. J. Craig Venter (private)
  - Used a supercomputer to put the human genome together in the right order
Slide 3
What is a Genome?
- Genetic material required for an organism to replicate
  - Eukaryotes (humans): # chromosomes
  - Prokaryotes (bacteria): 1 chromosome
  - Viruses: what's a chromosome?
- Human: 10 trillion cells in the human body x 2 m of DNA (= 3.2 Gb) per cell
  - 780,000 times around Earth
  - 67.8 round trips to the sun
- Bacteria: 580 kb - 10 Mb
- Virus: 3.5 kb - 1.3 Mb
http://www.rsc.org/chemsoc/timeline/pages/2001.html
Slide 4
Why are Genomes so Important?
- Encode all organismal functions
  - DNA -> RNA -> protein
- Unique to each organism
- Differences (mutations) can be found only by comparing genomes with each other
www.thednastore.com/images/cells/mrdna1.jpg
Slide 5
How are Sequences Made?
1. Make lots of copies of the original sequence (PCR)
2. Put the copies into a machine to make even more copies
3. Fluorescent (glow-in-the-dark) bases get incorporated randomly into the new DNA molecule
4. A laser detects the glowing bases and tells the computer the order of bases = sequence
http://bjpsbiotech.edublogs.org/files/2007/12/electropherogram.jpg
Slide 6
What's the Next Step?
- After the sequence is determined, then what?
- Make sense of it by comparing it with other related (homologous) sequences
  - Multiple Sequence Alignment
Slide 7
What is an Alignment?
- Lining up related (homologous) positions
- Allows comparison
[Figure: sequences shown unaligned and aligned]
Slide 8
Comparing Sequences (Genomes)
- All DNA contains a unique genetic fingerprint
- Similarity reveals
  - Related function
  - Shared evolutionary history
education.vetmed.vt.edu/.../FINGERPRINT.jpg
Slide 9
Aligning with Computational Methods
- Computers can't see patterns
- Use math to find the best alignment by assigning scores
  - Match
  - Mismatch
  - Gap
    - Internal: insertion / deletion (indel)
    - Terminal: missing information?
Slide 10
What is a Gap?
- Allows bases to be lined up even if sequences are different lengths
- Insertions / deletions (indels)
  - Impossible to tell which sequence has lost (gained) information
- Terminal gaps
  - Sequence is either naturally shorter or artificially cut off
Slide 11
Nucleotide Alignment
[Figure labels: Mismatches, Gaps]
- Custom scores
  - Match
  - Mismatch
  - Gap-opening penalty
    - Penalized for not having a letter (beginning a gap). Why?
  - Gap-extension penalty
    - Little or no penalty for lengthening a gap. Why?
- Scores balance between mismatch and gap
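The scoring scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the scoring used by any particular alignment program. The function name `score_alignment` is hypothetical, the default values are borrowed from the Manual Alignment slide (match = 5, mismatch = -2, gap opening = -4, gap extension = 0), and the sketch assumes the first position of a gap is charged the opening penalty and each further position the extension penalty.

```python
def score_alignment(top, bottom, match=5, mismatch=-2, gap_open=-4, gap_extend=0):
    """Score two already-aligned sequences of equal length ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    gap_in_top = gap_in_bottom = False  # is a gap currently open in that row?
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if x == "-":
            # first gap position pays the opening penalty, later ones the extension
            total += gap_extend if gap_in_top else gap_open
            gap_in_top, gap_in_bottom = True, False
        elif y == "-":
            total += gap_extend if gap_in_bottom else gap_open
            gap_in_top, gap_in_bottom = False, True
        else:
            total += match if x == y else mismatch
            gap_in_top = gap_in_bottom = False
    return total


one_long_gap = score_alignment("AATTC", "A--TC")
two_short_gaps = score_alignment("AATTC", "A-T-C")
print(one_long_gap, two_short_gaps)
```

Under these assumed penalties a single two-base gap scores higher than two separate one-base gaps, which is one way to read the "Why?" prompts: a single indel event is treated as more plausible than several independent ones.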
Slide 12
Dynamic Programming
- Used to calculate the alignment
- Breaks a very complicated process into smaller steps
- Helps computers solve the problem faster
[Figure labels: Sequence 1, Sequence 2, Math, Read]
http://www.myspacepimper.com/images/232763/Disney-s-Goofy-Baking-a-Cake.htm
Slide 13
Manual Alignment
[Figure: dynamic programming score matrix for sequence AATC (columns) against ATC (rows); the first row and first column are all 0]
- Match = 5
- Mismatch = -2
- Gap Opening = -4
- Gap Extension = 0
- Traceback: follow the highest scores back to the beginning
  - Up or sideways = gap, diagonal = homology (line up)
Resulting alignment:
AATC
A-TC
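The fill-and-traceback procedure can be sketched in Python as follows. This is a simplified sketch, not a reproduction of the matrix on the slide: it is a Needleman-Wunsch style global alignment that assumes a single linear gap penalty of -4 per gap position (instead of separate opening and extension penalties) and charges for terminal gaps. The function name `needleman_wunsch` and its parameters are chosen here for illustration.

```python
def needleman_wunsch(seq1, seq2, match=5, mismatch=-2, gap=-4):
    """Global alignment with a linear gap penalty (an assumed simplification)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing but gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        return match if seq2[i - 1] == seq1[j - 1] else mismatch

    # Fill: each cell is the best of diagonal (line up), up (gap), left (gap).
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback: walk from the bottom-right cell back to the top-left.
    top, bottom = [], []
    i, j = rows - 1, cols - 1
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(seq1[j - 1]); bottom.append(seq2[i - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append("-"); bottom.append(seq2[i - 1])
            i -= 1
        else:
            top.append(seq1[j - 1]); bottom.append("-")
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


best, aligned1, aligned2 = needleman_wunsch("AATC", "ATC")
print(best)
print(aligned1)
print(aligned2)
```

Because the gap handling is simplified, the score and the exact gap position may differ from the hand-worked example; when several alignments tie, the traceback order (diagonal first, then up, then left) decides which one is reported.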
Slide 14
Computer-Generated Alignment
- Much faster than we are
  - 2 GHz = roughly 2 billion calculations per second
- Computers don't get tired, make mistakes, or get hand cramps
Slide 15
Alignment Process
Slide 16
Types of Alignment
- Global
  - Aligns the entire sequence
  - Permits gaps
  - Forced even if sequences are not homologous
- Local
  - Aligns the longest region possible with minimal (no) gaps
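The difference between the two types can be illustrated with a local alignment sketch. The slide does not name an algorithm; the code below assumes a Smith-Waterman style recurrence, where cell scores are never allowed to drop below zero and the traceback starts at the highest-scoring cell instead of the corner. The function name, the linear gap penalty, and the score values are assumptions made for this example.

```python
def smith_waterman(seq1, seq2, match=5, mismatch=-2, gap=-4):
    """Local alignment sketch: returns (best score, aligned piece of seq1, of seq2)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def pair(i, j):
        return match if seq2[i - 1] == seq1[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            # The extra 0 lets a new local alignment start anywhere.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Traceback from the best cell until a zero-score cell is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(seq1[j - 1]); bottom.append(seq2[i - 1])
            i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append("-"); bottom.append(seq2[i - 1])
            i -= 1
        else:
            top.append(seq1[j - 1]); bottom.append("-")
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("GGGAATCGGG", "TTAATCTT"))
```

In this example only the shared AATC region is reported; the unrelated flanking bases are left out, whereas a global alignment would be forced to line them up as well.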
Slide 17
Beware! The computer is not always right
- Alignments
  - Optimal: highest score
  - True: evolutionarily correct
- Can be improved
  - Hard for the computer to accurately place indels (gaps)
  - Apply prior knowledge: codons
[Figure: nucleotide sequence vs. amino acid sequence: AAA CCC = Lys Pro; AA- ACC C = ??? Thr ?; Asn Lys]
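The codon point can be made concrete with a short Python sketch. It assumes the aligned sequence is protein-coding and starts in frame at its first column; the function names are hypothetical, and the codon table is deliberately partial (only the handful of standard-genetic-code codons needed here).

```python
# Partial codon table (standard genetic code), just enough for this example.
CODON_TABLE = {"AAA": "Lys", "AAG": "Lys", "AAC": "Asn", "ACC": "Thr", "CCC": "Pro"}


def translate(nucleotides):
    """Translate codon by codon; unknown or incomplete codons become '?'."""
    codons = [nucleotides[i:i + 3] for i in range(0, len(nucleotides), 3)]
    return [CODON_TABLE.get(codon, "?") for codon in codons]


def gaps_respect_codons(aligned):
    """True if every 3-column block is either all gaps or contains no gaps.

    Assumes the alignment starts in frame at the first column.
    """
    for i in range(0, len(aligned), 3):
        block = aligned[i:i + 3]
        if "-" in block and set(block) != {"-"}:
            return False
    return True


print(translate("AAACCC"))
print(gaps_respect_codons("AAA---CCC"))  # gap removes a whole codon
print(gaps_respect_codons("AA-ACCC"))    # gap splits a codon
```

A check like this could be used to flag computer-placed gaps that break the reading frame, so they can be reviewed and shifted by hand.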
Slide 18
BLAST
- Basic Local Alignment Search Tool
- Most frequently used alignment tool
- Local alignment of 1 sequence (query) against all known sequences (subjects) in a database
- Uses a heuristic to reduce the number of sequences it actually has to align
- Like using Google to find the most similar sequences
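The idea of a heuristic that narrows the search can be sketched with a toy word index in Python. This is a greatly simplified illustration and not BLAST's actual implementation: it only finds database sequences that share short exact words with the query, and skips the extension and statistics steps a real search performs. The function names, the word length `k=4`, and the example sequences are all made up for this sketch.

```python
from collections import Counter, defaultdict


def build_word_index(database, k=4):
    """Map every k-letter word to the names of the sequences containing it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def candidate_subjects(query, index, k=4):
    """Rank subjects by how many distinct query words they share."""
    hits = Counter()
    query_words = {query[i:i + k] for i in range(len(query) - k + 1)}
    for word in query_words:
        for name in index.get(word, ()):
            hits[name] += 1
    return hits.most_common()


# Hypothetical mini-database for illustration only.
database = {
    "subject_A": "ATGGCGTACGTTAGC",
    "subject_B": "TTTTTTTTTTTT",
    "subject_C": "CCGTACGTTAAA",
}
index = build_word_index(database)
print(candidate_subjects("GCGTACGTTA", index))
```

Only the subjects returned by `candidate_subjects` would then need a full local alignment, which is the saving the heuristic buys.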
Slide 19
BLAST Input
Slide 20
BLAST Output
Slide 21
How Does This Impact Me?
- Human Microbiome Project
  - Sequence all bacteria in the intestines
  - Millions of bacteria in each gram of excrement
  - Which ones make us sick?
  - How different is the flora between people?
- Ocean Virus Metagenomics project
  - Try to get an idea of virus diversity across the globe
  - Boat goes around N.A. collecting samples
  - Billions of viruses in each gallon of seawater
Slide 22
How Does This Impact Me (cont'd)?
- Used to take swabs, grow colonies on agar
  - Antimicrobial resistance in turkeys
- Sequencing removes the middle step
- How to quickly assign genus and species to new sequences?
  - BLAST
- Project: new phage from ponds
Slide 23
Other Uses for Alignments
Slide 24
SNP Detection
- Single Nucleotide Polymorphism
- Genetic changes occurring in at least one sequence
- May have biological significance
  - Antibiotic resistance
  - Changes could avoid detection by the immune system
  - Cause of genetic disease (CF, cystic fibrosis)
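Once sequences are aligned, finding candidate SNPs amounts to scanning the alignment column by column. The Python sketch below assumes the alignment is a dictionary of equal-length strings, treats gap characters as missing data and not as a variant, and uses 1-based positions; the function name `find_snps` and the example sequences are hypothetical.

```python
def find_snps(alignment):
    """Return (position, {name: base}) for every column with more than one base."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    snps = []
    for pos in range(length):
        column = [alignment[name][pos] for name in names]
        bases = set(column) - {"-"}  # ignore gaps when looking for substitutions
        if len(bases) > 1:
            snps.append((pos + 1, dict(zip(names, column))))
    return snps


# Hypothetical three-sequence alignment.
alignment = {"seq1": "ATGCA", "seq2": "ATGTA", "seq3": "AT-CA"}
for position, bases in find_snps(alignment):
    print(position, bases)
```

Whether a flagged column has biological significance still has to be judged separately; the scan only reports where the sequences differ.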
Slide 25
Phylogenetic Trees
- Computer generated by:
  - Examining the alignment
  - Looking for shared mutations
- Show relationship(s) between sequences
- History of sequences
  - Where they came from
  - Genetic changes that have occurred
[Figure labels: Clade, Node, Leaf, Branch]
iOS Phylogram App (Free)
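One common starting point for tree building, though not necessarily the method behind the tree on this slide, is a table of pairwise distances computed from the alignment. The Python sketch below assumes the simplest such measure, the proportion of differing sites among columns where neither sequence has a gap. The function names and example sequences are hypothetical, and the step of turning distances into a tree is left out.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of compared (ungapped) columns at which two aligned sequences differ."""
    compared = differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns with missing data
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_table(alignment):
    """Distances for every pair of sequences in the alignment."""
    return {(n1, n2): p_distance(alignment[n1], alignment[n2])
            for n1, n2 in combinations(alignment, 2)}


# Hypothetical alignment for illustration.
example = {"A": "ATGCATGC", "B": "ATGCATGT", "C": "TTGAATCT"}
for pair, dist in distance_table(example).items():
    print(pair, dist)
```

Sequences separated by small distances would be expected to sit close together on a tree built from such a table.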
Slide 26
Recombination
- Can occur in all types of organisms
  - Eukaryotes
  - Prokaryotes
  - Viruses
- May change characteristics of the organism
  - Make you sick (or not)
  - Not recognized by the immune system
- Fast way of getting lots of genetic changes
[Figure labels: Breakpoint, RdRP, Genome 1, Genome 2, Daughter Sequence, Major Parent, Minor Parent]
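An alignment of the daughter sequence with its two candidate parents gives a simple way to look for a breakpoint: slide a window along the alignment and ask which parent the daughter resembles more in each window. The Python sketch below is only an illustration of that idea, not a published recombination-detection method; the function names, window size, and toy sequences are assumptions.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_parent_by_window(daughter, parent1, parent2, window=10, step=10):
    """For each window, report which parent the daughter matches better."""
    if not (len(daughter) == len(parent1) == len(parent2)):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(daughter) - window + 1, step):
        end = start + window
        id1 = identity(daughter[start:end], parent1[start:end])
        id2 = identity(daughter[start:end], parent2[start:end])
        if id1 > id2:
            label = "parent1"
        elif id2 > id1:
            label = "parent2"
        else:
            label = "tie"
        results.append((start, label))
    return results


# Toy example: the daughter switches from parent1 to parent2 at position 10.
parent1 = "A" * 20
parent2 = "C" * 20
daughter = "A" * 10 + "C" * 10
print(closest_parent_by_window(daughter, parent1, parent2))
```

A change in the closest parent between neighboring windows suggests a breakpoint somewhere near the boundary; smaller steps narrow down its position at the cost of noisier windows.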
Slide 27
Reassortment
- Chromosomes (segments) from one organism replace those from another
- May change characteristics of the organism
  - Make you sick (or not)
  - Not recognized by the immune system
- Fast way of getting lots of genetic changes
Slide 28
Other Analysis Options
- Align sequences
- Look for genetic changes (genotype) that are associated with traits (phenotype)
  - Host
  - How sick it makes you
  - Drug resistance
  - Inherited disease
- Do any mutations consistently accompany the traits?
  - Genome-Wide Association Studies
http://lovestats.wordpress.com/dman/
Slide 29
Slide 30
Slide 31
How Does an Alignment Get a Score?
- Amino acids
  - Identical >> Similar >> Dissimilar
Slide 32
Score Lookup Table (Matrix)
- Symmetrical
- Positive scores on the diagonal (matches)
- Some mismatches get negative scores
- Some mismatches don't
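A lookup table with these properties can be sketched in Python as a dictionary keyed by pairs of amino acids. The scores below are illustrative values chosen for this sketch and are not taken from the slides; the names `SCORES`, `pair_score`, and `score_ungapped` are hypothetical. Only one triangle of the table is stored, and symmetry is handled by trying the pair in both orders.

```python
# Illustrative scores only: positive on the diagonal, a positive score for a
# similar pair (L/I), and negative scores for dissimilar pairs (L/D, I/D).
SCORES = {
    ("L", "L"): 4, ("I", "I"): 4, ("D", "D"): 6,
    ("L", "I"): 2,
    ("L", "D"): -4, ("I", "D"): -3,
}


def pair_score(a, b, table=SCORES):
    """Look up a score; the table is symmetric, so order does not matter."""
    if (a, b) in table:
        return table[(a, b)]
    return table[(b, a)]


def score_ungapped(seq1, seq2, table=SCORES):
    """Sum the pair scores of two equal-length, ungapped protein sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b, table) for a, b in zip(seq1, seq2))


print(pair_score("L", "I"), pair_score("I", "L"))
print(score_ungapped("LID", "LLD"))
```

The ordering identical > similar > dissimilar shows up directly in the numbers: L/L scores above L/I, which in turn scores above L/D.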