Theory and Application of Multiple Sequence Alignments Brett Pickett, PhD a.k.a What is a Multiple...
- Slide 1
- Theory and Application of Multiple Sequence Alignments. Brett
Pickett, PhD. A.k.a. "What is a Multiple Sequence Alignment, How to
Make One, and What to Do With It".
- Slide 2
- History: Structure of DNA discovered (1953). First (phage) genome
determined in 1977. Human Genome Project begun in 1990. First living
organism (Haemophilus influenzae, H.i.) sequenced in 1995. Human
rough draft completed in 2000: NHGRI (public) vs. J. Craig Venter
(private). Used a supercomputer to put the human genome together in
the right order.
- Slide 3
- What is a Genome? Genetic material required for an organism to
replicate. Eukaryotes (humans): # chromosomes. Prokaryotes
(bacteria): 1 chromosome. Viruses: what's a chromosome? Genome
sizes: human (3.2 Gb, about 2 m of DNA per cell), bacteria (580 kb -
10 Mb), virus (3.5 kb - 1.3 Mb). 10 trillion cells in the human body
x 2 m = 780,000 times around Earth, or 67.8 round trips to the sun.
http://www.rsc.org/chemsoc/timeline/pages/2001.html
- Slide 4
- Why are Genomes so Important? Encode all organismal functions:
DNA -> RNA -> protein. Unique to each organism. Find
differences (mutations) only by comparing genomes with each other.
www.thednastore.com/images/cells/mrdna1.jpg
- Slide 5
- How are Sequences Made? 1. Make lots of copies of the original
sequence (PCR). 2. Put the copies into a machine to make even more
copies. 3. Fluorescent (glow-in-the-dark) bases get incorporated
randomly into the new DNA molecule. 4. Laser detects glowing bases
and tells the computer the order of bases = sequence.
http://bjpsbiotech.edublogs.org/files/2007/12/electropherogram.jpg
- Slide 6
- What's the Next Step? After the sequence is determined, then what?
Make sense of it by comparing with other related (homologous)
sequences: Multiple Sequence Alignment.
- Slide 7
- What is an Alignment? Lining up related (homologous) positions.
Allows comparison. [Figure: unaligned vs. aligned sequences]
- Slide 8
- Comparing Sequences (Genomes): All DNA contains a unique genetic
fingerprint. Similarity reveals related function and shared
evolutionary history. education.vetmed.vt.edu/.../FINGERPRINT.jpg
- Slide 9
- Aligning with Computational Methods: Computers can't see patterns.
Use math to find the best alignment by assigning scores: Match,
Mismatch, Gap. Internal gap: insertion / deletion (indel). Terminal
gap: missing information?
- Slide 10
- What is a Gap? Allows bases to be lined up even if sequences
are different lengths. Insertions / deletions (indels): impossible
to tell which sequence has lost (gained) information. Terminal gaps:
sequence is either naturally shorter or artificially cut off.
- Slide 11
- Nucleotide Alignment [figure labels: Mismatches, Gaps]. Custom
scores: Match. Mismatch. Gap-opening penalty: penalized for not
having a letter (begin a gap). Why? Gap-extension penalty: little or
no penalty for lengthening a gap. Why? Scores balance between
mismatch & gap.
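
The scoring scheme above can be sketched as a function that scores an alignment that already exists. This is a minimal illustration, not the method of any particular tool: the function name and the rule that a gap run costs `gap_open` for its first column and `gap_extend` for each later column are assumptions, and the default values are borrowed from the manual-alignment slide.

```python
def score_alignment(row_a, row_b, match=5, mismatch=-2,
                    gap_open=-4, gap_extend=0):
    """Score two aligned rows of equal length that use '-' for gaps."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    prev_gap = None  # row that held a gap in the previous column: 'a', 'b', or None
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot consist only of gaps")
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            # Continuing a gap in the same row is cheap; starting one is not.
            total += gap_extend if side == prev_gap else gap_open
            prev_gap = side
        else:
            total += match if x == y else mismatch
            prev_gap = None
    return total


print(score_alignment("AATC", "A-TC"))    # 5 - 4 + 5 + 5 = 11
print(score_alignment("AATCG", "A--CG"))  # 5 - 4 + 0 + 5 + 5 = 11
print(score_alignment("AATC", "AGTC"))    # 5 - 2 + 5 + 5 = 13
```

With these values a two-base gap costs the same as a one-base gap, which reflects the idea that one insertion or deletion event can add or remove several bases at once.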
- Slide 12
- Dynamic Programming: Used to calculate the alignment. Breaks a very
complicated process into smaller steps. Helps computers solve the
problem faster. [Figure labels: Sequence 1, Sequence 2, Math, Read]
http://www.myspacepimper.com/images/232763/Disney-s-Goofy-Baking-a-Cake.htm
- Slide 13
- Manual Alignment: Sequence 1 = AATC (across the top), Sequence 2 =
ATC (down the side). [Scoring matrix figure: the individual cell
values are not recoverable from the extracted slide text.] Match = 5,
Mismatch = -2, Gap Opening = -4, Gap Extension = 0. Traceback: follow
the highest scores back to the beginning. Up or sideways = gap,
diagonal = homology (line up). Alignment shown: AATC / A-TC.
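
The fill-then-traceback procedure can be sketched in a few lines of Python. This is a simplified Needleman-Wunsch global alignment, not a reproduction of the slide's matrix. It makes two assumptions: every gap column costs a flat -4 (a linear penalty in place of the opening/extension scheme), and terminal gaps are penalized like internal ones. The function name `global_align` is hypothetical.

```python
def global_align(seq_a, seq_b, match=5, mismatch=-2, gap=-4):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # first column: seq_a against gaps only
        score[i][0] = i * gap
    for j in range(1, cols):          # first row: seq_b against gaps only
        score[0][j] = j * gap

    def pair(i, j):
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    # Fill: each cell keeps the best of diagonal, up, and sideways moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback: walk from the last cell back to the start.
    out_a, out_b = [], []
    i, j = len(seq_a), len(seq_b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(seq_a[i - 1]); out_b.append(seq_b[j - 1])
            i -= 1; j -= 1            # diagonal = line the two bases up
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1]); out_b.append("-")
            i -= 1                    # up = gap in seq_b
        else:
            out_a.append("-"); out_b.append(seq_b[j - 1])
            j -= 1                    # sideways = gap in seq_a
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("AATC", "ATC"))  # (11, 'AATC', '-ATC')
```

Ties are broken toward the diagonal here, so the gap lands in the first column. Placing it one column later (AATC / A-TC) scores the same 11 under these assumptions, which illustrates why a computer cannot always tell where an indel really belongs.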
- Slide 14
- Computer-Generated Alignment: Much faster than we are (2 GHz = 2
billion clock cycles per second). Computers don't get tired, make
mistakes, or get hand cramps.
- Slide 15
- Alignment Process
- Slide 16
- Types of Alignment. Global: aligns entire sequence; permits gaps;
forced even if sequences are not homologous. Local: aligns the
longest region possible with minimal (no) gaps.
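
A local alignment can be sketched with the same kind of scoring table as a global one, with one change: a cell is never allowed to drop below zero, so a good region can start anywhere. The sketch below is a Smith-Waterman style score calculation that returns only the best local score. The function name, the score values, and the flat per-column gap penalty are assumptions for illustration.

```python
def local_align_score(seq_a, seq_b, match=5, mismatch=-2, gap=-4):
    """Return the best local alignment score between two sequences."""
    best = 0
    prev = [0] * (len(seq_b) + 1)     # previous row of the scoring table
    for i in range(1, len(seq_a) + 1):
        curr = [0] * (len(seq_b) + 1)
        for j in range(1, len(seq_b) + 1):
            diag = prev[j - 1] + (match if seq_a[i - 1] == seq_b[j - 1] else mismatch)
            # The 0 lets an alignment restart instead of carrying a bad prefix.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The two sequences share only the region AATC (4 matches x 5 = 20).
print(local_align_score("GGGAATCGGG", "TTAATCTT"))  # 20
print(local_align_score("AAAA", "CCCC"))            # 0 (nothing in common)
```

A global alignment of the first pair would be forced to pair the unrelated G and T flanks as well; the local score ignores them.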
- Slide 17
- Beware! The computer is not always right. Alignments: Optimal =
highest score; True = evolutionarily correct. Can be improved: hard
for the computer to accurately place indels (gaps). Apply prior
knowledge (codons). [Figure: nucleotide sequence vs. amino acid
sequence. AAA CCC reads as Lys Pro, while a gap placed inside a codon
(AA- ACC C) reads as ??? Thr ?. Other labels: Asn, Lys.]
- Slide 18
- BLAST: Basic Local Alignment Search Tool. Most frequently used
alignment tool. Local alignment of 1 sequence (query) against all
known sequences (subjects) in a database. Uses a heuristic to reduce
the number of sequences it actually has to align. Like using Google
to find the most homologous sequences.
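
The idea of a heuristic that cuts down how many subjects need a full alignment can be illustrated with a short-word (k-mer) filter: only database sequences that share at least one exact word with the query are passed on. This is a deliberately simplified sketch of the general idea, not BLAST's actual algorithm. The function names, the word length of 4, and the toy database are all hypothetical.

```python
from collections import defaultdict


def build_kmer_index(database, k=4):
    """Map every k-letter word to the names of the sequences containing it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def candidate_subjects(query, index, k=4):
    """Return the names of sequences sharing at least one word with the query."""
    hits = set()
    for i in range(len(query) - k + 1):
        hits |= index.get(query[i:i + k], set())
    return hits


# Toy database (made-up sequences, for illustration only).
database = {"s1": "AATCGGA", "s2": "TTTTTTT", "s3": "CCAATCC"}
index = build_kmer_index(database)
print(sorted(candidate_subjects("GAATCT", index)))  # ['s1', 's3']
```

Only s1 and s3 contain the word AATC, so s2 would never be aligned at all. The index is built once and reused for every query, which is where the time saving comes from.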
- Slide 19
- BLAST Input
- Slide 20
- BLAST Output
- Slide 21
- How Does This Impact Me? Human Microbiome Project: sequence all
bacteria in the intestines. Millions of bacteria in each gram of
excrement. Which ones make us sick? How different is the flora
between people? Ocean Virus Metagenomics project: try to get an idea
of virus diversity across the globe. Boat goes around N.A.
collecting samples. Billions of viruses in each gallon of seawater.
- Slide 22
- How Does This Impact Me (cont'd)? Antimicrobial resistance in
turkeys: used to take swabs and grow colonies on agar; sequencing
removes the middle step. How to quickly assign genus and species to
new sequences? BLAST. Project: new phage from ponds.
- Slide 23
- Other Uses for Alignments
- Slide 24
- SNP Detection: Single Nucleotide Polymorphism. Genetic changes
occurring in at least one sequence. May have biological
significance: antibiotic resistance; changes that could avoid
detection by the immune system; cause of genetic disease (cystic
fibrosis, CF).
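
Once sequences are aligned, candidate SNPs are simply the columns where more than one base appears. A minimal sketch follows, assuming the rows are already aligned to equal length and that gap characters (`-`) are ignored so indels are not counted as SNPs. The function name and the example sequences are made up for illustration.

```python
def find_snps(aligned_seqs):
    """Return (position, bases) for each column with more than one base.

    Positions are 1-based; '-' is skipped so indels are not reported.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    snps = []
    for pos in range(length):
        bases = {s[pos] for s in aligned_seqs if s[pos] != "-"}
        if len(bases) > 1:
            snps.append((pos + 1, "".join(sorted(bases))))
    return snps


alignment = ["ATGCA",
             "ATGTA",
             "ATGCA"]
print(find_snps(alignment))  # [(4, 'CT')]
```

Whether a reported column is biologically meaningful is a separate question; the sketch only flags where the sequences differ.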
- Slide 25
- Phylogenetic Trees: Computer generated by examining the alignment
and looking for shared mutations. Show relationship(s) between
sequences. History of sequences: where they came from; genetic
changes that have occurred. [Figure labels: Clade, Node, Leaf,
Branch] iOS Phylogram App (Free).
- Slide 26
- Recombination: Can occur in all types of organisms (eukaryotes,
prokaryotes, viruses). May change a characteristic of the organism:
make you sick (or not); not recognized by the immune system. Fast
way of getting lots of genetic changes. [Figure labels: Breakpoint,
RdRP, Genome 1, Genome 2, Daughter Sequence, Major Parent, Minor
Parent]
- Slide 27
- Reassortment: Chromosomes (segments) from one organism replace
those from another. May change a characteristic of the organism:
make you sick (or not); not recognized by the immune system. Fast
way of getting lots of genetic changes. [Figure: genome + genome =
reassortant]
- Slide 28
- Other Analysis Options: Align sequences. Look for genetic changes
(genotype) that are associated with traits (phenotype): host; how
sick it makes you; drug resistance; inherited disease. Do any
mutations consistently accompany the traits? Genome-Wide
Association Studies. http://lovestats.wordpress.com/dman/
- Slide 29
- Slide 30
- Slide 31
- How Does an Alignment Get a Score? Amino acids: Identical
>> Similar >> Dissimilar.
- Slide 32
- Score Lookup Table (Matrix): Symmetrical. Positive scores on the
diagonal (matches). Some mismatches get negative scores. Some
mismatches don't.
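
A symmetrical lookup table can be sketched as a dictionary that stores each pair once and sorts the two letters before looking them up. The scores below are toy values chosen only to show the pattern (identical > similar > dissimilar); they are not taken from any published matrix, and the names `TOY_SCORES` and `pair_score` are hypothetical.

```python
# Toy scores for three amino acids: L (Leu), I (Ile), D (Asp).
# Keys are stored in alphabetical order so each pair appears only once.
TOY_SCORES = {
    ("D", "D"): 6, ("I", "I"): 4, ("L", "L"): 4,   # diagonal: matches
    ("I", "L"): 2,    # similar (both hydrophobic): mismatch, but not negative
    ("D", "I"): -3,   # dissimilar: negative
    ("D", "L"): -4,   # dissimilar: negative
}


def pair_score(a, b):
    """Look up the score for two residues; the order of a and b is irrelevant."""
    return TOY_SCORES[tuple(sorted((a, b)))]


print(pair_score("L", "I"), pair_score("I", "L"))  # 2 2
print(pair_score("L", "L"))                        # 4
print(pair_score("L", "D"))                        # -4
```

Sorting the pair before the lookup is what makes the table symmetrical: swapping the two residues always gives the same score.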