
CONCEPT OF SEQUENCE COMPARISON

Natapol Pornputtapong

18 January 2018


SEQUENCE ANALYSIS - A ROSETTA STONE OF LIFE

“Sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution.”

— Wikipedia —

Pravech Ajawatanawong, Faculty of Science, Mahidol University

COMPARING SEQUENCES

• cornerstone in sequence analysis

• aims for identification of sequence relatedness

• ONLY “homologous sequences” (derived from the same ancestor) can be compared

• homologous sequences usually (but not necessarily) have similar functions and similar sequences


HOMOLOGY IN STRUCTURES

• structures can have similar shapes for two reasons: homology and homoplasy

• homology = shares the same ancestor

• homoplasy = similar structures but not derived from the same ancestor


HOMOLOGY IN SEQUENCES

ancestral sequence: ACTGTACTCGCATCG
species A:          ACTATACTCTCATTG
species B:          ACTGTTCTCCCATCA

DEGREES OF HOMOLOGY

Homology is qualitative!

• Paralog: homologous genes that have diverged from each other after a gene duplication

• Ortholog: homologous genes that originate from a single ancestral gene in the last common ancestor of the compared species (they diverged by speciation)

• Xenolog: homologous genes acquired via Horizontal Gene Transfer (HGT)

Koonin (2005) Annu. Rev. Genet

SEQUENCE ALIGNMENT

ACTATACTCTCATTG

ACTGTTCTCCCATCA


DOT PLOT

sequence 1: ACCTCGTGCA

sequence 2: ACTTAGTCCA

[Figure: dot plot with sequence 2 on the horizontal axis and sequence 1 on the vertical axis]

sequence 1 ACCT-CGTGC-A
           || |  || | |
sequence 2 AC-TTAGT-CCA

DOT PLOT

• too many dots (high background) = no information

• How can we handle this problem?

[Figure: dot plot of seq_1 against seq_2 with a high background]

GENERAL PARAMETERS FOR DOT PLOT

• Window size = subsequence length

• Window sliding = step by which the window is moved

• Threshold or mismatch = cut-off (normally the similarity score is used as the cut-off)

TGAATCCCAGTTCAGCTCTTCAGCCTTTCGTGGATAAGAGAAGGCTGAAAGCGGGTCACGTTTTG

TAAATGGCAGTACAGCTGTTAGGCCCATCGTGGCTAAGATCAGGCTCCAAATAGGTCCAGTTCCC

[Figure: windows of the chosen window size marked on the two sequences, scoring 70%, 70% and 80% similarity]
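The parameters above can be tried out with a small script. This is only a minimal sketch: the function names `dot_plot` and `render` and the integer cut-off `min_matches` are illustrative choices, and the window is always moved one position at a time.

```python
def dot_plot(seq1, seq2, window=1, min_matches=1):
    """Return the set of (i, j) window start positions that get a dot.

    A dot is placed at (i, j) when the window of seq1 starting at i and the
    window of seq2 starting at j agree in at least `min_matches` positions.
    """
    dots = set()
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            matches = sum(
                a == b
                for a, b in zip(seq1[i:i + window], seq2[j:j + window])
            )
            if matches >= min_matches:
                dots.add((i, j))
    return dots


def render(seq1, seq2, dots):
    """Draw the plot as text: seq2 across the top, seq1 down the side."""
    lines = ["  " + seq2]
    for i, residue in enumerate(seq1):
        row = "".join("*" if (i, j) in dots else "." for j in range(len(seq2)))
        lines.append(residue + " " + row)
    return "\n".join(lines)


seq1 = "ACCTCGTGCA"
seq2 = "ACTTAGTCCA"

# window=1 marks every identical residue pair (high background)
raw = dot_plot(seq1, seq2)
# a window of 3 with at least 2 matching positions acts as a simple filter
filtered = dot_plot(seq1, seq2, window=3, min_matches=2)

print(render(seq1, seq2, raw))
print(len(raw), "dots with window 1;", len(filtered), "dots with window 3")
```

For the two long sequences on this slide, a call such as `dot_plot(long1, long2, window=10, min_matches=7)` would correspond to a window size of 10 with a 70% cut-off.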


PRACTICAL HINTS FOR DOT PLOT

• a window of 10-20 residues is a good place to start

• when comparing very large sequences, larger windows (>30 to about 100 residues) may be useful

• a good practical rule is to make plots that have 3–5 times as many dots as the length of the sequences (e.g., 3000-5000 dots for a 1000 base sequence)


DOT PLOT

[Figure: dot plot of sequence 1 (horizontal axis) against sequence 2 (vertical axis); horizontal offsets in the diagonal mark indels]

INTERPRETATION OF DOT PLOT (1)

highly similar

• single diagonal line

• needs noise (or background) reduction

[Figure: dot plot of sequence 1 against sequence 2]

INTERPRETATION OF DOT PLOT (2)

domain identification

[Figure: dot plot of sequence 1 against sequence 2]

EXON AND INTRON

http://myhits.isb-sib.ch/util/dotlet

INTERPRETATION OF DOT PLOT (3)

inversion

[Figure: dot plot of sequence 1 against sequence 2]

INTERPRETATION OF DOT PLOT (4)

repeat

[Figure: dot plot of sequence 1 against sequence 2]

REPEATED PROTEIN DOMAINS

http://myhits.isb-sib.ch/util/dotlet

INTERPRETATION OF DOT PLOT (5)

palindromic sequence

[Figure: dot plot of sequence 1 against sequence 2]

TERMINATORS AND OTHER STEM-LOOP STRUCTURES

http://myhits.isb-sib.ch/util/dotlet

INTERPRETATION OF DOT PLOT (6)

low complexity regions (e.g., AAAAAAAAAAAAAA)

[Figure: dot plot of sequence 1 against sequence 2]

LOW-COMPLEXITY REGIONS


Plasmodium falciparum serine-repeat antigen protein precursor

http://myhits.isb-sib.ch/util/dotlet


GAPS IN ALIGNMENT

• gaps do not exist in nature; they are introduced by the alignment

• gaps make the comparison difficult

• a gap in a sequence alignment most likely represents an indel

• the accuracy of the alignment determines the accuracy of the inferred indels

ACGTCTGATACGCCGTATCGTCTATCT

ACGTCTGAT---CCGTATCGTCTATCT

gap ~ indel (insertion/deletion)

SCORING PAIRWISE SEQUENCE ALIGNMENT FOR DNA SEQUENCES

• the easiest method to score is match scoring

• Normalized score

match scoring:

seq1 ATTCGTCGTAGCTAGGCTAA
     ||| | |||| |  || |||
seq2 ATTGGCCGTACCATGGATAA

match = 14 positions
∴ similarity score = 14

normalized score:

match = 14 positions
mismatch = 6 positions
total length = 20 positions
∴ similarity score = 14/20 = 70%
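The two scores on this slide can be reproduced with a few lines of code. This is a minimal sketch that assumes the two sequences are already aligned and have the same length; the function names are illustrative.

```python
def match_score(seq1, seq2):
    """Count the positions at which two aligned sequences are identical."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    return sum(a == b for a, b in zip(seq1, seq2))


def normalized_score(seq1, seq2):
    """Match score divided by the alignment length (a fraction from 0 to 1)."""
    return match_score(seq1, seq2) / len(seq1)


seq1 = "ATTCGTCGTAGCTAGGCTAA"
seq2 = "ATTGGCCGTACCATGGATAA"

print("match score:", match_score(seq1, seq2))
print("normalized score: {:.0%}".format(normalized_score(seq1, seq2)))
```

The normalized score makes alignments of different lengths comparable, which the raw match count does not.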


SCORING PAIRWISE SEQUENCE ALIGNMENT FOR PROTEIN SEQUENCES

• idea = replacing an amino acid with one that has the same physicochemical properties is unlikely to change the structure of the protein

MAATPTVLLFWKLLDEVFMA

||+|| || ||||+|||||| 90% similarity

MAVTPLVLFFWKLVDEVFMA

MAATPTVLLFWKLLDEVFMA

|| || || |||| |||||| 80% identity

MAVTPLVLFFWKLVDEVFMA
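The difference between the two percentages can be checked with a short script. This is only a sketch: the residue groups below are a simplified, hypothetical grouping by physicochemical property chosen for illustration, whereas real tools take similarity from a substitution matrix.

```python
# Hypothetical physicochemical groups (illustration only, not a real matrix)
GROUPS = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "CGP",
}
GROUP_OF = {res: name for name, members in GROUPS.items() for res in members}


def identity(seq1, seq2):
    """Fraction of aligned positions with identical residues."""
    return sum(a == b for a, b in zip(seq1, seq2)) / len(seq1)


def similarity(seq1, seq2):
    """Fraction of aligned positions with identical or same-group residues."""
    def similar(a, b):
        return a == b or (a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b))
    return sum(similar(a, b) for a, b in zip(seq1, seq2)) / len(seq1)


seq1 = "MAATPTVLLFWKLLDEVFMA"
seq2 = "MAVTPLVLFFWKLVDEVFMA"

print("identity:   {:.0%}".format(identity(seq1, seq2)))
print("similarity: {:.0%}".format(similarity(seq1, seq2)))
```

With this grouping the A/V and L/V pairs count as similar while T/L and L/F do not, which is the pattern marked with "+" on the slide.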


CONFUSING TERMS

Identity

• proportion of pairs of identical characters between 2 sequences

• strongly depends on how two sequences are aligned

Similarity

• proportion of pairs of similar characters between 2 sequences

• similarity is determined by substitution matrix

• strongly depends on how two sequences are aligned and matrix used

Homology

• two sequences are homologs if they have the same ancestor

• homology cannot be scored (two sequences are either homologous or not)


ALIGNMENT EVENT AND MUTATION EVENT

• Match -> no mutation

• Mismatch -> substitution

• Gap -> insertion/deletion (InDel)

SUBSTITUTION MUTATION IN DNA

original DNA seq.:   TAC CTG AGC CAA CTA   Tyr Leu Ser Gln Leu
non-sense mutation:  TAC CTG AGC TAA CTA   Tyr Leu Ser (stop)
silent mutation:     TAC CTC AGC CAA CTA   Tyr Leu Ser Gln Leu
missense mutation:   TAC CTG CGC CAA CTA   Tyr Leu Arg Gln Leu
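The three kinds of substitution can be told apart automatically by translating the codon before and after the change. The sketch below uses the standard genetic code and, as on the slide, reads the DNA directly as the coding sequence; the function names `translate` and `classify_substitution` are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a coding DNA sequence codon by codon (one-letter code)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def classify_substitution(original, mutant):
    """Classify the first codon that differs between two coding sequences."""
    for i in range(0, len(original) - 2, 3):
        ref, alt = original[i:i + 3], mutant[i:i + 3]
        if ref == alt:
            continue
        if CODON_TABLE[alt] == CODON_TABLE[ref]:
            return "silent"
        if CODON_TABLE[alt] == "*":
            return "nonsense"
        return "missense"
    return "no mutation"


original = "TACCTGAGCCAACTA"
for mutant in ("TACCTGAGCTAACTA", "TACCTCAGCCAACTA", "TACCTGCGCCAACTA"):
    print(mutant, translate(mutant), classify_substitution(original, mutant))
```

Note that `translate` does not stop at a stop codon; it simply prints "*" at that position.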


NUCLEOTIDE SUBSTITUTION

• sequences that share a common ancestor will gradually diverge

• this divergence is very difficult to observe directly

• sequence divergence = proportion (p) of nucleotide sites at which two sequences differ

ACTGTACTCGCATCG

ACTATACTCTCATTG

ACTGTTCTCCCATCA
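The proportion p can be computed directly for the three sequences on this slide. This is a minimal sketch that assumes the sequences are already aligned without gaps; the labels "ancestor", "species A" and "species B" follow the earlier slide on homology in sequences.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of sites at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)


sequences = {
    "ancestor": "ACTGTACTCGCATCG",
    "species A": "ACTATACTCTCATTG",
    "species B": "ACTGTTCTCCCATCA",
}

for (name1, s1), (name2, s2) in combinations(sequences.items(), 2):
    print("{} vs {}: p = {:.3f}".format(name1, name2, p_distance(s1, s2)))
```

The two descendants differ more from each other than either differs from the ancestor, because changes accumulate independently on each lineage.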


EMPIRICAL STUDIES OF AMINO ACID SUBSTITUTION

• several studies have observed amino acid substitutions; the results show that amino acid substitution is not random

• amino acids with similar chemical properties substitute for each other more often

• some amino acids (e.g., cysteine, glycine and tryptophan) are rarely changed

POINT ACCEPTED MUTATION (PAM)

• proposed in 1978 by Margaret Oakley Dayhoff

• the first substitution matrix for amino acid changes

• one PAM is a unit of evolutionary divergence in which 1% of amino acids have been changed

• if there were no selection for fitness (impossible!!), substitution would be one of the main factors driving protein sequence change

• in observed related protein sequences, the frequencies of amino acid substitutions are biased toward changes that maintain the function of the protein

• these are the point mutations that have been “accepted” during evolution

PAM 250 MATRIX

• the PAM1 matrix was constructed from observed amino acid changes in closely related proteins

• the PAM1 data were then extrapolated to PAM250

• only PAM250 was published by Dayhoff et al. (1978)

• higher PAM matrix is good for highly divergent sequences; lower PAM is good for conserved sequences
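The extrapolation step amounts to multiplying the PAM1 probability matrix by itself, so that PAM250 corresponds to PAM1 raised to the 250th power. The sketch below shows only this mechanism: the 3-letter alphabet and the numbers in `TOY_PAM1` are invented for illustration and are not Dayhoff's data, and the final conversion to log-odds scores is left out.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mat_pow(matrix, power):
    """Raise a square matrix to a non-negative integer power."""
    n = len(matrix)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = matrix
    while power > 0:
        if power % 2 == 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power //= 2
    return result


# Invented "PAM1" for a made-up 3-letter alphabet. Each row holds the
# probabilities that a residue stays the same or changes; rows sum to 1.
TOY_PAM1 = [
    [0.990, 0.008, 0.002],
    [0.008, 0.990, 0.002],
    [0.002, 0.002, 0.996],
]

TOY_PAM250 = mat_pow(TOY_PAM1, 250)
for row in TOY_PAM250:
    print(["{:.3f}".format(value) for value in row])
```

After 250 steps the diagonal values are much lower than in the one-step matrix, which is why a high PAM number suits highly divergent sequences.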

BIOINFORMATICS: A Practical Guide to the Analysis of Genes and Proteins

BLOSUM MATRIX

• amino acid changes were observed using a different strategy from the one used to construct the PAM matrices

• sequence data are derived from the BLOCKS database

• unlike PAM, which used closely related sequences, BLOSUM used more distantly related sequences

• BLOSUM62 matrix (the most commonly used BLOSUM matrix): sequences having at least 62% identity are clustered and counted as a single sequence

• a higher BLOSUM matrix (e.g., BLOSUM90) is good for comparing very similar sequences; a lower BLOSUM matrix (e.g., BLOSUM30) is for highly divergent sequences

BLOSUM 62 MATRIX


BIOINFORMATICS: A Practical Guide to the Analysis of Genes and Proteins

SUGGESTED USES FOR COMMON SUBSTITUTION MATRICES

Menlove, Clement, and Crandall: Similarity Searching Using BLAST

GAP PENALTY

• assumption = indels are rare events

• gap opening = penalty applied when a gap is introduced into the alignment

• gap extension = penalty for each further position that lengthens the gap, normally counted from the second position of the gap

CCGTATCGTCTATCTACGTGCACTGAT

CCCAATCTTCAATCTACG---TCTGAT

[Figure labels: gap opening (first gap position), gap extension (remaining gap positions)]
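The way the two penalties combine can be shown by scoring the alignment above. This is a sketch under assumed values: the match, mismatch, gap opening and gap extension scores are arbitrary illustrative numbers, not values recommended in the slides.

```python
def score_alignment(aligned1, aligned2, match=1, mismatch=-1,
                    gap_open=-5, gap_extend=-1):
    """Score a finished alignment using separate opening and extension penalties.

    The first position of each gap costs `gap_open`; every further
    consecutive gap position in the same sequence costs `gap_extend`.
    """
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    previous_gap = None  # which row (1 or 2) had a gap in the previous column
    for a, b in zip(aligned1, aligned2):
        if a == "-" or b == "-":
            row = 1 if a == "-" else 2
            score += gap_extend if previous_gap == row else gap_open
            previous_gap = row
        else:
            score += match if a == b else mismatch
            previous_gap = None
    return score


top = "CCGTATCGTCTATCTACGTGCACTGAT"
bottom = "CCCAATCTTCAATCTACG---TCTGAT"
print("alignment score:", score_alignment(top, bottom))
```

Because opening a gap costs more than extending one, a single gap of three positions is penalized less than three separate one-position gaps.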


DYNAMIC PROGRAMMING

Sean R Eddy Nature Biotechnology 22, 909 - 910 (2004)

BLAST: BASIC LOCAL ALIGNMENT SEARCH TOOL

Wishard, Introduction to Bioinformatics A theoretical and Practical Approach

PAIRWISE SEQUENCE ALIGNMENT

• aims to compare 2 sequences

• global alignment: tries to find the best alignment of two sequences across their entire length

• local alignment: tries to find the highly similar region(s) between two sequences

• overlapping alignment: global alignment of two sequences with different sizes

GLOBAL ALIGNMENT

• end-to-end alignment

• may end up with a lot of gaps in the alignment if the 2 sequences differ in size

• Not sensitive to the modular nature of proteins

• very sensitive to gap penalties (gap opening and gap extension)

• Needleman-Wunsch algorithm (1970)

5' ACTACTAGATTACTTACGGATCAGGTACTTTAGAGGCTTGCAACCA 3'
   |||||||||||    |||||||  |||||||||||||| |||||||
5' ACTACTAGATT----ACGGATC--GTACTTTAGAGGCTAGCAACCA 3'
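A compact version of the Needleman-Wunsch idea can be written with a score table and a traceback. This is a simplified sketch: it uses one linear gap penalty instead of separate opening and extension penalties, and the scoring values are arbitrary illustrative numbers.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences; returns (score, aligned1, aligned2)."""
    n, m = len(seq1), len(seq2)
    # score[i][j] = best score for aligning seq1[:i] with seq2[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two residues
                score[i - 1][j] + gap,      # gap in seq2
                score[i][j - 1] + gap,      # gap in seq1
            )
    # Trace back from the bottom-right corner to recover one best alignment
    out1, out2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and seq1[i - 1] == seq2[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out1.append(seq1[i - 1]); out2.append(seq2[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out1.append(seq1[i - 1]); out2.append("-")
            i -= 1
        else:
            out1.append("-"); out2.append(seq2[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out1)), "".join(reversed(out2))


long_seq = "ACTACTAGATTACTTACGGATCAGGTACTTTAGAGGCTTGCAACCA"
short_seq = "ACTACTAGATTACGGATCGTACTTTAGAGGCTAGCAACCA"
total, top, bottom = needleman_wunsch(long_seq, short_seq)
print(total)
print(top)
print(bottom)
```

The gap placement in the output depends on the scoring values, which illustrates how sensitive a global alignment is to the gap penalties.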


LOCAL ALIGNMENT

• finds local regions with high level of similarity

• more sensitive to the modular nature of proteins

• can be used to search databases

• Smith-Waterman algorithm (1981)

Global Alignment

ACTACTAGATTACTTACGGATCAGGTACTTTAGAGGCTTGCAACCA
|||||||||||    |||||||  |||||||||||||| |||||||
ACTACTAGATT----ACGGATC--GTACTTTAGAGGCTAGCAACCA

Local Alignment

ACTACTAGATT
|||||||||||
ACTACTAGATT

ACGGATC
|||||||
ACGGATC

GTACTTTAGAGGCTTGCAACCA
|||||||||||||| |||||||
GTACTTTAGAGGCTAGCAACCA
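The Smith-Waterman idea differs from global alignment in two places: table cells never go below zero, and the traceback starts at the highest cell and stops at the first zero. The sketch below reports only the single best local region, with a linear gap penalty and arbitrary illustrative scoring values.

```python
def smith_waterman(seq1, seq2, match=2, mismatch=-1, gap=-2):
    """Best local alignment; returns (score, aligned1, aligned2)."""
    n, m = len(seq1), len(seq2)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(
                0,                          # start a new local alignment
                score[i - 1][j - 1] + sub,  # align the two residues
                score[i - 1][j] + gap,      # gap in seq2
                score[i][j - 1] + gap,      # gap in seq1
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the highest cell until a zero is reached
    out1, out2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out1.append(seq1[i - 1]); out2.append(seq2[j - 1])
            i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out1.append(seq1[i - 1]); out2.append("-")
            i -= 1
        else:
            out1.append("-"); out2.append(seq2[j - 1])
            j -= 1
    return best, "".join(reversed(out1)), "".join(reversed(out2))


best, region1, region2 = smith_waterman("TTTACGGATCTTT", "GGGACGGATCGGG")
print(best)
print(region1)
print(region2)
```

In this small example the flanking T and G runs are ignored and only the shared core is reported, which is the behaviour that makes local alignment suitable for database searches.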


MULTIPLE SEQUENCE ALIGNMENT


PROBLEM OF USING PAIRWISE ALIGNMENT

• good for comparing only two sequences

• the alignment results are hard to understand and interpret when there are more than 2 sequences

• carries less evolutionary meaning

multiple alignment of the three sequences:

ATGCTAGTAAGC
ATTCAA-T--GC
-TTCTAGC--GC

the three separate pairwise alignments:

ATGCTAGTAAGC
ATTCAA-T--GC

ATTCAA-TGC
-TTCTAGCGC

ATGCTAGTAAGC
-TTCTAGC--GC

MULTIPLE SEQUENCE ALIGNMENT (MSA)

• the most useful object in sequence analysis

• in the mid-1980s, MSAs were generated by hand because dynamic programming was (at that time) too slow when applied to more than 3 sequences

• idea: arrange the homologous residues (nucleotides or amino acids) in the same column

• provides more biological information than pairwise sequence alignment
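Because homologous residues share a column, an MSA can be summarized column by column. The sketch below takes the three-sequence alignment from the previous slide and reports a simple majority consensus and the fully conserved columns; the function name `summarize_columns` is illustrative, and a gap is treated like any other character.

```python
from collections import Counter


def summarize_columns(msa):
    """Return (consensus string, list of fully conserved column indices)."""
    if len({len(row) for row in msa}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    consensus = []
    conserved = []
    for index, column in enumerate(zip(*msa)):
        residue, count = Counter(column).most_common(1)[0]
        consensus.append(residue)
        # A column is fully conserved when every row has the same residue
        if count == len(msa) and residue != "-":
            conserved.append(index)
    return "".join(consensus), conserved


msa = [
    "ATGCTAGTAAGC",
    "ATTCAA-T--GC",
    "-TTCTAGC--GC",
]
consensus, conserved = summarize_columns(msa)
print("consensus:", consensus)
print("conserved columns (0-based):", conserved)
```

This kind of per-column view is not available from separate pairwise alignments, since their columns do not line up with each other.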


MSA METHODS

• Exact method

• Progressive methods: Clustal, MUSCLE

• Iterative methods: MAFFT

• Consistency based methods: T-Coffee, ProbCons

• Structure based methods: 3D-Coffee

Multiple sequence alignment methods

MSA METHODS

Sviatopolk-Mirsky Pais et al. (2014) Algorithms for Molecular Biology

PROGRESSIVE ALIGNMENT

[Figure: progressive alignment diagram, labelled "dynamic programming"]

THE CLUSTAL SERIES

• ClustalW was published by Thompson et al. in 1994 (the original Clustal dates from Higgins and Sharp, 1988)

• ClustalW, ClustalX

• the older Clustal programs are obsolete, but their algorithm is good for understanding MSA

• algorithm: generate a guide tree, then do a progressive alignment based on that guide tree

• Latest: Clustal Omega

MUSCLE ALIGNMENT PROGRAM

• MUltiple Sequence Comparison by Log-Expectation (MUSCLE)

• published by Edgar RC in 2004

• step I: progressive alignment

• step II: improved progressive alignment

• step III: refinement

• very easy command line

• improved speed and accuracy (based on SP method)

MUSCLE ALIGNMENT PROGRAM


CHOOSING THE RIGHT MSA PROGRAM

Chagoyen M (2013) Sequence Analysis and Structure Prediction Service.

QUESTIONS?
