COFFEE: an objective function for multiple sequence alignments Wang Yi Computational Genomics Group Bioinformatics Institute

Page 1: COFFEE: an objective function for multiple sequence alignments

COFFEE: an objective function for multiple sequence alignments

Wang Yi

Computational Genomics Group

Bioinformatics Institute

Page 2: COFFEE: an objective function for multiple sequence alignments

Why MSA

• Multiple Sequence Alignments (MSA) are among the most important tools for analyzing biological sequences

• Useful for:

– Structure prediction

– Phylogenetic analysis

– Function prediction

– Polymerase Chain Reaction (PCR) primer design

– And more…

Page 3: COFFEE: an objective function for multiple sequence alignments

What is COFFEE

• Consistency based Objective Function For alignmEnt Evaluation

• The COFFEE score reflects the level of consistency between an MSA and a library of pairwise alignments of the same set of sequences

Page 4: COFFEE: an objective function for multiple sequence alignments

What is consistency?

Why study consistency between MSA and pairwise alignment?

Page 5: COFFEE: an objective function for multiple sequence alignments

Why pairwise alignments

• Unlike pairwise alignment, practical MSA methods cannot yet guarantee an optimal alignment

• Pairwise alignment uses dynamic programming to obtain an optimal result for a given scoring scheme

• Applying the same algorithm to many sequences at once is too expensive

• People therefore try to exploit the optimality of pairwise alignments by progressively combining them into an MSA
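As a rough illustration of the dynamic programming mentioned above, here is a minimal Python sketch of a global (Needleman-Wunsch style) pairwise alignment score. The function name `nw_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example, not values from the talk.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of strings a and b (illustrative scoring)."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # a[:i] aligned against nothing: i gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag,              # align a[i-1] with b[j-1]
                           prev[j] + gap,     # gap in b
                           cur[j - 1] + gap)) # gap in a
        prev = cur
    return prev[-1]

print(nw_score("FAT", "FAST"))  # 2: three matches and one gap (FA-T / FAST)
```

The table has one cell per pair of positions, so the cost grows with the product of the two sequence lengths. The same table for N sequences would need one dimension per sequence, which is why the exact method does not scale to MSA.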

Page 6: COFFEE: an objective function for multiple sequence alignments

Pairwise alignments to MSA

• ClustalW is a widely used package that takes this progressive approach

• ClustalW generates a guide tree according to the distances between each pair of sequences

• Then it aligns all these sequences progressively, from the closest branches to the most distant ones

Page 7: COFFEE: an objective function for multiple sequence alignments

Problem with ClustalW

• Mistakes made at the beginning of this procedure are never corrected

• This problem stems from not considering the consistency between the alignments of close pairs and those of distant pairs

Page 8: COFFEE: an objective function for multiple sequence alignments

Two solutions

• To solve this problem, we can do either of the following:

– Check the consistency between each pairwise alignment and the rest of the library before the progressive alignment

– Or: after obtaining an MSA, check the consistency between each pair of aligned residues and its counterpart in the pairwise alignment library

Page 9: COFFEE: an objective function for multiple sequence alignments

Consistency Vs Consistency

• These two kinds of consistency are actually closely related:

• Increasing the consistency among the pairwise alignments decreases the chance that a pair in the MSA is inconsistent with its source alignment in the library

• T-COFFEE takes the first approach while COFFEE calculates the latter

Page 10: COFFEE: an objective function for multiple sequence alignments

A simple example

• Suppose we have four sequences:

– SeqA: THE LAST FAT CAT

– SeqB: THE FAST CAT

– SeqC: THE VERY FAST CAT

– SeqD: THE FAT CAT

• We make a pairwise alignment library of these sequences:

Page 11: COFFEE: an objective function for multiple sequence alignments

Compare the consistency

• SeqA THE LAST FAT CAT
  SeqB THE FAST CAT ---

• SeqA THE LAST FA-T CAT
  SeqC THE VERY FAST CAT

• SeqA THE LAST FAT CAT
  SeqD THE ---- FAT CAT

• SeqB THE ---- FAST CAT
  SeqC THE VERY FAST CAT

• SeqB THE FAST CAT
  SeqD THE FA-T CAT

• SeqC THE VERY FAST CAT
  SeqD THE ---- FA-T CAT

• SeqA THE LAST FA-T CAT
  SeqB THE FAST CA-T ---
  SeqC THE VERY FAST CAT
  SeqD THE ---- FA-T CAT

• Or
  SeqA THE LAST FA-T CAT
  SeqB THE ---- FAST CAT
  SeqC THE VERY FAST CAT
  SeqD THE ---- FA-T CAT

Page 12: COFFEE: an objective function for multiple sequence alignments

How COFFEE works

• Create a library containing a pairwise alignment for every possible pair of sequences

• Compare each pair of aligned residues in the MSA to its counterpart in the library

• The overall consistency score is the number of aligned residue pairs that occur in both the MSA and the library, divided by the total number of aligned residue pairs in the MSA.
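The three steps above can be sketched in Python using the four example sentences from the earlier slide. This is only an illustration under stated assumptions: the helper names (`clean`, `residue_pairs`, `consistency`) are hypothetical, an aligned pair is represented as a pair of residue indices, and the word-separating spaces on the slides are treated as layout rather than alignment columns.

```python
from itertools import combinations

def clean(*lines):
    # Drop the spaces that only separate words on the slides.
    return tuple(line.replace(" ", "") for line in lines)

def residue_pairs(row_i, row_j):
    """Set of (index in seq i, index in seq j) for columns aligning two residues."""
    pairs, pi, pj = set(), 0, 0
    for a, b in zip(row_i, row_j):
        if a != "-" and b != "-":
            pairs.add((pi, pj))
        pi += a != "-"  # advance only past real residues, not gaps
        pj += b != "-"
    return pairs

def consistency(msa, library):
    """Shared residue pairs divided by all residue pairs in the MSA."""
    shared = total = 0
    for i, j in combinations(sorted(msa), 2):
        projected = residue_pairs(msa[i], msa[j])
        shared += len(projected & residue_pairs(*library[i, j]))
        total += len(projected)
    return shared / total

LIBRARY = {
    ("A", "B"): clean("THE LAST FAT CAT", "THE FAST CAT ---"),
    ("A", "C"): clean("THE LAST FA-T CAT", "THE VERY FAST CAT"),
    ("A", "D"): clean("THE LAST FAT CAT", "THE ---- FAT CAT"),
    ("B", "C"): clean("THE ---- FAST CAT", "THE VERY FAST CAT"),
    ("B", "D"): clean("THE FAST CAT", "THE FA-T CAT"),
    ("C", "D"): clean("THE VERY FAST CAT", "THE ---- FA-T CAT"),
}
MSA_1 = dict(zip("ABCD", clean("THE LAST FA-T CAT", "THE FAST CA-T ---",
                               "THE VERY FAST CAT", "THE ---- FA-T CAT")))
MSA_2 = dict(zip("ABCD", clean("THE LAST FA-T CAT", "THE ---- FAST CAT",
                               "THE VERY FAST CAT", "THE ---- FA-T CAT")))

print(round(consistency(MSA_1, LIBRARY), 3))  # 0.825 (47 of 57 pairs)
print(round(consistency(MSA_2, LIBRARY), 3))  # 0.898 (53 of 59 pairs)
```

Under these assumptions the second MSA, which keeps FAST aligned with FAST, agrees with more of the library than the first.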

Page 13: COFFEE: an objective function for multiple sequence alignments

How COFFEE works

• To decrease the amount of noise produced by inaccurate pairwise alignments in the library, we set a weight for each of them

• The weight equals the percent identity of the two sequences in that pairwise alignment

• For example:
  SeqA THE LAST FAT CAT
  SeqB THE FAST CAT ---

• 8 of the 13 columns hold identical residues, so the weight is 8/13 × 100% = 61.5%
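The weight calculation can be checked with a few lines of Python. The function name `percent_identity` is hypothetical, and the sketch assumes that identity is counted over all alignment columns, including those with a gap, which is what reproduces the 8/13 on the slide.

```python
def percent_identity(row_a, row_b):
    """Fraction of alignment columns in which both rows hold the same residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # Spaces only separate words on the slide; they are not alignment columns.
    columns = [(a, b) for a, b in zip(row_a, row_b) if not (a == " " and b == " ")]
    identical = sum(1 for a, b in columns if a == b and a != "-")
    return identical / len(columns)

weight = percent_identity("THE LAST FAT CAT", "THE FAST CAT ---")
print(f"{weight:.1%}")  # 61.5%
```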

Page 14: COFFEE: an objective function for multiple sequence alignments

The idea of weight

• The lower the weight (the more mismatches in the pairwise alignment), the more distant the two sequences are, and the less important it is to preserve their aligned pairs in the MSA.

• With weights taken into account, consistency is therefore enforced mainly where it matters

Page 15: COFFEE: an objective function for multiple sequence alignments

COFFEE Score

• Aij is the pairwise projection of sequences i and j obtained from an MSA

• Len(Aij) is the length of Aij

• Wij is the weight of the pairwise alignment of sequences i and j in the library

• Score(Aij) is the number of aligned pairs of residues that are shared between Aij and the library

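$$\text{COFFEE score} = \frac{\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} W_{ij} \cdot \mathrm{Score}(A_{ij})}{\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} W_{ij} \cdot \mathrm{Len}(A_{ij})}$$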
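A minimal Python sketch of the weighted COFFEE score, again using the four example sentences. Several choices here are assumptions rather than details from the talk: the helper names are hypothetical, the weight is the percent identity over all columns of the library alignment, and Len(Aij) is taken as the number of columns left in the projection after dropping columns where both rows have a gap.

```python
from itertools import combinations

def clean(*lines):
    # Drop the spaces that only separate words on the slides.
    return tuple(line.replace(" ", "") for line in lines)

def residue_pairs(row_i, row_j):
    """Set of (index in seq i, index in seq j) for columns aligning two residues."""
    pairs, pi, pj = set(), 0, 0
    for a, b in zip(row_i, row_j):
        if a != "-" and b != "-":
            pairs.add((pi, pj))
        pi += a != "-"
        pj += b != "-"
    return pairs

def identity(row_i, row_j):
    """W_ij: fraction of library-alignment columns holding identical residues."""
    return sum(1 for a, b in zip(row_i, row_j) if a == b and a != "-") / len(row_i)

def projection_length(row_i, row_j):
    """Len(A_ij): columns of the projection that are not gap-only (assumed reading)."""
    return sum(1 for a, b in zip(row_i, row_j) if not (a == "-" and b == "-"))

def coffee_score(msa, library):
    num = den = 0.0
    for i, j in combinations(sorted(msa), 2):
        w = identity(*library[i, j])
        shared = residue_pairs(msa[i], msa[j]) & residue_pairs(*library[i, j])
        num += w * len(shared)                           # W_ij * Score(A_ij)
        den += w * projection_length(msa[i], msa[j])     # W_ij * Len(A_ij)
    return num / den

LIBRARY = {
    ("A", "B"): clean("THE LAST FAT CAT", "THE FAST CAT ---"),
    ("A", "C"): clean("THE LAST FA-T CAT", "THE VERY FAST CAT"),
    ("A", "D"): clean("THE LAST FAT CAT", "THE ---- FAT CAT"),
    ("B", "C"): clean("THE ---- FAST CAT", "THE VERY FAST CAT"),
    ("B", "D"): clean("THE FAST CAT", "THE FA-T CAT"),
    ("C", "D"): clean("THE VERY FAST CAT", "THE ---- FA-T CAT"),
}
MSA_1 = dict(zip("ABCD", clean("THE LAST FA-T CAT", "THE FAST CA-T ---",
                               "THE VERY FAST CAT", "THE ---- FA-T CAT")))
MSA_2 = dict(zip("ABCD", clean("THE LAST FA-T CAT", "THE ---- FAST CAT",
                               "THE VERY FAST CAT", "THE ---- FA-T CAT")))

print(round(coffee_score(MSA_1, LIBRARY), 3))
print(round(coffee_score(MSA_2, LIBRARY), 3))
```

With this reading of Len(Aij), the score reaches 1 only when every column of every projection aligns two residues and all of those pairs appear in the library.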

Page 16: COFFEE: an objective function for multiple sequence alignments

Features of COFFEE

• No explicit gap penalty is needed, since gap information is already contained in the library alignments

• The score is normalized by the maximum possible score, so it lies between 0 and 1

• The cost of a substitution is position dependent, i.e., we tolerate a mismatch that already occurs in the library

Page 17: COFFEE: an objective function for multiple sequence alignments

Comments on COFFEE

Page 18: COFFEE: an objective function for multiple sequence alignments

Position-specific issue

• The current objective function is not position-specific enough

• It applies a single weight to each whole pairwise alignment rather than weighting its functional parts separately

• Even a very close alignment has non-functional parts, which contain more mismatches

Page 19: COFFEE: an objective function for multiple sequence alignments

Distant and close alignments

• A close alignment example:

    THE -FIRST GULF WAR IS FOR JUSTICE
    |||   ||   |||| ||| || |||     |
    THE THIRD- GULF WAR IS FOR ---OIL-

• A distant alignment example:

    GO ATTACK THIS WEAK BUT EVIL IRAQ--
              ||            ||||
    DUN TOUCH THE ARMED AND EVIL NKOREA

Page 20: COFFEE: an objective function for multiple sequence alignments

Position-specific issue

• The current score function gives such unimportant sections the same weight as the rest of the alignment

• It does reduce the amount of noise produced by inaccurate alignments of distant sequences

• However, it fails to do so for close ones

• Moreover, it gives a lower weight to the functional parts of distant sequences

Page 21: COFFEE: an objective function for multiple sequence alignments

Revision of COFFEE

• Score(Aijl) = 1 when the pair at position l in sequences i and j also occurs in the library; otherwise it is 0

• W(Aijl) = 1 when the two residues of the pair at position l in sequences i and j in the library are identical; otherwise it is k (0 <= k < 1)

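$$\text{Revised score} = \frac{\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\sum_{l=1}^{L} W(A_{ijl}) \cdot \mathrm{Score}(A_{ijl})}{\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\sum_{l=1}^{L} W(A_{ijl})}$$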
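The revised, per-position score can be sketched the same way. This is one possible reading rather than the group's implementation: the helper names are hypothetical, the sum runs only over columns that align two residues, W is 1 when the two aligned residues are identical and k otherwise, and k = 0.5 is an arbitrary illustrative value.

```python
from itertools import combinations

def clean(*lines):
    # Drop the spaces that only separate words on the slides.
    return tuple(line.replace(" ", "") for line in lines)

def aligned(row_i, row_j):
    """Map (index in seq i, index in seq j) -> (residue i, residue j) per aligned pair."""
    out, pi, pj = {}, 0, 0
    for a, b in zip(row_i, row_j):
        if a != "-" and b != "-":
            out[pi, pj] = (a, b)
        pi += a != "-"
        pj += b != "-"
    return out

def revised_score(msa, library, k=0.5):
    num = den = 0.0
    for i, j in combinations(sorted(msa), 2):
        in_library = aligned(*library[i, j])
        for pair, (a, b) in aligned(msa[i], msa[j]).items():
            w = 1.0 if a == b else k         # W(A_ijl): identical residues or not
            num += w * (pair in in_library)  # Score(A_ijl) is 1 or 0
            den += w
    return num / den

LIBRARY = {
    ("A", "B"): clean("THE LAST FAT CAT", "THE FAST CAT ---"),
    ("A", "C"): clean("THE LAST FA-T CAT", "THE VERY FAST CAT"),
    ("A", "D"): clean("THE LAST FAT CAT", "THE ---- FAT CAT"),
    ("B", "C"): clean("THE ---- FAST CAT", "THE VERY FAST CAT"),
    ("B", "D"): clean("THE FAST CAT", "THE FA-T CAT"),
    ("C", "D"): clean("THE VERY FAST CAT", "THE ---- FA-T CAT"),
}
MSA_1 = dict(zip("ABCD", clean("THE LAST FA-T CAT", "THE FAST CA-T ---",
                               "THE VERY FAST CAT", "THE ---- FA-T CAT")))
MSA_2 = dict(zip("ABCD", clean("THE LAST FA-T CAT", "THE ---- FAST CAT",
                               "THE VERY FAST CAT", "THE ---- FA-T CAT")))

print(round(revised_score(MSA_1, LIBRARY), 3))  # 0.863 (44 / 51)
print(round(revised_score(MSA_2, LIBRARY), 3))  # 0.895 (51 / 57)
```

Setting k = 1 recovers the plain unweighted ratio of shared pairs, while smaller k reduces the influence of mismatched pairs on both the numerator and the normalization.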

Page 22: COFFEE: an objective function for multiple sequence alignments

Features of the revision

• Drop the idea of a single overall weight per pairwise alignment

• Instead, we check the identity of each pair of residues

• The value of k depends on how we evaluate mismatches

• It could be set according to a substitution matrix

Page 23: COFFEE: an objective function for multiple sequence alignments

Alternative alignment

• Although a pairwise alignment is optimal, it is optimal only under its constraints, such as the gap penalty

• Different constraints produce alignments suited to different purposes

• Instead of keeping only one alignment per pair of sequences in the library, we could add alternative alignment(s) of that pair so as to include more information

Page 24: COFFEE: an objective function for multiple sequence alignments

Alternative alignment

• When using a library with alternative alignments, we have to apply the revision of COFFEE introduced previously

• Otherwise, pairs drawn from different alignments of the same two sequences would all have to share a single weight

• So far, however, weighting has only been applied to different alignments produced under the same constraints

• How to weight alignments produced under different constraints remains an open challenge

Page 25: COFFEE: an objective function for multiple sequence alignments

Conclusion

• COFFEE evaluates the consistency of each pairwise projection of an MSA with the corresponding pairwise alignment in the library

• COFFEE can be used as the judging step in an iterative MSA algorithm

• COFFEE is not position-specific enough to filter the noise caused by inaccurate alignments, which led to the revision proposed by our group

• Alternative pairwise alignments could be added to the library to capture more information about the sequences

Page 26: COFFEE: an objective function for multiple sequence alignments

Thanks for your attention!


Feb 20th, 2003