Transcript
Page 1: COFFEE: an objective function for multiple sequence alignments

COFFEE: an objective function for multiple sequence alignments

Wang Yi

Computational Genomics Group

Bioinformatics Institute

Page 2: COFFEE: an objective function for multiple sequence alignments

Why MSA

• Multiple Sequence Alignments (MSA) are among the most important tools for analyzing biological sequences

• Useful for:

– Structure prediction

– Phylogenetic analysis

– Function prediction

– Polymerase Chain Reaction (PCR) primer design

– And more…

Page 3: COFFEE: an objective function for multiple sequence alignments

What is COFFEE

• Consistency based Objective Function For alignmEnt Evaluation

• The COFFEE score reflects the level of consistency between a MSA and a library containing pairwise alignments of the same group of sequences

Page 4: COFFEE: an objective function for multiple sequence alignments

What is consistency?

Why study consistency between MSA and pairwise alignment?

Page 5: COFFEE: an objective function for multiple sequence alignments

Why pairwise alignments

• MSA, unlike pairwise alignment, cannot yet guarantee an optimal result

• Pairwise alignment uses dynamic programming to obtain an optimal result

• Applying the same algorithm to an MSA is too expensive

• People therefore try to exploit the optimality of pairwise alignments by progressively combining them into an MSA

Page 6: COFFEE: an objective function for multiple sequence alignments

Pairwise alignments to MSA

• ClustalW is a widely recognized package among such attempts

• ClustalW generates a guide tree according to the distances between each pair of sequences

• Then it aligns all these sequences progressively, from the closest branches to the most distant ones

Page 7: COFFEE: an objective function for multiple sequence alignments

Problem with ClustalW

• Mistakes made at the beginning of this procedure are never corrected

• This problem stems from not considering the consistency between close pairs and distant ones

Page 8: COFFEE: an objective function for multiple sequence alignments

Two solutions

• To solve this problem, we can do either of the following:

– Check the consistency between each pairwise alignment and the rest of the library before the progressive alignment

– Or, after obtaining an MSA, check the consistency between each pair of aligned residues and its counterpart in the pairwise alignment library

Page 9: COFFEE: an objective function for multiple sequence alignments

Consistency Vs Consistency

• These two kinds of consistency are actually closely related:

• Increasing the consistency between pairs decreases the chance of inconsistency between a pair in the MSA and its origin in the library

• T-COFFEE takes the first approach while COFFEE calculates the latter

Page 10: COFFEE: an objective function for multiple sequence alignments

A simple example

• Suppose we have four sequences:

– SeqA: THE LAST FAT CAT

– SeqB: THE FAST CAT

– SeqC: THE VERY FAST CAT

– SeqD: THE FAT CAT

• We make a pairwise alignment library of these sequences:

Page 11: COFFEE: an objective function for multiple sequence alignments

Compare the consistency

• SeqA THE LAST FAT CAT
  SeqB THE FAST CAT ---

• SeqA THE LAST FA-T CAT
  SeqC THE VERY FAST CAT

• SeqA THE LAST FAT CAT
  SeqD THE ---- FAT CAT

• SeqB THE ---- FAST CAT
  SeqC THE VERY FAST CAT

• SeqB THE FAST CAT
  SeqD THE FA-T CAT

• SeqC THE VERY FAST CAT
  SeqD THE ---- FA-T CAT

• SeqA THE LAST FA-T CAT
  SeqB THE FAST CA-T ---
  SeqC THE VERY FAST CAT
  SeqD THE ---- FA-T CAT

• Or:
  SeqA THE LAST FA-T CAT
  SeqB THE ---- FAST CAT
  SeqC THE VERY FAST CAT
  SeqD THE ---- FA-T CAT

Page 12: COFFEE: an objective function for multiple sequence alignments

How COFFEE works

• Create a library of pairwise alignments, one for each possible pair of sequences

• Compare each pair of aligned residues in the MSA to its counterpart in the library

• The overall consistency score is equal to the number of pairs that occur in both MSA and the library, divided by the total number of pairs in MSA.
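The unweighted score described above can be sketched in a few lines of Python. This is only an illustration, not the COFFEE implementation: the function names `aligned_pairs` and `consistency` are made up here, a residue is identified by its index in the ungapped sequence, and the spaces in the toy alignments are assumed to be visual separators rather than residues. The library and the two candidate MSAs are the ones from the "THE FAT CAT" example.

```python
from itertools import combinations

GAP = "-"

def aligned_pairs(row_a, row_b):
    """Return the set of (residue index in a, residue index in b) sharing a column."""
    # Assumption: spaces in the toy rows are separators, so they are dropped.
    row_a, row_b = row_a.replace(" ", ""), row_b.replace(" ", "")
    pairs, ia, ib = set(), 0, 0
    for a, b in zip(row_a, row_b):
        if a != GAP and b != GAP:
            pairs.add((ia, ib))
        if a != GAP:
            ia += 1
        if b != GAP:
            ib += 1
    return pairs

def consistency(msa, library):
    """Pairs found in both the MSA and the library, divided by all pairs in the MSA."""
    shared = total = 0
    for i, j in combinations(sorted(msa), 2):
        msa_pairs = aligned_pairs(msa[i], msa[j])
        shared += len(msa_pairs & aligned_pairs(*library[(i, j)]))
        total += len(msa_pairs)
    return shared / total if total else 0.0

library = {
    ("SeqA", "SeqB"): ("THE LAST FAT CAT", "THE FAST CAT ---"),
    ("SeqA", "SeqC"): ("THE LAST FA-T CAT", "THE VERY FAST CAT"),
    ("SeqA", "SeqD"): ("THE LAST FAT CAT", "THE ---- FAT CAT"),
    ("SeqB", "SeqC"): ("THE ---- FAST CAT", "THE VERY FAST CAT"),
    ("SeqB", "SeqD"): ("THE FAST CAT", "THE FA-T CAT"),
    ("SeqC", "SeqD"): ("THE VERY FAST CAT", "THE ---- FA-T CAT"),
}
msa_1 = {"SeqA": "THE LAST FA-T CAT", "SeqB": "THE FAST CA-T ---",
         "SeqC": "THE VERY FAST CAT", "SeqD": "THE ---- FA-T CAT"}
msa_2 = {"SeqA": "THE LAST FA-T CAT", "SeqB": "THE ---- FAST CAT",
         "SeqC": "THE VERY FAST CAT", "SeqD": "THE ---- FA-T CAT"}

print(consistency(msa_1, library), consistency(msa_2, library))
```

Under these assumptions the second candidate MSA is the more consistent one: 53 of its 59 residue pairs occur in the library, against 47 of 57 for the first.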

Page 13: COFFEE: an objective function for multiple sequence alignments

How COFFEE works

• To decrease the amount of noise produced by inaccurate pairwise alignments in the library, we set a weight for each of them

• The weight equals the percent identity of the two aligned sequences

• For example:
  SeqA THE LAST FAT CAT
  SeqB THE FAST CAT ---

• The weight is 8/13 × 100% = 61.5% (8 identical residues over 13 aligned columns)
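The weight arithmetic can be checked with a small Python sketch. The helper name `percent_identity` is hypothetical; it assumes that spaces in the toy rows are separators and that the denominator is the number of alignment columns, gapped columns included, which is what reproduces the 8/13 on this slide.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent of alignment columns in which both rows carry the same residue."""
    # Assumption: spaces are visual separators in the toy example, not residues.
    row_a, row_b = row_a.replace(" ", ""), row_b.replace(" ", "")
    if not row_a or len(row_a) != len(row_b):
        raise ValueError("aligned rows must be non-empty and of equal length")
    identical = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)
    return 100.0 * identical / len(row_a)

weight = percent_identity("THE LAST FAT CAT", "THE FAST CAT ---")
print(f"{weight:.1f}%")  # 61.5%
```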

Page 14: COFFEE: an objective function for multiple sequence alignments

The idea of weight

• The lower the weight (the more mismatches in the pairwise alignment), the more distant the two sequences are, and the less important it is to preserve their pairs in the MSA.

• Therefore, with the weight taken into account, we enforce consistency only where it matters

Page 15: COFFEE: an objective function for multiple sequence alignments

COFFEE Score

• Aij is the pairwise projection of sequences i and j obtained from a MSA

• Len(Aij) is the length of Aij

• Wij is the weight of pairwise alignment on sequences i and j in the library

• Score(Aij) is the number of aligned pairs of residues that are shared between Aij and the library

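\[
\text{COFFEE score} = \frac{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} W_{ij} \cdot \mathrm{Score}(A_{ij})}{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} W_{ij} \cdot \mathrm{Len}(A_{ij})}
\]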
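A minimal Python sketch of the weighted score, following the definitions on this slide. All function names are illustrative, not taken from the COFFEE software. It assumes that Wij is the percent identity of the library alignment, that the projection Aij drops columns in which both sequences have a gap, that Len(Aij) counts the remaining columns, and that spaces in toy rows are separators.

```python
from itertools import combinations

GAP = "-"

def clean(row):
    # Assumption: spaces are visual separators in toy alignments.
    return row.replace(" ", "")

def project(row_a, row_b):
    """Pairwise projection A_ij: the two MSA rows without columns where both are gaps."""
    cols = [(a, b) for a, b in zip(clean(row_a), clean(row_b))
            if not (a == GAP and b == GAP)]
    return "".join(a for a, _ in cols), "".join(b for _, b in cols)

def aligned_pairs(row_a, row_b):
    """Set of (residue index in a, residue index in b) that share a column."""
    pairs, ia, ib = set(), 0, 0
    for a, b in zip(clean(row_a), clean(row_b)):
        if a != GAP and b != GAP:
            pairs.add((ia, ib))
        ia += 1 if a != GAP else 0
        ib += 1 if b != GAP else 0
    return pairs

def percent_identity(row_a, row_b):
    """W_ij: identical columns over all columns of the library alignment."""
    row_a, row_b = clean(row_a), clean(row_b)
    same = sum(1 for a, b in zip(row_a, row_b) if a == b and a != GAP)
    return 100.0 * same / len(row_a)

def coffee_score(msa, library):
    """Sum of W_ij * Score(A_ij) divided by the sum of W_ij * Len(A_ij)."""
    numerator = denominator = 0.0
    for i, j in combinations(sorted(msa), 2):
        lib_a, lib_b = library[(i, j)]
        w_ij = percent_identity(lib_a, lib_b)
        a_ij = project(msa[i], msa[j])
        score = len(aligned_pairs(*a_ij) & aligned_pairs(lib_a, lib_b))
        numerator += w_ij * score
        denominator += w_ij * len(a_ij[0])
    return numerator / denominator if denominator else 0.0

library = {("SeqA", "SeqB"): ("THE LAST FAT CAT", "THE FAST CAT ---")}
msa = {"SeqA": "THE LAST FA-T CAT", "SeqB": "THE FAST CA-T ---"}
print(coffee_score(msa, library))
```

With a single pair of sequences the weight cancels out; it only changes the result once several pairs with different identities are summed.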

Page 16: COFFEE: an objective function for multiple sequence alignments

Features of COFFEE

• There is no separate gap penalty, since gap costs are already reflected in the library alignments

• The score is normalized by the maximum possible score, so it lies between 0 and 1

• The cost of a substitution is position dependent, i.e., we tolerate a mismatch that already occurs in the library

Page 17: COFFEE: an objective function for multiple sequence alignments

Comments on COFFEE

Page 18: COFFEE: an objective function for multiple sequence alignments

Position-specific issue

• The current objective function is not position-specific enough

• It applies one general weight to a whole pairwise alignment rather than to its functional parts

• Even a very close alignment has non-functional parts, which contain more mismatches

Page 19: COFFEE: an objective function for multiple sequence alignments

Distant and close alignments

• A close alignment example:

  THE -FIRST GULF WAR IS FOR JUSTICE
  |||   ||   |||| ||| || |||     |
  THE THIRD- GULF WAR IS FOR ---OIL-

• A distant alignment example:

  GO ATTACK THIS WEAK BUT EVIL IRAQ--
            ||            ||||
  DUN TOUCH THE ARMED AND EVIL NKOREA

Page 20: COFFEE: an objective function for multiple sequence alignments

Position-specific issue

• The current score function assigns the same weight to such unimportant sections as to the rest of the alignment

• It does reduce the amount of noise produced by inaccurate alignments of distant sequences

• However, it fails to do so for close ones

• It also gives a lower weight to the functional parts of distant sequences

Page 21: COFFEE: an objective function for multiple sequence alignments

Revision of COFFEE

• Score(Aijl) = 1 when the pair at position l of sequences i and j also occurs in the library; otherwise it is 0

• W(Aijl) = 1 when the pair at position l of sequences i and j in the library consists of identical residues; otherwise it is k (0 <= k < 1)

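\[
\text{Revised score} = \frac{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \sum_{l=1}^{L} W(A_{ijl}) \cdot \mathrm{Score}(A_{ijl})}{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \sum_{l=1}^{L} W(A_{ijl})}
\]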
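The revised, per-position score could be sketched as follows. This is one possible reading of the definitions above, not a reference implementation: the names are hypothetical, W is taken to be 1 when the two paired residues are identical and k otherwise, positions where either sequence has a gap are assumed to contribute nothing, and the default k = 0.5 is an arbitrary illustration value.

```python
from itertools import combinations

GAP = "-"

def aligned_pairs(row_a, row_b):
    """Map each aligned residue pair (index in a, index in b) to True if identical."""
    # Assumption: spaces are visual separators in toy alignments.
    row_a, row_b = row_a.replace(" ", ""), row_b.replace(" ", "")
    pairs, ia, ib = {}, 0, 0
    for a, b in zip(row_a, row_b):
        if a != GAP and b != GAP:
            pairs[(ia, ib)] = (a == b)
        if a != GAP:
            ia += 1
        if b != GAP:
            ib += 1
    return pairs

def revised_coffee_score(msa, library, k=0.5):
    """Sum of W * Score over all residue pairs, divided by the sum of W."""
    if not 0 <= k < 1:
        raise ValueError("k must satisfy 0 <= k < 1")
    numerator = denominator = 0.0
    for i, j in combinations(sorted(msa), 2):
        lib_pairs = aligned_pairs(*library[(i, j)])
        for pair, identical in aligned_pairs(msa[i], msa[j]).items():
            w = 1.0 if identical else k               # W(A_ijl)
            score = 1.0 if pair in lib_pairs else 0.0  # Score(A_ijl)
            numerator += w * score
            denominator += w
    return numerator / denominator if denominator else 0.0

# Toy case: the identical pair A/A is in the library, the mismatch C/G is not.
library = {("X", "Y"): ("AC-", "A-G")}
msa = {"X": "AC", "Y": "AG"}
print(revised_coffee_score(msa, library, k=0.5))
```

In the toy case the inconsistent C/G mismatch is penalized in proportion to k: with k = 0 it is ignored entirely, and as k approaches 1 it counts as much as an identical pair.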

Page 22: COFFEE: an objective function for multiple sequence alignments

Features of the revision

• Drop the idea of a single overall weight per alignment

• Instead, we check the identity of each pair of residues

• The value of k depends on how we evaluate a mismatch

• It could be set according to a substitution matrix

Page 23: COFFEE: an objective function for multiple sequence alignments

Alternative alignment

• Although a pairwise alignment is optimal, it is optimal only under its constraints, such as the penalties used

• Different constraints generate alignments suited to different purposes

• Instead of only one alignment of each possible pair of sequences in the library, we could add its alternative alignment(s) so as to include more information

Page 24: COFFEE: an objective function for multiple sequence alignments

Alternative alignment

• When using a library with alternative alignments, we have to apply the revision of COFFEE introduced previously

• Otherwise, pairs that come from different alignments would all have to share a single weight

• So far, however, weighting has only been applied to different alignments produced under the same constraints

• How to weigh alignments produced under different constraints remains an open challenge

Page 25: COFFEE: an objective function for multiple sequence alignments

Conclusion

• COFFEE evaluates the consistency of each pairwise projection with its pairwise alignment

• COFFEE can serve as the evaluation step in an iterative MSA algorithm

• COFFEE is not position-specific enough to filter the noise due to inaccurate alignments, which motivates the revision proposed by our group

• Alternative pairwise alignments could be added to the library to include more information between sequences

Page 26: COFFEE: an objective function for multiple sequence alignments

Thanks for your attention!

[email protected]

Feb 20th, 2003

