T-COFFEE, a novel method for Multiple Sequence Alignments

Cédric Notredame

chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
      ***. ::: .: .. . : . . * . *: *

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM---------
mouse AKDDRIRYDNEMKSWEEQMAE
      * : .* . :

Potential Uses of A Multiple Sequence Alignment?

Extrapolation

Motifs/Patterns

Phylogeny

Profiles

Struc. Prediction

Multiple Alignments Are CENTRAL to MOST Bioinformatics Techniques.

Why Is It Difficult To Compute A Multiple Sequence Alignment?

A CROSSROAD PROBLEM

BIOLOGY: What is A Good Alignment

COMPUTATION: What is THE Good Alignment

chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
      ***. ::: .: .. . : . . * . *: *

Why Is It Difficult To Compute A Multiple Sequence Alignment?

BIOLOGY

CIRCULAR PROBLEM....

Good Sequences

Good Alignment

COMPUTATION

Dynamic Programming Using A Substitution Matrix

Progressive Alignment
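
The "Dynamic Programming Using A Substitution Matrix" step can be illustrated with a minimal sketch of global pairwise alignment scoring. The function names, the toy match/mismatch scores and the linear gap penalty are assumptions made for illustration; real aligners use matrices such as BLOSUM or PAM and usually affine gap penalties.

```python
def toy_subst(x, y):
    """Hypothetical substitution score: +2 for identity, -1 otherwise."""
    return 2 if x == y else -1


def global_align_score(a, b, subst=toy_subst, gap=-4):
    """Return the optimal global alignment score of a and b
    (dynamic programming with a linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Leading gaps: aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + subst(a[i - 1], b[j - 1]),  # match/mismatch
                score[i - 1][j] + gap,                             # gap in b
                score[i][j - 1] + gap,                             # gap in a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(global_align_score("KPKRPLSAY", "KPKRAPSAF"))
```

The table has (len(a)+1) x (len(b)+1) cells, which is where the L^2 factor in pairwise alignment cost comes from.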

The T-Coffee Algorithm

Progressive Alignment Principle and its Limitations…

The Extended Library Principle…

The Triplet Assumption

SEQ A

SEQ B

Weighting And Extension

Extension = Using Information from Other Sequences

Weighting = Using the Surrounding Information (Coffee)
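
The extension idea (using a third sequence to support a pair of residues) can be sketched as follows. This is a simplified illustration: the data layout, the `pair` helper and the weights are hypothetical, and the rule used here (add the minimum of the two weights along each triplet) is an assumption about how the triplet contribution is combined.

```python
from collections import defaultdict


def pair(x, y):
    """Hypothetical key for an unordered residue pair; a residue is (sequence, position)."""
    return frozenset((x, y))


def extend_library(primary):
    """Return extended weights from a primary library {pair(x, y): weight}.

    Each pair keeps its primary weight and gains min(W(x, z), W(z, y))
    for every residue z of another sequence linked to both x and y.
    """
    neighbours = defaultdict(dict)
    for key, w in primary.items():
        x, y = tuple(key)
        neighbours[x][y] = w
        neighbours[y][x] = w

    extended = defaultdict(int)
    for key, w in primary.items():
        extended[key] += w

    # Every residue z acts as a bridge between the residues it is linked to.
    for z, linked in neighbours.items():
        items = list(linked.items())
        for idx, (x, wx) in enumerate(items):
            for y, wy in items[idx + 1:]:
                if x[0] == y[0]:
                    continue  # two residues of the same sequence cannot be aligned
                extended[pair(x, y)] += min(wx, wy)
    return dict(extended)


if __name__ == "__main__":
    a1, b1, c1 = ("A", 1), ("B", 1), ("C", 1)
    lib = {pair(a1, b1): 88, pair(a1, c1): 77, pair(c1, b1): 100}
    print(extend_library(lib)[pair(a1, b1)])
```

A pair that is absent from the primary library can still receive a weight if a third sequence links its two residues, which is how information from other sequences enters each pairwise comparison.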

T-Coffee Progressive Alignment

Notredame, Higgins, Heringa, 2000

Dynamic Programming Using The extended Library
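
As a rough sketch of dynamic programming driven by library weights rather than a substitution matrix, the function below aligns two sequences using only a position-specific weight lookup. It assumes non-negative weights and no gap penalties, so the weights alone decide the alignment; the function name and the toy weights are hypothetical.

```python
def align_with_library(len_a, len_b, weight):
    """Align two sequences of the given lengths.

    weight(i, j) returns the (non-negative) library weight for residue i
    of the first sequence against residue j of the second, both 1-based.
    Returns (best total weight, list of aligned (i, j) pairs).
    """
    score = [[0] * (len_b + 1) for _ in range(len_a + 1)]
    for i in range(1, len_a + 1):
        for j in range(1, len_b + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + weight(i, j),  # align i with j
                score[i - 1][j],                      # leave i unaligned
                score[i][j - 1],                      # leave j unaligned
            )

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = len_a, len_b
    while i > 0 and j > 0:
        w = weight(i, j)
        if w > 0 and score[i][j] == score[i - 1][j - 1] + w:
            pairs.append((i, j))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[len_a][len_b], pairs


if __name__ == "__main__":
    toy = {(1, 1): 10, (2, 3): 20, (3, 2): 15}
    print(align_with_library(3, 3, lambda i, j: toy.get((i, j), 0)))
```

In the toy example the pairs (2, 3) and (3, 2) cross each other, so only one of them can be kept and the heavier one wins.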

Local Alignment Global Alignment

Extension

Multiple Sequence Alignment

Mixing Local and Global Alignments
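
One simple way to picture mixing local and global alignments is to pool their residue pairs into a single library, adding the weights of any pair that both sources agree on. The sketch below assumes that additive rule; the function name, residue labels and weights are hypothetical.

```python
from collections import Counter


def merge_libraries(*libraries):
    """Combine several {residue_pair: weight} libraries.

    A pair found in more than one library gets the sum of its weights,
    so agreement between sources is rewarded.
    """
    merged = Counter()
    for library in libraries:
        for residue_pair, weight in library.items():
            merged[residue_pair] += weight
    return dict(merged)


if __name__ == "__main__":
    global_lib = {("A1", "B1"): 80, ("A2", "B2"): 80}
    local_lib = {("A2", "B2"): 95, ("A3", "B5"): 95}
    print(merge_libraries(global_lib, local_lib))
```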

What is a library?

Extension + T-Coffee

Library Based Multiple Sequence Alignment

2
Seq1 MySeq
Seq2 MyotherSeq
#1 2
1 1 25
3 8 70
….

3
Seq1 anotherseq
Seq2 atsecondone
Seq3 athirdone
#1 2
1 1 25
#1 3
3 8 70
….
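
To make the library listing concrete, here is a small sketch that reads it into Python structures. It assumes one record per line: a sequence count, then `name sequence` lines, then `#i j` headers naming a sequence pair, each followed by `residue_i residue_j weight` rows. This is only the simplified layout shown here, not necessarily the file format used by the T-Coffee program, and the elided rows are left out of the example.

```python
def parse_library(text):
    """Parse the simplified library layout into (names, sequences, weights).

    weights maps (seq_i, res_i, seq_j, res_j) -> weight, all indices 1-based.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    count = int(lines[0])
    names, sequences = [], []
    for ln in lines[1:1 + count]:
        name, sequence = ln.split()
        names.append(name)
        sequences.append(sequence)

    weights = {}
    current = None  # the sequence pair named by the latest "#i j" header
    for ln in lines[1 + count:]:
        if ln.startswith("#"):
            seq_i, seq_j = map(int, ln[1:].split())
            current = (seq_i, seq_j)
        else:
            if current is None:
                raise ValueError("weight row found before any '#i j' header")
            res_i, res_j, weight = map(int, ln.split())
            weights[(current[0], res_i, current[1], res_j)] = weight
    return names, sequences, weights


if __name__ == "__main__":
    example = """3
Seq1 anotherseq
Seq2 atsecondone
Seq3 athirdone
#1 2
1 1 25
#1 3
3 8 70
"""
    print(parse_library(example))
```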

How Long Does it Take

Primary Lib: O(N^2 L^2)

Extension: O(N^3 L^2)

Tree: O(N^2 L^2) + O(N^3)

Aln: O(N L^2)

N times slower than ClustalW
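
The complexity terms can be turned into a back-of-the-envelope calculation. The sketch below simply evaluates each term for N sequences of length L, ignoring constant factors; reading "N times slower" as the ratio between the extension term and an N^2 L^2 pairwise stage is an assumption, and the function name is hypothetical.

```python
def tcoffee_cost_terms(n, length):
    """Evaluate the asymptotic cost terms for n sequences of a given length.

    The values are operation counts up to unknown constant factors,
    so only their ratios are meaningful.
    """
    return {
        "primary_library": n**2 * length**2,
        "extension": n**3 * length**2,
        "tree": n**2 * length**2 + n**3,
        "alignment": n * length**2,
    }


if __name__ == "__main__":
    terms = tcoffee_cost_terms(n=50, length=300)
    # Ratio of the dominant extension term to the pairwise (N^2 L^2) stage.
    print(terms["extension"] // terms["primary_library"])
```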

Validating T-Coffee

What Is BaliBase?

BaliBase is a collection of reference multiple alignments.

The structures of the sequences are known and were used to assemble the multiple alignment (MALN).

Evaluation is carried out by comparing the structure-based reference alignment with its sequence-based counterpart.
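
One common way to compare a sequence-based alignment with a structure-based reference is a sum-of-pairs score: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The sketch below assumes that measure and assumes both alignments list the same sequences in the same order; the function names are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of aligned residue pairs in a gapped alignment.

    alignment is a list of equal-length strings using '-' for gaps.
    A residue is identified as (sequence index, ungapped position).
    """
    positions = [0] * len(alignment)
    pairs = set()
    for column in range(len(alignment[0])):
        residues = []
        for seq_index, row in enumerate(alignment):
            if row[column] != "-":
                residues.append((seq_index, positions[seq_index]))
                positions[seq_index] += 1
        # Every two residues sharing a column are aligned with each other.
        pairs.update(combinations(residues, 2))
    return pairs


def sum_of_pairs_score(reference, test):
    """Fraction of reference residue pairs that the test alignment recovers."""
    reference_pairs = aligned_pairs(reference)
    return len(reference_pairs & aligned_pairs(test)) / len(reference_pairs)


if __name__ == "__main__":
    reference = ["AC-G", "ACTG"]
    test = ["ACG-", "ACTG"]
    print(sum_of_pairs_score(reference, test))
```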

BaliBase

DALI, Sap …

Method X

Comparison

Validation Using BaliBase

T-Coffee Results

Taking T-Coffee Further:

Using Structures

Mixing Heterogeneous Information With T-Coffee

Local Alignment Global Alignment

Multiple Sequence Alignment

Multiple Alignment

Structural Specialist

Running T-Coffee ONLINE

WHERE ?

Cedric.notredame@europe.com

www.tcoffee.org

The T-Coffee Server

ES45, 4 Proc, 1 Gb RAM

Future…

Large Scale…

Tailor Made…

WHO ?

WHO USES T-Coffee ?

Dali Domain Dictionary

Pfam

SwissProt

WHO Makes T-Coffee ?

Cédric Notredame

Des Higgins

Chantal Abergel

Olivier Poirot

Orla O’Sullivan
