Improving FFAS alignments using T-Coffee



Why do we improve alignments?

• Determining protein function is a nontrivial task. The best way to do it is to relate the sequence of an unknown protein to proteins with known properties.

• Alignments let us explore the evolution of related sequences.

• Because experimental determination of protein structure is still time-consuming, alignments are also used to create homology models, which can give us additional functional information.


What is a sequence profile?

A protein profile is a matrix that describes a particular domain or family. Each row of the matrix represents a position in a multiple alignment. Each row has 20 scores, one for each amino acid, reflecting the probabilities of the various amino acids occurring at that row's position in the profile. Thus, scores are position dependent.
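A minimal sketch of the idea above, assuming a toy multiple alignment and plain column frequencies with a pseudocount. Real profile builders such as PSI-BLAST or FFAS also apply sequence weighting and convert frequencies to scores, which is not shown here; the function name and the toy sequences are illustrative only.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(alignment, pseudocount=1.0):
    """Return one row per alignment column, each with 20 amino-acid frequencies."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for col in range(length):
        # Count residues in this column; gaps and unknown symbols are ignored.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


# Toy alignment (hypothetical sequences, not taken from the benchmark).
toy_alignment = ["MKV-L", "MRVAL", "MKIAL"]
profile = build_profile(toy_alignment)
print(len(profile), "positions;", "P(M) at position 1 =", round(profile[0]["M"], 3))
```

Each row depends only on its own column, which is what makes the scores position dependent.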


Why do we use profiles?

To better understand the evolutionary, structural, and functional relationships between related sequences.

For biological analysis we usually carry out the following steps:

• Finding protein sequences related to our query with reasonable confidence, using a database search algorithm (BLAST, FASTA). (FFAS stops at this stage.)

• Creating a multiple sequence alignment of the related sequences (ClustalW, T-Coffee, POA, Dialign)

• Using additional information, e.g. predicted secondary structure (Orfeus) or prior knowledge (the biological importance of particular amino acids)


How does T-Coffee work?

T-Coffee performs all possible pairwise alignments within the set of sequences, using two methods: first ClustalW, and second the "lalign" program from the local FASTA package. The results from both methods are combined into a primary library. A library extension step then determines how well each residue pair aligns with respect to the residues of the other sequences. The extended library is used to assess how well two sequences are aligned given the other sequences in the dataset, rather than looking at the two sequences in isolation. The final alignment is then built progressively using the information in the library.
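The library steps described above can be sketched as follows. This is a toy illustration under stated assumptions, not T-Coffee's actual code: a residue is represented as a (sequence name, position) tuple, a library maps a residue pair to a weight, the two primary libraries are merged by summing weights, and the extension adds, for every third residue linked to both members of a pair, the smaller of the two link weights. The helper names and the weights are hypothetical.

```python
from collections import defaultdict


def pair(a, b):
    """Unordered residue pair; each residue is a (sequence_name, position) tuple."""
    return frozenset((a, b))


def merge_libraries(*libraries):
    """Combine primary libraries; a pair found by several methods sums its weights."""
    merged = defaultdict(float)
    for library in libraries:
        for residue_pair, weight in library.items():
            merged[residue_pair] += weight
    return dict(merged)


def extend_library(primary):
    """Add triplet support: pair (x, y) gains min(W(x, z), W(z, y)) for each z."""
    neighbours = defaultdict(dict)
    for residue_pair, weight in primary.items():
        a, b = tuple(residue_pair)
        neighbours[a][b] = weight
        neighbours[b][a] = weight

    extended = defaultdict(float)
    for residue_pair, weight in primary.items():
        extended[residue_pair] += weight  # direct evidence
    for z, linked in neighbours.items():
        residues = list(linked)
        for i, x in enumerate(residues):
            for y in residues[i + 1:]:
                if x[0] == y[0]:
                    continue  # residues of the same sequence are never aligned
                extended[pair(x, y)] += min(linked[x], linked[y])
    return dict(extended)


# Made-up weights for three sequences A, B and C (first residue of each).
a1, b1, c1 = ("A", 1), ("B", 1), ("C", 1)
global_library = {pair(a1, b1): 88, pair(a1, c1): 77}  # e.g. from global alignments
local_library = {pair(c1, b1): 100}                     # e.g. from local alignments
extended = extend_library(merge_libraries(global_library, local_library))
print(extended[pair(a1, b1)])
```

In this toy case the A/B pair is supported both directly and through sequence C, so its extended weight is higher than its primary weight.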


What was done?

Four algorithms were used in an attempt to obtain better alignments:

• Simple elongation of sequences in the BLAST profile

• Aligning sequences to the BLAST profile using T-Coffee

• Creating profiles using T-Coffee in multiple sequence alignment mode

• Mixed method: T-Coffee + elongation


Benchmark

• 1024 pairs of protein domains from SCOP

• Low sequence identity

• High structural similarity

• No redundant pairs

• Only single-domain structures


• In each method, profiles were created for both target and template.

• Because the benchmark was computationally expensive for some algorithms (e.g. 14 days on 15 CPUs for T-Coffee in multiple sequence alignment mode), only the original PSI-BLAST profile creation procedure was tested.

• T-Coffee in multiple sequence alignment mode produced no results for some pairs.


Alignment quality measure

It is common that only a fraction of the model is correct. After structural superposition, the most significant subset is found using the LG score measure.

This allows us to compare only the reasonable parts of the models.
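The slides do not give the formula behind the LG score. As a rough illustration only, the sketch below assumes a Levitt-Gerstein style structural similarity score, in which each superposed residue pair contributes more the closer its two atoms are, and gaps are penalized. The constants (M = 20, d0 = 5 Å) and the function name are assumptions, and the search for the most significant subset is not shown, so the numbers are not comparable to the values reported on these slides.

```python
def levitt_gerstein_like_score(distances, n_gaps=0, m=20.0, d0=5.0):
    """Similarity score for superposed residue pairs (assumed functional form).

    distances: distance in angstroms between the paired atoms after
               superposition, one value per aligned residue pair.
    n_gaps:    number of gaps in the evaluated segment.
    """
    # A pair at distance 0 contributes 1, a pair at distance d0 contributes 0.5.
    closeness = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return m * (closeness - n_gaps / 2.0)


# Hypothetical distances: a well-superposed core and a badly modelled tail.
core = [0.5, 0.8, 1.1, 0.9]
tail = [9.0, 12.0, 15.0]
print(levitt_gerstein_like_score(core))
print(levitt_gerstein_like_score(core + tail))
```

Residues that superpose badly add very little to the sum, which is why scoring a well-chosen subset can describe a partly correct model better than scoring the whole chain.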

T-Coffee to profile algorithm

[Figure: LG score FFAS vs LG score T-Coffee. x-axis: LG score FFAS (0 to 15000); y-axis: LG score T-Coffee (0 to 14000)]

T-Coffee to profile algorithm

[Figure: LG score FFAS vs LG score T-Coffee. x-axis: LG score FFAS (0 to 2000); y-axis: LG score T-Coffee (0 to 2000)]

Elongation

[Figure: LG score FFAS vs LG score T-Coffee. x-axis: LG score FFAS (0 to 14000); y-axis: LG score T-Coffee (0 to 14000)]

T-Coffee

[Figure: LG score FFAS vs LG score T-Coffee. x-axis: LG score FFAS (0 to 2000); y-axis: LG score T-Coffee (0 to 2000)]

T-Coffee + elongation

[Figure: LG score FFAS vs LG score T-Coffee. x-axis: LG score FFAS (0 to 12000); y-axis: LG score T-Coffee (0 to 12000)]


• The best results are obtained using T-Coffee alone. For pairs with FFAS LG score < 600, alignments are improved in 72% of cases.

• Because the LG score is unknown in "real life", it is necessary to find a correlation between alignment improvement and known factors.

Note:

• Some of the results are missing.

• We cannot trust the benchmark in all cases.
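A statistic of the kind quoted above (the share of improved alignments among pairs whose FFAS LG score is below a cutoff) can be computed as in the sketch below. The function name is hypothetical, and the input is only the three example pairs reported on the later slides, not the 1024-pair benchmark, so the result says nothing about the 72% figure.

```python
def fraction_improved(score_pairs, ffas_cutoff=600.0):
    """Share of pairs with FFAS LG score below the cutoff where T-Coffee scores higher.

    score_pairs: iterable of (lg_score_ffas, lg_score_tcoffee) tuples.
    Returns None when no pair falls below the cutoff.
    """
    selected = [(ffas, tcoffee) for ffas, tcoffee in score_pairs if ffas < ffas_cutoff]
    if not selected:
        return None
    improved = sum(1 for ffas, tcoffee in selected if tcoffee > ffas)
    return improved / len(selected)


# (LG score FFAS, LG score T-Coffee) for the three example pairs in these slides.
example_pairs = [(2457.6, 2545.6), (207.7, 251.1), (47.6, 113.8)]
print(fraction_improved(example_pairs))
```

The first pair is excluded by the cutoff, so only the two low-scoring pairs are counted.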


d1ca1_1 a.124.1.1 d1ah7__ a.124.1.1

• Sequence identity 33%

• LG score FFAS = 2457.6

• LG score T-Coffee = 2545.6

FFAS ALIGNMENT:

                    10        20        30        40        50        60
            ....|....|....|....|....|....|....|....|....|....|....|....|
d1ah7__   1 EDKHKEGVNSHLWIVNRAIDIMSRNTTL----VKQDRVAQLNEWRTELENGIYAADYENP 56
model     1 WDGKIDGTGTHAMIVTQGVSILENDLSKNEPESVRKNLEILKENMHELQLGSTYPDYDKN 60

                    70        80        90       100       110       120
            ....|....|....|....|....|....|....|....|....|....|....|....|
d1ah7__  57 YYDNSTFASHFYDPDNGKTYI---------PFAKQAKETGAKYFKLAGESYKNKDMKQAF 107
model    61 AYD--LYQDHFWDPDTDNNFSKDNSWYLAYSIPDTGESQIRKFSALARYEWQRGNYKQAT 118

                   130       140       150       160       170       180
            ....|....|....|....|....|....|....|....|....|....|....|....|
d1ah7__ 108 FYLGLSLHYLGDVNQPMHAANFTNLSYPQGFHSKYENFVDTIKDNYKVTDGNGYWNWKGT 167
model   119 FYLGEAMHYFGDIDTPYHPANVTAVD--SAGHVKFETFAEERKEQYKI-------NTVGC 169

                   190       200       210       220       230       240
            ....|....|....|....|....|....|....|....|....|....|....|....|
d1ah7__ 168 NPEEWIHGAAVVAKQDYSGIVNDN--------TKDWFVKAAVSQEYAD-KWRAEVTPMTG 218
model   170 KTNEDFYAD-ILKNKDFNAWSKEYARGFAKTGKSIYYSHASMSHSWDDW------DYAAK 222

                   250       260
            ....|....|....|....|...
d1ah7__ 219 KRLMDAQRVTAGYIQLWFDTYGD 241
model   223 VTLANSQKGTAGYIYRFLHDVSE 245

[Figure labels: d1ca1_1, d1ah7__]

T-COFFEE ALIGNMENT:

                    10        20        30        40        50        60
            ....|....|....|....|....|....|....|....|....|....|....|....|
d1ah7__   1 EDKHKEGVNSHLWIVNRAIDIMSRNTT----LVKQDRVAQLNEWRTELENGIYAADYENP 56
model     1 WDGKIDGTGTHAMIVTQGVSILENDLSKNEPESVRKNLEILKENMHELQLGSTYPDYDKN 60

                    70        80        90       100       110       120
            ....|....|....|....|....|....|....|....|....|....|....|....|
d1ah7__  57 YYDNSTFASHFYDPDNGKTY---------IPFAKQAKETGAKYFKLAGESYKNKDMKQAF 107
model    61 AY--DLYQDHFWDPDTDNNFSKDNSWYLAYSIPDTGESQIRKFSALARYEWQRGNYKQAT 118

                   130       140       150       160       170       180
            ....|....|....|....|....|....|....|....|....|....|....|....|
d1ah7__ 108 FYLGLSLHYLGDVNQPMHAANFTNLSYPQGFHSKYENFVDTIKDNYKVTDGNGYWNWKGT 167
model   119 FYLGEAMHYFGDIDTPYHPANVTAVDSAG--HVKFETFAEERKEQYKINTVGCK-----T 171

                   190       200       210       220       230       240
            ....|....|....|....|....|....|....|....|....|....|....|....|
d1ah7__ 168 NPEEWIHGAAVVAKQDYSGIVNDNTKDWFVKAAVSQEYADKWRAEVTPMTGKRLMDAQRV 227
model   172 NEDFYADILKNKDFNAWSKEYARGFAKTGKSIYYSHASMSHSWDDWDYAAKVTLANSQKG 231

                   250
            ....|....|....|.
d1ah7__ 228 TAGYI-QLWFDTYGDR 242
model   232 TAGYIYRFLHDVSEGN 247

[Figure labels: d1ca1_1, d1ah7__]

STRUCTURAL ALIGNMENT:

                    10        20        30        40        50        60
            ....|....|....|....|....|....|....|....|....|....|....|....|
d1ah7__   1 WSAEDKHKEGVNSHLWIVNRAIDIMSRNTTLVK----QDRVAQLNEWRTELENGIYAADY 56
model     1 WDGKIDG---TGTHAMIVTQGVSILENDLSKNEPESVRKNLEILKENMHELQLGSTYPDY 57

                    70        80        90       100       110       120
            ....|....|....|....|....|....|....|....|....|....|....|....|
d1ah7__  57 ENPYYDNSTFASHFYDPDNGKTYIP---------FAKQAKETGAKYFKLAGESYKNKDMK 107
model    58 DK-NAYD-LYQDHFWDPDTDNNFSKDNSWYLAYSIPDTGESQIRKFSALARYEWQRGNYK 115

                   130       140       150       160       170       180
            ....|....|....|....|....|....|....|....|....|....|....|....|
d1ah7__ 108 QAFFYLGLSLHYLGDVNQPMHAANFTNLSYPQGFHSKYENFVDTIKDNYKVTDGNGYWNW 167
model   116 QATFYLGEAMHYFGDIDTPYHPANVTAVDS--AGHVKFETFAEERKEQYKINTVGCKTNE 173

                   190       200       210       220       230       240
            ....|....|....|....|....|....|....|....|....|....|....|....|
d1ah7__ 167 -----------KGTNPEEWIHGAAVVAKQDYSG-IVNDNTKDWFVKAAVSQEYADKWRAE 215
model   174 DFYADILKNKDFNAWSKEYARGFAKTGKSIYYSHASMSH-----------------SWDD 216

                   250       260       270
            ....|....|....|....|....|....|...
d1ah7__ 216 VTPMTGKRLMDAQRVTAGYIQLWFDTYGDR--- 245
model   217 WDYAAKVTLANSQKGTAGYIYRFLHDVSEGNDP 249

[Figure labels: d1ca1_1, d1ah7__]

[Figure labels, shown twice: d1ca1_1, d1ah7__]


d1mgta2 c.55.7.1 d1sfe_2 c.55.7.1

• Sequence identity 40%

• LG score FFAS = 207.7

• LG score T-Coffee = 251.1

[Figure labels: d1fb1a_, d1a8ra_]


d1mgta2 c.55.7.1 d1sfe_2 c.55.7.1

• Sequence identity 39%

• LG score FFAS = 47.6

• LG score T-Coffee = 113.8

[Figure labels, on two consecutive slides: d1mgta2, d1sfe_2]

Sequence Identity

[Figure: x-axis: Identity % (0 to 120); y-axis: LG score (-6000 to 4000)]

Profile identity vs LG score

[Figure: x-axis: Identity % (0 to 100); y-axis: LG score (-6000 to 4000)]

FFAS score vs LG score

[Figure: x-axis: FFAS score (-200 to 0); y-axis: LG score (-1000 to 1000)]


Conclusions:

• FFAS alignments can still be improved.

• Using T-Coffee to create FFAS profiles can improve alignment quality.

• It is not yet known how to add logic that decides whether to use T-Coffee to create FFAS profiles.

To do:

• Check the correlation between alignment diversity and alignment improvement.

• Try a different method of comparing sequence alignments (overlap score).

• Compare other multiple alignment methods with T-Coffee.
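The slides do not define the overlap score. One common reading, assumed here, is the fraction of residue pairs aligned in a reference alignment (for example the structural alignment) that are also aligned in the tested alignment. The sketch below works on pairwise alignments given as two gapped rows; the function names and toy sequences are hypothetical.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (index_in_a, index_in_b) residue pairs aligned to each other."""
    if len(row_a) != len(row_b):
        raise ValueError("both alignment rows must have the same length")
    pairs = set()
    i = j = 0  # running residue indices in the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def overlap_score(test_alignment, reference_alignment):
    """Fraction of reference residue pairs that the tested alignment reproduces."""
    reference_pairs = aligned_pairs(*reference_alignment)
    if not reference_pairs:
        return 0.0
    return len(aligned_pairs(*test_alignment) & reference_pairs) / len(reference_pairs)


# Toy example: the tested alignment misplaces one of four reference pairs.
reference = ("ACDE", "ACDE")
tested = ("AC-DE", "ACD-E")
print(overlap_score(tested, reference))
```

Unlike the LG score, this measure needs no structural superposition of a model, only a reference alignment to compare against.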