45
Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU

Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Embed Size (px)

Citation preview

Page 1: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Psi-Blast

Morten Nielsen,CBS, BioCentrum,

DTU

Page 2: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Objectives

• Understand why BLAST often fails for low sequence similarity

• See the beauty of sequence profiles– Position specific scoring matrices (PSSMs)

• Use BLAST to generate Sequence profiles

• Use profiles to identify amino acids essential for protein function and structure

Page 3: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

What goes wrong when Blast fails?

• Conventional sequence alignment uses a (Blosum) scoring matrix to identify amino acids matches in the two protein sequences

Page 4: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Alignment scoring matrices

• Blosum62 score matrix. Fg=1. Ng=0?

L A G D S D

F

I

G

D

S

L

Page 5: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Alignment scoring matrices• Blosum62 score matrix. Fg=1. Ng=0?

• Score =2-1+6+6+4=17

L A G D S D

F 0 -2 -3 -3 -2 -3

I 2 -1 -4 -3 -2 -3

G -4 0 6 -1 0 -1

D -4 -2 -1 6 0 6

S -2 1 0 0 4 0

L 4 -1 -4 -4 -2 -4

LAGDSI-GDS

Page 6: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

What goes wrong when Blast fails?

• Conventional sequence alignment uses a (Blosum) scoring matrix to identify amino acids matches in the two protein sequences• This scoring matrix is identical at all positions in the protein sequence!

EVVFIGDSLVQLMHQC

X X X

X X X

AGDS.GGGDS

Page 7: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

When Blast works!

1PLC

._

1PLB._

Page 8: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

When Blast fails!

1PLC

._

1PMY._

Page 9: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

When Blast fails

Page 10: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Sequence profiles

• In reality not all positions in a protein are equally likely to mutate• Some amino acids (active cites) are highly

conserved, and the score for mismatch must be very high

• Other amino acids can mutate almost for free, and the score for mismatch should be lower than the BLOSUM score

• Sequence profiles can capture these differences

Page 11: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

What are sequence profiles?

Page 12: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Anchor positions

Binding Motif. MHC class I with peptide

Page 13: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

SLLPAIVEL YLLPAIVHI TLWVDPYEV GLVPFLVSV KLLEPVLLL LLDVPTAAV LLDVPTAAV LLDVPTAAVLLDVPTAAV VLFRGGPRG MVDGTLLLL YMNGTMSQV MLLSVPLLL SLLGLLVEV ALLPPINIL TLIKIQHTLHLIDYLVTS ILAPPVVKL ALFPQLVIL GILGFVFTL STNRQSGRQ GLDVLTAKV RILGAVAKV QVCERIPTIILFGHENRV ILMEHIHKL ILDQKINEV SLAGGIIGV LLIENVASL FLLWATAEA SLPDFGISY KKREEAPSLLERPGGNEI ALSNLEVKL ALNELLQHV DLERKVESL FLGENISNF ALSDHHIYL GLSEFTEYL STAPPAHGVPLDGEYFTL GVLVGVALI RTLDKVLEV HLSTAFARV RLDSYVRSL YMNGTMSQV GILGFVFTL ILKEPVHGVILGFVFTLT LLFGYPVYV GLSPTVWLS WLSLLVPFV FLPSDFFPS CLGGLLTMV FIAGNSAYE KLGEFYNQMKLVALGINA DLMGYIPLV RLVTLKDIV MLLAVLYCL AAGIGILTV YLEPGPVTA LLDGTATLR ITDQVPFSVKTWGQYWQV TITDQVPFS AFHHVAREL YLNKIQNSL MMRKLAILS AIMDKNIIL IMDKNIILK SMVGNWAKVSLLAPGAKQ KIFGSLAFL ELVSEFSRM KLTPLCVTL VLYRYGSFS YIGEVLVSV CINGVCWTV VMNILLQYVILTVILGVL KVLEYVIKV FLWGPRALV GLSRYVARL FLLTRILTI HLGNVKYLV GIAGGLALL GLQDCTMLVTGAPVTYST VIYQYMDDL VLPDVFIRC VLPDVFIRC AVGIGIAVV LVVLGLLAV ALGLGLLPV GIGIGVLAAGAGIGVAVL IAGIGILAI LIVIGILIL LAGIGLIAA VDGIGILTI GAGIGVLTA AAGIGIIQI QAGIGILLAKARDPHSGH KACDPHSGH ACDPHSGHF SLYNTVATL RGPGRAFVT NLVPMVATV GLHCYEQLV PLKQHFQIVAVFDRKSDA LLDFVRFMG VLVKSPNHV GLAPPQHLI LLGRNSFEV PLTFGWCYK VLEWRFDSR TLNAWVKVVGLCTLVAML FIDSYICQV IISAVVGIL VMAGVGSPY LLWTLVVLL SVRDRLARL LLMDCSGSI CLTSTVQLVVLHDDLLEA LMWITQCFL SLLMWITQC QLSLLMWIT LLGATCMFV RLTRFLSRV YMDGTMSQV FLTPKKLQCISNDVCAQV VKTDGNPPE SVYDFFVWL FLYGALLLA VLFSSDFRI LMWAKIGPV SLLLELEEV SLSRFSWGAYTAFTIPSI RLMKQDFSV RLPRIFCSC FLWGPRAYA RLLQETELV SLFEGIDFY SLDQSVVEL RLNMFTPYINMFTPYIGV LMIIPLINV TLFIGSHVV SLVIVTTFV VLQWASLAV ILAKFLHWL STAPPHVNV LLLLTVLTVVVLGVVFGI ILHNGAYSL MIMVKCWMI MLGTHTMEV MLGTHTMEV SLADTNSLA LLWAARPRL GVALQTMKQGLYDGMEHL KMVELVHFL YLQLVFGIE MLMAQEALA LMAQEALAF VYDGREHTV YLSGANLNL RMFPNAPYLEAAGIGILT TLDSQVMSL STPPPGTRV KVAELVHFL IMIGVLVGV ALCRWGLLL LLFAGVQCQ VLLCESTAVYLSTAFARV YLLEMLWRL SLDDYNHLV RTLDKVLEV GLPVEYLQV KLIANNTRV FIYAGSLSA KLVANNTRLFLDEFMEGV ALQPGTALL VLDGLDVLL SLYSFPEPE ALYVDSLFF SLLQHLIGL ELTLGEFLK MINAYLDKLAAGIGILTV FLPSDFFPS SVRDRLARL SLREWLLRI LLSAWILTA AAGIGILTV AVPDEIPPL FAYDGKDYIAAGIGILTV FLPSDFFPS AAGIGILTV FLPSDFFPS AAGIGILTV FLWGPRALV ETVSEQSNV ITLWQRPLV

Sequence information

Page 14: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Sequence Information

Say that a peptide must have L at P2 in order to bind, and that A,F,W,and Y are found at P1. Which position has most information? How many questions do I need to ask to tell if a peptide binds looking at only P1 or P2?

Page 15: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Sequence Information

Say that a peptide must have L at P2 in order to bind, and that A,F,W,and Y are found at P1. Which position has most information? How many questions do I need to ask to tell if a peptide binds looking at only P1 or P2?

P1: 4 questions (at most) P2: 1 question (L or not) P2 has the most information

Page 16: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Sequence Information

Calculate pa at each positionEntropy

Information content

Conserved positions– PV=1, P!v=0 => S=0,

I=log(20)Mutable positions

– Paa=1/20 => S=log(20), I=0

Say that a peptide must have L at P2 in order to bind, and that A,F,W,and Y are found at P1. Which position has most information? How many questions do I need to ask to tell if a peptide binds looking at only P1 or P2?

P1: 4 questions (at most) P2: 1 question (L or not) P2 has the most information

Page 17: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Sequence information - I

I = log(20) + paa

∑ log(pa )ALAKAAAAMALAKAAAANALAKAAAARALAKAAAATALAKAAAAVGMNERPILTGILGFVFTMTLNAWVKVVKLNEPVLLLAVVPFIVSV

PA = 6/10 = 0.6

PG = 2/10 = 0.2

PT = PK = 1/10 = 0.1

PC = PD = …PV = 0.0

Multiple Sequence alignment

Page 18: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Information content

A R N D C Q E G H I L K M F P S T W Y V S I1 0.10 0.06 0.01 0.02 0.01 0.02 0.02 0.09 0.01 0.07 0.11 0.06 0.04 0.08 0.01 0.11 0.03 0.01 0.05 0.08 3.96 0.372 0.07 0.00 0.00 0.01 0.01 0.00 0.01 0.01 0.00 0.08 0.59 0.01 0.07 0.01 0.00 0.01 0.06 0.00 0.01 0.08 2.16 2.163 0.08 0.03 0.05 0.10 0.02 0.02 0.01 0.12 0.02 0.03 0.12 0.01 0.03 0.05 0.06 0.06 0.04 0.04 0.04 0.07 4.06 0.264 0.07 0.04 0.02 0.11 0.01 0.04 0.08 0.15 0.01 0.10 0.04 0.03 0.01 0.02 0.09 0.07 0.04 0.02 0.00 0.05 3.87 0.455 0.04 0.04 0.04 0.04 0.01 0.04 0.05 0.16 0.04 0.02 0.08 0.04 0.01 0.06 0.10 0.02 0.06 0.02 0.05 0.09 4.04 0.286 0.04 0.03 0.03 0.01 0.02 0.03 0.03 0.04 0.02 0.14 0.13 0.02 0.03 0.07 0.03 0.05 0.08 0.01 0.03 0.15 3.92 0.407 0.14 0.01 0.03 0.03 0.02 0.03 0.04 0.03 0.05 0.07 0.15 0.01 0.03 0.07 0.06 0.07 0.04 0.03 0.02 0.08 3.98 0.348 0.05 0.09 0.04 0.01 0.01 0.05 0.07 0.05 0.02 0.04 0.14 0.04 0.02 0.05 0.05 0.08 0.10 0.01 0.04 0.03 4.04 0.289 0.07 0.01 0.00 0.00 0.02 0.02 0.02 0.01 0.01 0.08 0.26 0.01 0.01 0.02 0.00 0.04 0.02 0.00 0.01 0.38 2.78 1.55

Page 19: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Sequence logos

Height of a column equal to I

Relative height of a letter is pHighly useful tool to visualize sequence motifs

High information positions

HLA-A0201

Page 20: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Sequence logos

• Height of a column equal to I• Relative height of a letter is p• Letters upside-down if pa < qa

High information positions

I = log(20) + paa

∑ log(pa )

Page 21: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Protein world

Protein fold

Protein structure classification

Protein superfamily

Protein family

Page 22: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

ADDGSLAFVPSEF--SISPGEKIVFKNNAGFPHNIVFDEDSIPSGVDASKISMSEEDLLN TVNGAI--PGPLIAERLKEGQNVRVTNTLDEDTSIHWHGLLVPFGMDGVPGVSFPG---I-TSMAPAFGVQEFYRTVKQGDEVTVTIT-----NIDQIED-VSHGFVVVNHGVSME---IIE--KMKYLTPEVFYTIKAGETVYWVNGEVMPHNVAFKKGIV--GEDAFRGEMMTKD----TSVAPSFSQPSF-LTVKEGDEVTVIVTNLDE------IDDLTHGFTMGNHGVAME---VASAETMVFEPDFLVLEIGPGDRVRFVPTHK-SHNAATIDGMVPEGVEGFKSRINDE----TVNGQ--FPGPRLAGVAREGDQVLVKVVNHVAENITIHWHGVQLGTGWADGPAYVTQCPI

Sequence profiles

Conserved

Non-conserved

Matching any thing but G => large negative score

Any thing can match

TKAVVLTFNTSVEICLVMQGTSIV----AAESHPLHLHGFNFPSNFNLVDPMERNTAGVP

TVNGQ--FPGPRLAGVAREGDQVLVKVVNHVAENITIHWHGVQLGTGWADGPAYVTQCPI

Page 23: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

How to make sequence profiles

1. Align (BLAST) sequence against large sequence database (Swiss-Prot)

2. Select significant alignments and make sequence profile

3. Use profile to align against sequence database to find new significant hits

4. Repeat 2 and 3 (normally 3 times!)

Page 24: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Sequence profiles (1J2J.B)

>1J2J.B mol:aa PROTEIN TRANSPORT NVIFEDEEKSKMLARLLKSSHPEDLRAANKLIKEMVQEDQKRMEK

Page 25: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Sequence profiles (1J2J.B)

>1J2J.B mol:aa PROTEIN TRANSPORT NVIFEDEEKSKMLARLLKSSHPEDLRAANKLIKEMVQEDQKRMEK

A R N D C Q E G H I L K M F P S T W Y V 1 N -2 0 6 1 -3 0 0 0 1 -3 -3 0 -2 -3 -2 1 0 -4 -2 -3 2 V 0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4 3 I -1 -3 -3 -3 -1 -3 -3 -4 -3 4 2 -3 1 0 -3 -2 -1 -3 -1 3 4 F -2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6 -4 -2 -2 1 3 -1 5 E -1 0 0 2 -4 2 5 -2 0 -3 -3 1 -2 -3 -1 0 -1 -3 -2 -2 6 D -2 -2 1 6 -3 0 2 -1 -1 -3 -4 -1 -3 -3 -1 0 -1 -4 -3 -3 7 E -1 0 0 2 -4 2 5 -2 0 -3 -3 1 -2 -3 -1 0 -1 -3 -2 -2 8 E -1 0 0 2 -4 2 5 -2 0 -3 -3 1 -2 -3 -1 0 -1 -3 -2 -2 9 K -1 2 0 -1 -3 1 1 -2 -1 -3 -2 5 -1 -3 -1 0 -1 -3 -2 -210 S 1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4 1 -3 -2 -211 K -1 2 0 -1 -3 1 1 -2 -1 -3 -2 5 -1 -3 -1 0 -1 -3 -2 -212 M -1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5 0 -2 -1 -1 -1 -1 1

Page 26: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Sequence profiles (1J2J.B)

Blosum62 Sequence Profile

>1J2J.B mol:aa PROTEIN TRANSPORT NVIFEDEEKSKMLARLLKSSHPEDLRAANKLIKEMVQEDQKRMEK

Page 27: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Example.

>1K7C.A TTVYLAGDSTMAKNGGGSGTNGWGEYLASYLSATVVNDAVAGRSARSYTREGRFENIADVVTAGDYVIVEFGHNDGGSLSTDNGRTDCSGTGAEVCYSVYDGVNETILTFPAYLENAAKLFTAKGAKVILSSQTPNNPWETGTFVNSPTRFVEYAELAAEVAGVEYVDHWSYVDSIYETLGNATVNSYFPIDHTHTSPAGAEVVAEAFLKAVVCTGTSLKSVLTTTSFEGTCL

• What is the function• Where is the active site?

Page 28: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

What would you do?

• Function• Run Blast against PDB• No significant hits

• Run Blast against NR (Sequence database)• Function is Acetylesterase?

• Where is the active site?

Page 29: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Example. Where is the active site?

1WAB Acetylhydrolase

1G66 Acetylxylan esterase

1USW Hydrolase

Page 30: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

When Blast fails!

1K

7A

.A

1WAB._

Page 31: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Example. (SGNH active site)

Page 32: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Example. Where is the active site?

• Sequence profiles might show you where to look!• The active site could be around

• S9, G42, N74, and H195

Page 33: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Profile-profile scoring matrix

1K

7C

.A

1WAB._

Page 34: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Example. Where is the active site?

Align using sequence profiles

ALN 1K7C.A 1WAB._ RMSD = 5.29522. 14% ID1K7C.A TVYLAGDSTMAKNGGGSGTNGWGEYLASYLSATVVNDAVAGRSARSYTREGRFENIADVVTAGDYVIVEFGHNDGGSLSTDN S G N1WAB._ EVVFIGDSLVQLMHQCE---IWRELFS---PLHALNFGIGGDSTQHVLW--RLENGELEHIRPKIVVVWVGTNNHG------

1K7C.A GRTDCSGTGAEVCYSVYDGVNETILTFPAYLENAAKLFTAK--GAKVILSSQTPNNPWETGTFVNSPTRFVEYAEL-AAEVA1WAB._ ---------------------HTAEQVTGGIKAIVQLVNERQPQARVVVLGLLPRGQ-HPNPLREKNRRVNELVRAALAGHP

1K7C.A GVEYVDHWSYVDSIYETLGNATVNSYFPIDHTHTSPAGAEVVAEAFLKAVVCTGTSL H1WAB._ RAHFLDADPG---FVHSDG--TISHHDMYDYLHLSRLGYTPVCRALHSLLLRL---L

Page 35: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Where is the active site?

Rhamnogalacturonan acetylesterase (1k7c)

Page 36: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

How to do it?

Example

>QUERY1

MKDTDLSTLLSIIRLTELKESKRNALLSLIFQLSVAYFIALVIVSRFVRYVNYITYNNLV

EFIIVLSLIMLIIVTDIFIKKYISKFSNILLETLNLKINSDNNFRREIINASKNHNDKNK

LYDLINKTFEKDNIEIKQLGLFIISSVINNFAYIILLSIGFILLNEVYSNLFSSRYTTIS

IFTLIVSYMLFIRNKIISSEEEEQIEYEKVATSYISSLINRILNTKFTENTTTIGQDKQL

YDSFKTPKIQYGAKVPVKLEEIKEVAKNIEHIPSKAYFVLLAESGLRPGELLNVSIENID

LKARIIWINKETQTKRAYFSFFSRKTAEFLEKVYLPAREEFIRANEKNIAKLAAANENQE

IDLEKWKAKLFPYKDDVLRRKIYEAMDRALGKRFELYALRRHFATYMQLKKVPPLAINIL

QGRVGPNEFRILKENYTVFTIEDLRKLYDEAGLVVLE

Page 37: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Using Iterative Blast

Page 38: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Using Iterative Blast

Page 39: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Using Iterative Blast

Page 40: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Using Iterative Blast

Page 41: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Using Iterative Blast (1st iteration)

Page 42: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Using Iterative Blast (3rd iteration)

Page 43: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

HHpred webserver

Page 44: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles

Take home message

• Blast will often fail to recognize sequence relationships for low homology sequence pairs

• Sequence profiles contain information on conserved/variable residues in a protein sequence

• Sequence profiles are calculated from (multiple) sequence alignments

• Iterative Blast enables homology recognition also for low sequence similarity

• Sequence profiles give information on residues essential for protein function and protein structure

Page 45: Psi-Blast Morten Nielsen, CBS, BioCentrum, DTU. Objectives Understand why BLAST often fails for low sequence similarity See the beauty of sequence profiles