(PSI-)BLAST & MSA via Max-Planck

(PSI-)BLAST & MSA via Max-Planck. Where? (to find homologues) Structural templates- search against the PDB Sequence homologues- search against SwissProt


• Where? (to find homologues)

• Structural templates: search against the PDB

• Sequence homologues: search against SwissProt or UniProt (recommended!)

• How many?

• As many as possible, as long as the MSA looks good (next week…)

General Issues


• How long? (length of homologues)

• Fragments: short homologues (less than 50-60% of the query’s length) give a bad alignment

• Ensure your sequences exhibit the wanted domain(s)

• The N- and C-termini tend to vary in length between homologues

• How close? (distance from query sequence)

• All too close: no information

• Too many too far: bad alignment

• Ensure that you have a balanced collection!
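The length rule above (drop fragments shorter than roughly 50-60% of the query) can be sketched in a few lines of Python. This is only an illustration: the function name `filter_fragments`, the 0.6 cutoff, and the toy hit names are assumptions, not part of the toolkit.

```python
def filter_fragments(query_len, hits, min_fraction=0.6):
    """Keep hits whose length is at least min_fraction of the query length.

    hits maps a sequence name to its (unaligned) sequence.
    min_fraction=0.6 is an assumed cutoff in the 50-60% range from the slide.
    """
    min_len = min_fraction * query_len
    return {name: seq for name, seq in hits.items() if len(seq) >= min_len}


# Toy hits (hypothetical) against a 273-residue query.
hits = {
    "full_length": "A" * 270,
    "slightly_short": "A" * 170,
    "fragment": "A" * 90,
}
kept = filter_fragments(273, hits, min_fraction=0.6)
print(sorted(kept))
```

The filter looks only at length, so it does not check whether the wanted domain is actually present in a hit.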


• From whom? (which species the sequence belongs to)

• Don’t care, all homologues are welcome

• Orthologues/paralogues may be helpful

• Sequences from distant/close species provide different types of information

• Which method? (BLAST/PSI-BLAST)

• Depends on the protein, available homologues, the goal in mind…


Rules For Choosing Sequences

• Very similar sequences have little information

• Very different sequences cause trouble… (<30% identical with more than half of the other sequences in the set)

• Choose sequences as distantly related as possible (sequences between 30-80% identical with more than half of the sequences in the set)

• The more sequences the better
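As a rough illustration of the 30-80% rule, the sketch below keeps a sequence only if it is 30-80% identical to more than half of the other sequences in an existing alignment. The identity definition (identical residues over columns where both sequences have a residue) and all names are assumptions made for this example; other tools may define identity differently.

```python
def percent_identity(a, b):
    """Percent identity over columns where both aligned sequences have a residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def balanced_subset(aligned, low=30.0, high=80.0):
    """Names of sequences that are low-high% identical to more than half of the others."""
    names = list(aligned)
    kept = []
    for name in names:
        others = [n for n in names if n != name]
        in_range = sum(
            low <= percent_identity(aligned[name], aligned[other]) <= high
            for other in others
        )
        if in_range > len(others) / 2:
            kept.append(name)
    return kept


# Toy alignment (hypothetical sequences, no gaps for simplicity).
aligned = {
    "query": "ACDEFGHIKL",
    "close": "ACDEFGHIKV",
    "mid1":  "ACDEFWWWWW",
    "mid2":  "ACDEYYYYYY",
    "mid3":  "ACDEFGYYYY",
    "far":   "MNPQRSTVWY",
}
print(balanced_subset(aligned))
```

In this toy set only `far` is dropped. A near-duplicate such as `close` survives the rule, so redundancy has to be handled separately.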


Overall work steps

1. Run the search:
   1. Select database
   2. E-value threshold
   3. BLAST or PSI-BLAST (how many rounds?)

2. Take out sequences: HSP (slider region) or full sequences

3. Align sequences: choose alignment program

4. View the alignment with BioEdit or another program

5. Calculate trees, conservation scores (ConSurf), etc.


(PSI-)BLAST via Max-Planck

http://toolkit.tuebingen.mpg.de/sections/search

• Databases: SwissProt, TrEMBL, NR, env, PDB, or any combination for proteins, but only NT for DNA.

• All BLAST programs

Main advantage: you can easily extract and filter the HSPs, in addition to full sequences


The Query Protein

Name: Dihydrodipicolinate reductase

Enzyme reaction:

Molecular process: Lysine biosynthesis (early stages)

Organism: E. coli

Sequence length: 273 aa


Query: DAPB_ECOLI

>DAPB_ECOLI
MHDANIRVAIAGAGGRMGRQLIQAALALEGVQLGAALEREGSSLLGSDAGELAGAGKTGVTVQSSLDAVKDDFDVFIDFTRPEGTLNHLAFCRQHGKGMVIGTTGFDEAGKQAIRDAAADIAIVFAANFSVGVNVMLKLLEKAAKVMGDYTDIEIIEAHHRHKVDAPSGTALAMGEAIAHALDKDLKDCAVYSREGHTGERVPGTIGFATVRAGDIVGEHTAMFADIGERLEITHKASSRMTFANGAVRSALWLSGKESGLFDMRDVLDLNNL
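The query is given in FASTA format: a header line starting with ">" followed by the sequence. The minimal reader below is a sketch for checking a record (for example its length) before uploading it. The function name `read_fasta` and the toy record are assumptions; the toy sequence is just the first 20 residues of the query split over two lines.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict.

    The name is the first word after '>'; sequence lines are concatenated.
    Assumes every header line has at least one word after '>'.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# Toy record (hypothetical name): first 20 residues of the query.
example = """>toy_query first 20 residues
MHDANIRVAI
AGAGGRMGRQ
"""
records = read_fasta(example)
print(len(records["toy_query"]))
```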


(PSI-)BLAST via Max-Planck

http://toolkit.tuebingen.mpg.de/psi_blast/

• Choose database or databases (selecting a few using CTRL)

• Upload sequence or MSA


E-value threshold can be assessed using the distribution
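One simple way to look at the distribution is to count hits per order of magnitude of E-value and look for a gap between the clearly significant hits and the rest. The sketch below assumes the E-values have already been exported as a list of floats; the function name, the toy values, and the text histogram are all illustrative.

```python
import math
from collections import Counter


def evalue_histogram(evalues, floor=1e-200):
    """Count hits per order of magnitude of E-value (floor of log10).

    E-values reported as 0.0 are clamped to `floor` so log10 is defined.
    """
    counts = Counter()
    for e in evalues:
        exponent = math.floor(math.log10(max(e, floor)))
        counts[exponent] += 1
    return dict(sorted(counts.items()))


# Toy E-values (hypothetical): strong hits, a gap, then weak hits.
evalues = [3e-80, 2e-75, 4e-42, 5e-40, 2e-12, 3e-4, 0.02, 0.5, 2.0, 7.0]
hist = evalue_histogram(evalues)
for exponent, n in hist.items():
    print(f"10^{exponent:<5}{'#' * n}")
```

In this toy list a threshold somewhere between 10^-12 and 10^-4 would separate the two groups. Real distributions are rarely this clean.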


Forward results to MSA

http://toolkit.tuebingen.mpg.de/sections/alignment

• All marked hits, or filter by E-value

• HSP (slider region) or full sequences


Align via Max-Planck

Alignment results:

Save the alignment


Alignment viewing & editing: BioEdit

• http://www.mbio.ncsu.edu/BioEdit/BioEdit.html

• Easy-to-use sequence alignment editor

• View and manipulate alignments of up to 20,000 sequences.

• Four modes of manual alignment: select and slide, dynamic grab and drag, gap insert and delete by mouse click, and on-screen typing which behaves like a text editor.

• Reads and writes GenBank, FASTA, Phylip 3.2, Phylip 4, and NBRF/PIR formats. Also reads GCG and Clustal formats.


Easiest Using Bioedit

http://www.mbio.ncsu.edu/BioEdit/bioedit.html


• Find a specific sequence: “Edit -> search -> in titles”

• Erase/add sequences: “Edit -> cut/paste/delete sequence”

• “Sequence Identity matrix” under “Alignment” is useful for a rough evaluation of distances within the alignment.

• After taking out sequences, “Minimize Alignment” under “Alignment” takes out unnecessary gaps.

• To save an image, use “File -> Graphic View” and then “Edit -> Copy page as BITMAP”
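To show what removing unnecessary gaps means after sequences are deleted, the sketch below drops every alignment column that contains only gaps. It assumes this is what “Minimize Alignment” does; it is not BioEdit's own code, and the function name and toy alignment are made up for the example.

```python
def remove_gap_only_columns(aligned):
    """Drop alignment columns in which every sequence has a gap ('-').

    aligned maps a sequence name to its aligned sequence; all sequences
    must have the same length.
    """
    seqs = list(aligned.values())
    if not seqs:
        return {}
    keep = [i for i in range(len(seqs[0])) if any(s[i] != "-" for s in seqs)]
    return {name: "".join(seq[i] for i in keep) for name, seq in aligned.items()}


# Toy alignment (hypothetical): columns 3 and 4 became all-gap after
# the sequence that had residues there was deleted.
aligned = {
    "a": "AC--DE",
    "b": "A---DE",
    "c": "AC--D-",
}
minimized = remove_gap_only_columns(aligned)
print(minimized)
```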


A little of ConSurf

Compute Conservation Scores

• Give it an MSA, or it will compute one for you (given a FASTA sequence, it runs BLAST and builds the MSA)

• Main advantage: filters short HSPs and removes redundant sequences

• Shows conservation scores on sequence or on a protein structure (if available)
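For a quick feel of what a per-position conservation score is, the sketch below computes the Shannon entropy of each alignment column (0 bits means fully conserved). This is a much simpler stand-in and not the method ConSurf uses; the function names and the toy alignment are assumptions.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one column, ignoring gaps.

    Returns None for a column that contains only gaps.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return None
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return max(0.0, entropy)  # avoid returning -0.0 for conserved columns


def alignment_entropies(seqs):
    """Entropy of every column of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*seqs)]


# Toy alignment (hypothetical): column 1 is conserved, column 4 is not.
msa = ["ACDA", "ACEG", "ACFT", "AC-C"]
entropies = alignment_entropies(msa)
print([round(e, 3) for e in entropies])
```

Low entropy marks conserved columns. Unlike a tree-based score, this ignores how the sequences are related, so over-sampled close homologues can make a column look more conserved than it is.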


ConSurf

http://consurf.tau.ac.il/


ConSurf results: http://consurf.tau.ac.il/results/1321532763/output.php


MSA colored by conservation

PSI-BLAST result

MSA

Phylogenetic tree

Sequences used

Sequence conservation


Jmol: easy web-based viewer


WebLogo

http://weblogo.berkeley.edu/logo.cgi


Each sequence is a different story

Adjust parameters:

• BLAST: E-value, substitution matrix, gap penalties, database, minimum length, redundancy level, fragment overlap…

• PSI-BLAST: BLAST parameters + PSSM inclusion threshold (or choose manually), number of rounds…

• Try using HSP or full sequences, different MSA programs…

No “Miracle solution”
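When trying several parameter combinations, it can help to write them down explicitly. The sketch below assumes a local installation of NCBI BLAST+ (not the Max-Planck web form) and only builds the `psiblast` argument list without running it. The file names and default values are placeholders for illustration, not recommendations.

```python
def psiblast_command(query, db, evalue=1e-3, rounds=3, inclusion=0.002,
                     matrix="BLOSUM62", out="hits.tsv"):
    """Build an argument list for NCBI BLAST+ psiblast (nothing is executed).

    evalue    -> -evalue             (reporting threshold)
    rounds    -> -num_iterations     (number of PSI-BLAST rounds)
    inclusion -> -inclusion_ethresh  (E-value threshold for PSSM inclusion)
    matrix    -> -matrix             (substitution matrix)
    """
    return [
        "psiblast",
        "-query", query,
        "-db", db,
        "-evalue", str(evalue),
        "-num_iterations", str(rounds),
        "-inclusion_ethresh", str(inclusion),
        "-matrix", matrix,
        "-outfmt", "6",  # tabular output
        "-out", out,
    ]


# Hypothetical file and database names.
cmd = psiblast_command("dapb_ecoli.fasta", "swissprot", evalue=1e-5, rounds=2)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the query file and a formatted database exist locally.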