
Workshop Outline

Part 1:

• Introduction and motivation

• How does BLAST work?

Part 2:

• BLAST programs

• Sequence databases

• Work Steps

• Extract and analyze results

BLAST programs

• All types of searches are possible:

Query: DNA or protein

Database: DNA or protein

blastn - nucleotide vs. nucleotide

blastp - protein vs. protein

blastx - translated nucleotide query vs. protein database

tblastn - protein query vs. translated nucleotide database

tblastx - translated nucleotide query vs. translated nucleotide database
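Choosing among the five programs comes down to two questions: is the query nucleotide or protein, and is the database nucleotide or protein? A minimal Python sketch of that decision. The function name and the type labels it accepts are illustrative assumptions; they are not part of any BLAST tool.

```python
NUCLEOTIDE = {"dna", "nucleotide", "nuc"}
PROTEIN = {"protein", "prot", "aa"}


def choose_blast_program(query_type: str, database_type: str,
                         translate_both: bool = False) -> str:
    """Return the BLAST program name for a query/database type pair.

    translate_both only matters for nucleotide vs. nucleotide searches:
    True selects tblastx (both sides translated), False selects blastn.
    """
    q, d = query_type.lower(), database_type.lower()
    known = NUCLEOTIDE | PROTEIN
    if q not in known or d not in known:
        raise ValueError(f"unknown sequence type: {query_type!r} / {database_type!r}")
    if q in NUCLEOTIDE and d in NUCLEOTIDE:
        return "tblastx" if translate_both else "blastn"
    if q in PROTEIN and d in PROTEIN:
        return "blastp"
    if q in NUCLEOTIDE:   # nucleotide query, protein database
        return "blastx"
    return "tblastn"      # protein query, nucleotide database
```

For example, `choose_blast_program("protein", "protein")` returns `"blastp"`, the setup recommended here for homology search.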

Amino acid sequence – most suitable for homology search

• The database and the query can be either nucleotides or amino acids!

• We prefer amino acid sequences:

- Amino acid sequences are more conserved.

- 20-letter alphabet: two random sequences share about 5% identity on average (compared with 25% for DNA).

- Protein comparison matrices are more sensitive.

- Protein databases are smaller, so there are fewer random hits.

- We want to draw conclusions about the structure, and proteins are much more relevant for that.
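The 5% and 25% figures are the chance that two independent residues match, that is, the sum of squared letter frequencies. The round numbers assume every letter is equally frequent (1/20 and 1/4). A short sketch under that assumption (function names are illustrative). A skewed composition always raises the value above 1/k, which is one reason low-complexity regions produce spurious hits.

```python
from collections import Counter


def expected_identity(frequencies):
    """Chance that two independent residues drawn from `frequencies` match."""
    return sum(p * p for p in frequencies)


def uniform_expected_identity(alphabet_size):
    """Expected identity when every letter is equally frequent (1/k each)."""
    return expected_identity([1.0 / alphabet_size] * alphabet_size)


def composition_expected_identity(sequence):
    """Expected identity using the letter composition of a non-empty `sequence`."""
    counts = Counter(sequence)
    total = len(sequence)
    return expected_identity(c / total for c in counts.values())


print(round(uniform_expected_identity(20), 3))  # protein: 0.05
print(round(uniform_expected_identity(4), 3))   # DNA: 0.25
```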


• Where? (to find homologues)

• Structural templates- search against the PDB

• Sequence homologues- search against SwissProt or Uniprot (recommended!)

• How many?

• As many as possible, as long as the MSA looks good (next week…)

General Issues

• How long? (length of homologues)

• Fragments- short homologues (less than 50-60% of the query's length) give a bad alignment

• Ensure your sequences exhibit the wanted domain(s)

• N- and C-termini tend to vary in length between homologues

• How close? (distance from query sequence)

• All too close- they add no new information

• Too many too distant- bad alignment

• Ensure that you have a balanced collection!
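The length and distance checks above can be applied mechanically to each hit before deciding what to keep. A sketch, assuming 1-based inclusive query coordinates and identity given as a fraction. The 0.6 coverage cutoff follows the "50-60% of the query's length" rule; the identity cutoffs (0.95 for "too close", 0.25 for "distant") are illustrative assumptions, not values from the workshop.

```python
from collections import Counter


def query_coverage(query_length, qstart, qend):
    """Fraction of the query covered by a hit (1-based, inclusive coordinates)."""
    return (abs(qend - qstart) + 1) / query_length


def classify_hit(query_length, qstart, qend, identity,
                 min_coverage=0.6, max_identity=0.95, min_identity=0.25):
    """Label a hit; the identity thresholds are assumptions to tune per protein."""
    if query_coverage(query_length, qstart, qend) < min_coverage:
        return "fragment"
    if identity >= max_identity:
        return "too close"
    if identity < min_identity:
        return "distant"
    return "ok"


def summarize(hits, query_length, **thresholds):
    """Count labels for (qstart, qend, identity) tuples, to judge the balance."""
    return Counter(classify_hit(query_length, s, e, ident, **thresholds)
                   for s, e, ident in hits)
```

A collection dominated by "too close" or "distant" labels is a hint to revisit the search settings rather than a hard rule.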


• From whom? (which species the sequence belongs to)

• Don’t care, all homologues are welcome

• Orthologues/paralogues may be helpful

• Sequences from distant/close species provide different types of information

• Which method? (BLAST/PSI-BLAST)

• Depends on the protein, available homologues, the goal in mind…


Sequence databases

Where do we want to search? DNA sequences:

• ESTs (expressed sequence tags)- a pool without annotated coding sequences; the largest pool of sequence data for many organisms (NCBI)

• NR- All GenBank + EMBL + DDBJ + PDB sequences. No longer "non-redundant" due to computational cost.

• Genomes of specific organisms

• RefSeq- mRNA or genomic; an annotated collection from the NCBI Reference Sequence Project.

• EMBL- Europe's primary nucleotide sequence resource (EBI)

• ….


Where do we want to search? Protein databases:

• PDB- the sequences of proteins for which structures are available

• NR (non-redundant)- Non-redundant GenBank CDS translations + PDB + SwissProt + PIR + PRF, excluding those in env_nr

• RefSeq- sequences from NCBI Reference Sequence project.

• Proteins of specific organisms

• UniProt- SwissProt or TrEMBL

• ….


UniProt

• UniProt is a collaboration between the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR).

• In 2002, the three institutes decided to pool their resources and expertise and formed the UniProt Consortium.


• UniProt: the world's most comprehensive catalog of information on proteins- sequence, function & more…

• Composed mainly of two databases:

– SwissProt: 366,226 protein entries last year, 412,525 now; high-quality annotation, non-redundant and cross-referenced to many other databases.

– TrEMBL: 5,708,298 protein entries last year, 7,341,751 now; computer translation of the genetic information from the EMBL Nucleotide Sequence Database. Many proteins are poorly annotated, since only automatic annotation is generated.

Overall work steps

1. Run the search:
   1. Select a database
   2. Set the E-value threshold
   3. BLAST or PSI-BLAST? If PSI-BLAST, how many rounds?

2. Take out sequences:
   1. HSPs or full sequences
   2. You can (and should!) filter out redundant sequences and sequences that are too short (fragments)

3. Usually: align the sequences (choose an alignment program)

4. View the alignment with BioEdit or another program

5. Calculate trees, conservation scores (ConSeq), etc.
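Step 2 (taking out and filtering sequences) is easy to script when results are saved in tabular form. A sketch assuming the default column order of BLAST+ `-outfmt 6` (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The cutoffs (E-value 1e-3, coverage 0.6) are illustrative, and coverage is estimated from the best single HSP per subject, which underestimates hits split into several HSPs.

```python
from dataclasses import dataclass

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    sseqid: str
    pident: float
    qstart: int
    qend: int
    evalue: float
    bitscore: float


def parse_outfmt6(text):
    """Parse tab-separated -outfmt 6 lines into Hit records (comments skipped)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        hits.append(Hit(sseqid=row["sseqid"], pident=float(row["pident"]),
                        qstart=int(row["qstart"]), qend=int(row["qend"]),
                        evalue=float(row["evalue"]), bitscore=float(row["bitscore"])))
    return hits


def select_subjects(hits, query_length, max_evalue=1e-3, min_coverage=0.6):
    """Keep the best HSP per subject, then apply E-value and coverage cutoffs."""
    best = {}
    for hit in hits:
        if hit.sseqid not in best or hit.evalue < best[hit.sseqid].evalue:
            best[hit.sseqid] = hit
    selected = []
    for hit in best.values():
        coverage = (abs(hit.qend - hit.qstart) + 1) / query_length
        if hit.evalue <= max_evalue and coverage >= min_coverage:
            selected.append(hit.sseqid)
    return selected
```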

Multiple Sequence Alignment (MSA)


• Perform alignment of a large collection of sequences

• Many algorithms, leading ones:

1. ClustalW

2. MUSCLE

3. T-COFFEE

Examining BaliBase 2005…

Edgar, R.C., 2004

MUSCLE is superior!
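Progressive MSA programs such as ClustalW start from pairwise alignments and then align sequences and profiles with the same kind of dynamic programming. Below is a sketch of the global pairwise core, Needleman-Wunsch, assuming a simple scoring scheme (match +1, mismatch -1, linear gap -1). Real aligners use substitution matrices such as BLOSUM and affine gap penalties, so this is an illustration of the recurrence, not any program's actual scoring.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # F[i][j] = best score aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    def score(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,   # gap in b
                          F[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```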


BLAST NCBI

• All program types

• Many databases to choose from, both nucleotide and protein

• 12 genome-specific databases

• Can also look for conserved domains, SNPs and more…

The well-known server: http://blast.ncbi.nlm.nih.gov/Blast.cgi

http://www.ncbi.nlm.nih.gov/blast/Blast.cgi

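BLASTp at NCBI

[Screenshots of the NCBI BLASTp form and results, with callouts:]

• Query sequence

• Database

• As many as possible (number of sequences to retrieve)

• Matrix

• E-value

• Results: mark all hits, or mark only the wanted ones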

BLAST EBI

http://www.ebi.ac.uk/blastall/index.html

[Screenshots of the EBI BLAST form and results, with callouts:]

• Many databases, including UniProt

• Insert the query sequence

• Get the maximum number of alignments!

• Mark all hits or only the wanted ones

• Get sequences, or send them to ClustalW

PSI-BLAST NCBI

[Screenshots of the NCBI PSI-BLAST form and results, with callouts:]

• Query sequence

• Database

• Pre-calculated PSSM

• Threshold for inclusion in the PSSM

• Run next round

• Include sequence in the PSSM

• Not found in previous round
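Each PSI-BLAST round adds the hits whose E-value is at or below the inclusion threshold to a position-specific scoring matrix (PSSM), searches again with that matrix, and has converged once a round brings in no new sequences. A simplified sketch of those pieces follows. Assumptions: sequences are already aligned to the query with equal length and uppercase letters, background frequencies are uniform, a single pseudocount is used, and there is no sequence weighting. NCBI's actual PSSM construction differs, and all function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def select_for_pssm(hits, inclusion_evalue):
    """IDs of (seq_id, evalue) hits at or below the inclusion threshold."""
    return {seq_id for seq_id, evalue in hits if evalue <= inclusion_evalue}


def has_converged(included_now, included_before):
    """Stopping rule: no new sequences entered the PSSM this round."""
    return not (included_now - included_before)


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Log-odds PSSM in bits, one dict per alignment column.

    Gaps and non-standard letters are ignored; pseudocount must be > 0.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*aligned_seqs):
        counts = Counter(r for r in column if r in AMINO_ACIDS)
        n = sum(counts.values())
        row = {}
        for aa in AMINO_ACIDS:
            # Observed counts plus a pseudocount spread evenly over 20 letters.
            prob = (counts[aa] + pseudocount * background) / (n + pseudocount)
            row[aa] = math.log2(prob / background)
        pssm.append(row)
    return pssm
```

Positive scores mark residues seen more often than background at that position. This is why sequences "Not found in previous round" can appear once the matrix, rather than the single query, drives the search.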

PSI-BLAST EBI

http://www.ebi.ac.uk/blastpgp/

[Screenshot of the EBI PSI-BLAST form, with callouts:]

• Query sequence

• Database

• Number of iterations

(PSI-)BLAST on ConSeq, extract sequence & align

PSI-BLAST on ConSeq

The ConSeq webserver

• Calculates evolutionary conservation scores that are then displayed on the sequence.

• Requires a Multiple Sequence Alignment (MSA); if none is provided, it can create one automatically.

• Runs (PSI-)BLAST, extracts hits from the BLAST results, filters them according to E-value and aligns the sequences.
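ConSeq computes its scores with a phylogeny-based method (Rate4Site). As a much simpler stand-in that shows the idea of a per-position conservation score from an MSA, here is Shannon entropy per column, rescaled so that 1 means fully conserved and 0 means all 20 amino acids equally frequent. This is a swapped-in technique, not what ConSeq does; it assumes "-" is the gap character and ignores gaps.

```python
import math
from collections import Counter


def column_conservation(column, alphabet_size=20, gap="-"):
    """1 - H/log2(alphabet_size) for one column; None if the column is all gaps."""
    residues = [r for r in column if r != gap]
    if not residues:
        return None
    n = len(residues)
    counts = Counter(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 1.0 - entropy / math.log2(alphabet_size)


def conservation_profile(alignment):
    """Conservation score for every column of equal-length aligned sequences."""
    return [column_conservation(col) for col in zip(*alignment)]
```

Unlike Rate4Site, entropy ignores how closely related the sequences are, so an alignment of near-identical homologues looks "conserved" everywhere, which is one more reason to keep the collection balanced.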


The ConSeq webserver: http://conseq.tau.ac.il/

[Screenshots of the ConSeq form and results, with callouts:]

• Query sequence

• Alignment algorithm

• Database: SwissProt or UniProt

• No. of homologues

• Iterations

• E-value

• All BLAST hits

NCBI vs. EBI vs. ConSeq

Summary of web servers:

1. PSI-BLAST at NCBI
- Can control the PSSM, the included sequences & the threshold
- All types of BLAST programs
- Not against UniProt; against SwissProt or NR instead
- Against RefSeq and NT
- Full sequences are downloaded, as with BLAST
- Up to 2000 sequences

2. BLAST at EBI
- Against UniProt or EMBL, not NR or specific genomes
- Can't control the PSSM: you just get the last round
- Download and align only full sequences
- The number of presented sequences is limited to 500
- blastn, blastp, tblastn, tblastx

3. BLAST at ConSeq
- Gets HSPs, not entire sequences!
- Only blastp
- Searches UniProt/SwissProt
- Still, you can't control all options, such as redundancy and the minimal HSP length

(PSI-)BLAST via Max-Planck

• Run (PSI-)BLAST, then either:

– send HSPs or full sequences to an alignment program, or

– forward HSPs to filtration via “BLAMMER”, download the filtered sequences, and align them with the program of choice

BLAST at Max-Planck: http://toolkit.tuebingen.mpg.de/sections/search

• Databases- SwissProt, TrEMBL, NR, env, PDB or any combination for proteins, but only NT for DNA.

• All BLAST programs

• Main advantage- you can easily extract and filter the HSPs, on top of full sequences.

The Query Protein

Name: Dihydrodipicolinate reductase

Enzyme reaction:

Molecular process: Lysine biosynthesis (early stages)

Organism: E. coli

Sequence length: 273 aa

Query: DAPB_ECOLI

>DAPB_ECOLI
MHDANIRVAIAGAGGRMGRQLIQAALALEGVQLGAALEREGSSLLGSDAGELAGAGKTGVTVQSSLDAVKDDFDVFIDFTRPEGTLNHLAFCRQHGKGMVIGTTGFDEAGKQAIRDAAADIAIVFAANFSVGVNVMLKLLEKAAKVMGDYTDIEIIEAHHRHKVDAPSGTALAMGEAIAHALDKDLKDCAVYSREGHTGERVPGTIGFATVRAGDIVGEHTAMFADIGERLEITHKASSRMTFANGAVRSALWLSGKESGLFDMRDVLDLNNL
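Query sequences and BLAST downloads are usually exchanged in FASTA: a `>` header line followed by one or more sequence lines. A minimal reader sketch, assuming the record ID is the first whitespace-separated word of the header. The name `read_fasta` is illustrative; libraries such as Biopython provide full parsers.

```python
def read_fasta(text):
    """Return {record_id: sequence} from FASTA text (later duplicate IDs overwrite)."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            current_id = words[0] if words else ""
            records[current_id] = []
        elif current_id is not None:
            records[current_id].append(line)   # sequence may span many lines
        else:
            raise ValueError("sequence data before the first '>' header")
    return {rid: "".join(parts) for rid, parts in records.items()}
```

Applied to the DAPB_ECOLI record written as a proper FASTA entry, the sequence length comes out as 273, matching the stated length.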


(PSI-)BLAST via Max-Planck: http://toolkit.tuebingen.mpg.de/psi_blast/

• Choose a database or databases (select several using CTRL)

• Upload a sequence or an MSA


Save the PSI-BLAST result

E-value threshold can be assessed using the distribution
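Choosing an E-value threshold is easier once you recall how E depends on the search space. In Karlin-Altschul statistics, E = K·m·n·e^(-λS), or with the normalized bit score S' = (λS - ln K)/ln 2, simply E = m·n·2^(-S'), where m is the query length and n is the total database length. A sketch of these relations; it ignores the effective-length (edge-effect) corrections BLAST applies, and λ and K must come from the scoring system used.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalized (bit) score S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(bits, query_length, database_length):
    """E = m * n * 2^(-S'), without effective-length corrections."""
    return query_length * database_length * 2.0 ** (-bits)


def bits_needed(query_length, database_length, target_evalue):
    """Bit score a hit must reach to get E <= target_evalue in an m*n search space."""
    return math.log2(query_length * database_length / target_evalue)
```

Because E scales with n, the same hit gets a larger E-value in a bigger database (doubling n doubles E), so a threshold tuned on SwissProt behaves more strictly on NR.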

Filter Results via Max-Planck

Forward results to BLAMMER

BLAMMER

• Supposed to create MSAs from BLAST results; we will use it just to filter the results and then align them via MUSCLE or another known MSA program.

• Filter according to:

– E-value

– Min. coverage- minimal percent of the query protein covered by the hit

– Max. redundancy- remove overly similar sequences

– Max. number of homologues- if wanted

http://toolkit.tuebingen.mpg.de/blammer/
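The "max. redundancy" filter can be pictured as a greedy pass over the hits, best first: keep a hit only if it is less similar than a cutoff to every hit already kept, and stop at the maximum number of homologues. A sketch assuming the hits are already aligned in query coordinates (equal length, "-" for gaps) and sorted by E-value. BLAMMER's actual procedure may differ, and the 0.9 default is an illustrative assumption.

```python
def pairwise_identity(seq1, seq2, gap="-"):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != gap and b != gap]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def remove_redundant(hits, max_identity=0.9, max_hits=None):
    """Greedy redundancy filter over (seq_id, aligned_seq) pairs, best hit first."""
    kept = []
    for seq_id, seq in hits:
        if all(pairwise_identity(seq, other) < max_identity for _, other in kept):
            kept.append((seq_id, seq))
            if max_hits is not None and len(kept) >= max_hits:
                break
    return [seq_id for seq_id, _ in kept]
```

Processing best hits first means that when two sequences are redundant, the one with the better E-value is the one that survives.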

[Screenshot of the BLAMMER form, with callouts:]

• Forwarded PSI-BLAST result

• Filtering parameters

• Save & then re-align!

Align the BLAST sequences

Align via Max-Planck

http://toolkit.tuebingen.mpg.de/sections/alignment

1. Forward BLAST results to MUSCLE, MAFFT, etc.:

• Choose the program

• Use hits or full sequences

2. Filter via BLAMMER and then align:

• Upload the downloaded BLAMMER results

Alignment results:

• Save the alignment

Alignment viewing & editing: BioEdit

• http://www.mbio.ncsu.edu/BioEdit/BioEdit.html

• Easy-to-use sequence alignment editor

• View and manipulate alignments of up to 20,000 sequences.

• Four modes of manual alignment: select and slide, dynamic grab and drag, gap insert and delete by mouse click, and on-screen typing that behaves like a text editor.

•Reads and writes Genbank, Fasta, Phylip 3.2, Phylip 4, and NBRF/PIR formats. Also reads GCG and Clustal formats

Easiest using BioEdit

http://www.mbio.ncsu.edu/BioEdit/bioedit.html


• Find a specific sequence: “Edit-> search -> in titles”

• Erase/add sequences: “Edit -> cut/paste/delete sequence”

• “Sequence Identity matrix” under “Alignment”- useful for a rough evaluation of distances within the alignment.

• After taking out sequences, “Minimize Alignment” under “Alignment” takes out unessential gaps.

• Can save an image using: “File -> Graphic View” & then “Edit -> Copy page as BITMAP”
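Two of the BioEdit operations above are simple to reproduce outside the program. An identity matrix gives every pairwise identity in the alignment, and "minimizing" after deleting sequences drops the columns that have become gaps in every remaining sequence. A sketch assuming "-" as the gap character and identity computed over columns where neither sequence has a gap; BioEdit's own definition of identity may differ.

```python
def pairwise_identity(seq1, seq2, gap="-"):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != gap and b != gap]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def identity_matrix(alignment):
    """Square matrix of pairwise identities for equal-length aligned sequences."""
    return [[pairwise_identity(x, y) for y in alignment] for x in alignment]


def minimize_alignment(alignment, gap="-"):
    """Drop columns that are gaps in every sequence (e.g. after removing sequences)."""
    keep = [i for i, col in enumerate(zip(*alignment)) if any(c != gap for c in col)]
    return ["".join(seq[i] for i in keep) for seq in alignment]
```

For a rough look at distances, scanning a row of the matrix shows which sequences are nearly identical to the query and which are far from it.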


Each sequence is a different story

Adjust parameters:

• BLAST- E-value, substitution matrix, gap penalties, database, minimum length, redundancy level, fragment overlap…

• PSI-BLAST- BLAST parameters + PSSM inclusion threshold (or choose manually), number of rounds…

• Try using HSP or full sequences, different MSA programs…

No “Miracle solution”

THANKS

Some slides were taken from previous presentations by members of the Pupko lab and Prof. Beni Chor
