WELCOME TO OUR PRESENTATION OF BIO-INFORMATICS

Presentation on the BLAST algorithm in bioinformatics

Uploaded by zahid6



GROUP MEMBERS

  • A.K.M. ASADUZZAMAN
  • KOUSHIK ROY
  • MD. ZAHID HASAN
  • MD. ASIF-AL-FAHAD

BLAST

Suppose you have obtained a DNA or protein sequence from an environmental sample, for example from a lake, a pond or a plant.

Introduction

(Figure: cell samples → sequencing process → your sequence, KLMNTRARLIVHISGLTRK)

Introduction

Or you might obtain a DNA or protein sequence from a database such as NCBI, EMBL or Swiss-Prot, or find an interesting gene or sequence in a journal article.

(Figure: your sequence, KLMNTRARLIVHISGLTRK)

In either case, you might want to know whether your sequence already exists in a database, or is similar to sequences in one, possibly narrowing the search down to the database of a particular organism.

Why would you want to know that? Because similarity lets you infer structural, functional and evolutionary relationships for your query sequence.

Introduction

(Figure: your sequence compared against a database: already in here? Similar?)

Sequence Alignment

In bioinformatics, a sequence alignment is a way of arranging DNA, RNA or protein sequences to identify regions of similarity that may be a consequence of functional, structural or evolutionary relationships between the sequences.

Types of Alignment

  • Local alignment
  • Global alignment
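To make the difference between the two alignment types concrete, here is a minimal Python sketch that computes only the optimal score of a global (Needleman-Wunsch) and a local (Smith-Waterman) alignment. The scoring scheme (match = +2, mismatch = -1, linear gap = -2) and the function names are illustrative assumptions, not part of the presentation.

```python
def global_score(a, b, match=2, mismatch=-1, gap=-2):
    """Optimal global (Needleman-Wunsch) alignment score, linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    f = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        f[i][0] = i * gap  # a[:i] aligned against gaps in b
    for j in range(1, cols):
        f[0][j] = j * gap  # b[:j] aligned against gaps in a
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            f[i][j] = max(f[i - 1][j - 1] + s, f[i - 1][j] + gap, f[i][j - 1] + gap)
    return f[-1][-1]  # both sequences aligned end to end


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Optimal local (Smith-Waterman) alignment score, linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets an alignment start anywhere, so only the best region counts.
            h[i][j] = max(0, h[i - 1][j - 1] + s, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


print(global_score("GGACGTCC", "ACGT"))  # 0
print(local_score("GGACGTCC", "ACGT"))   # 8
```

The local alignment picks out the shared ACGT segment (4 matches, score 8), while the global alignment also has to pay for the four positions of the longer sequence that have no partner (4 gaps, score 0). BLAST is built around the local idea.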

(Figure: your unknown sequence, KLMNTRARLIVHISGLTRK: what is this sequence? Where does it come from?)

Introducing BLAST (Basic Local Alignment Search Tool)

BLAST is used to compare a query sequence with a library or database of sequences. It is a heuristic: instead of computing an optimal alignment against every database sequence, it looks for short word matches and extends them, and it uses statistical methods to estimate how significant each hit is. The algorithm was introduced by Stephen Altschul and his co-workers in 1990. BLAST programs were designed for fast database searching.


BLAST Algorithm


BLAST Algorithm (Protein)

Query sequence (length 11): L E H K M G S Y A N C

With a word length W = 3, the query generates 11 - 3 + 1 = 9 overlapping words: L E H, E H K, H K M, K M G, M G S, G S Y, S Y A, Y A N, A N C.
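The word-generation step above can be sketched in a few lines of Python. The helper name query_words is hypothetical, and the query string is the 11-residue sequence as reconstructed from the slide.

```python
def query_words(seq: str, w: int = 3) -> list:
    """Return all overlapping words of length w in seq (len(seq) - w + 1 of them)."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


words = query_words("LEHKMGSYANC", w=3)
print(len(words))  # 9
print(words)       # ['LEH', 'EHK', 'HKM', 'KMG', 'MGS', 'GSY', 'SYA', 'YAN', 'ANC']
```

A query of length L always yields L - W + 1 words, so a longer query produces proportionally more seeds to look up.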

BLAST Algorithm Example

For each word of the query (window W = 3), generate its neighborhood words using the BLOSUM62 matrix with a score threshold T = 11. All 20³ = 20 × 20 × 20 = 8,000 possible three-amino-acid words are aligned with the query word and sorted by score, and only those scoring at least T are kept. For the query word L E H:

  • L E H: 17
  • L K H: 13
  • C E H: 12
  • Q E H: 11
    (score threshold T = 11: cut off here)
  • L M H: 10
  • D E H: 9
  • L F H: 9
  • L E R: 9
  • . . .
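The neighborhood step can be sketched as below. To keep it short, the sketch hard-codes only the three BLOSUM62 rows needed for the query word LEH (columns in the usual ARNDCQEGHILKMFPSTWYV order); a real implementation would load the full 20 × 20 matrix. Function and variable names are illustrative.

```python
from itertools import product

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # standard BLOSUM column order

# Partial table: only the BLOSUM62 rows for L, E and H (one score per column above).
# Enough for the query word "LEH"; other query words need the full matrix.
BLOSUM62_ROWS = {
    "L": [-1, -2, -3, -4, -1, -2, -3, -4, -3, 2, 4, -2, 2, 0, -3, -2, -1, -2, -1, 1],
    "E": [-1, 0, 0, 2, -4, 2, 5, -2, 0, -3, -3, 1, -2, -3, -1, 0, -1, -3, -2, -2],
    "H": [-2, 0, 1, -1, -3, 0, 0, -2, 8, -3, -3, -1, -2, -1, -2, -1, -2, -2, 2, -3],
}


def word_score(query_word, word):
    """Ungapped BLOSUM62 score of two equal-length words."""
    return sum(BLOSUM62_ROWS[q][AMINO_ACIDS.index(c)] for q, c in zip(query_word, word))


def neighborhood(query_word, threshold):
    """All 20**len(query_word) words scoring >= threshold, best first."""
    hits = []
    for letters in product(AMINO_ACIDS, repeat=len(query_word)):
        word = "".join(letters)
        score = word_score(query_word, word)
        if score >= threshold:
            hits.append((word, score))
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


kept = dict(neighborhood("LEH", threshold=11))
for w in ("LEH", "LKH", "CEH", "QEH", "LMH", "DEH"):
    print(w, word_score("LEH", w), w in kept)
```

The loop reports L E H, L K H, C E H and Q E H (17, 13, 12, 11) as kept, and L M H (10) and D E H (9) as falling below T. The full neighborhood also contains words not shown on the slide, such as I E H and M E H, because the slide lists only a sample.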

BLAST Algorithm Example

Word list: L E H, C E H, L K H, Q E H, …

The database sequences are then scanned for exact matches of words from the word list. For example, the database sequence D A P C Q E H K R G W P N D C contains an exact match to the word Q E H.
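The scanning step can be sketched as follows. BLAST itself uses precompiled lookup structures for speed; this sketch simply uses a Python set for the word list, and the function name is hypothetical.

```python
def find_word_hits(word_list, db_seq, w=3):
    """Return (position, word) for every exact word-list match in db_seq."""
    words = set(word_list)  # O(1) membership test per database position
    hits = []
    for i in range(len(db_seq) - w + 1):
        word = db_seq[i:i + w]
        if word in words:
            hits.append((i, word))
    return hits


print(find_word_hits(["LEH", "CEH", "LKH", "QEH"], "DAPCQEHKRGWPNDC"))  # [(4, 'QEH')]
```

Each hit records where the seed sits in the database sequence, which is exactly what the extension step needs next.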

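For each exact word match, the alignment is extended in both directions to find high-scoring segments. In the example, the match Q E H in D A P C Q E H K R G W P N D C is extended in the right direction with a maximum drop-off score X = 2: the extension continues as long as the running score stays within X of the best score seen so far. Score drops of 1 and 2 (≤ X) are tolerated; when the score drops by 3 > X, the extension stops and the alignment is trimmed back to the point where the best score was reached. The best segment pair found this way is the maximal segment pair (MSP), with score S.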
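The X-drop rule described above can be sketched as an ungapped extension to the right (the left extension is the mirror image). Two assumptions keep the example small: it scores pairs with a simple match = +1 / mismatch = -1 scheme instead of BLOSUM62, and the query QEHKRGLLLL is invented for illustration.

```python
def extend_right(query, subject, q_start, s_start, w, x_drop, match=1, mismatch=-1):
    """Ungapped X-drop extension of a seed word to the right.

    Returns (best_score, best_length) of the highest-scoring extension."""
    def pair_score(a, b):
        return match if a == b else mismatch

    # Start from the score of the seed word itself.
    current = sum(pair_score(query[q_start + k], subject[s_start + k]) for k in range(w))
    best, best_len = current, w
    i = w
    while q_start + i < len(query) and s_start + i < len(subject):
        current += pair_score(query[q_start + i], subject[s_start + i])
        if current > best:
            best, best_len = current, i + 1  # new best: remember where it ends
        elif best - current > x_drop:
            break  # dropped more than X below the best: stop extending
        i += 1
    return best, best_len  # trimmed back to the best-scoring end point


print(extend_right("QEHKRGLLLL", "DAPCQEHKRGWPNDC", q_start=0, s_start=4, w=3, x_drop=2))  # (6, 6)
```

Here the segment QEHKRG matches (score 6); the next three mismatches drop the score by 1, 2 and then 3, and the third drop exceeds X = 2, so the extension stops and keeps the 6-residue segment.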

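BLAST Algorithm Expect Value (E-Value)

The E-value is the number of MSPs with score ≥ S that would be expected to occur by chance alone in a search of this size. If the E-value is large, MSPs with score ≥ S can easily happen by chance alone (for any query sequence), so our MSP is most likely not a significant match at all.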

If the E-value is very small, e.g. 10⁻⁴ (a very high score S), it is very unlikely that any MSP with score ≥ S would occur by chance. Thus, our MSP is a significant match (likely homologous).

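BLAST Algorithm E-Value Calculation

First: calculate the bit score

$$S' = \frac{\lambda S - \ln K}{\ln 2}$$

where S is the score of the alignment (raw score), and λ and K are statistical parameters whose values depend on the scoring scheme and the sequence composition of the database. The logarithm is the natural logarithm (log base e).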

The lower the E-value, the better. An E-value cutoff can also be used to limit the number of hits shown on the results page.

Second: calculate the E-value

$$E = m\,n\,2^{-S'}$$

where S' is the bit score, m is the query length and n is the length of the database (its total number of residues).
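The two formulas can be checked numerically with a short sketch. The values of λ and K below are illustrative placeholders, as are the query and database lengths; real values depend on the scoring scheme and database, and BLAST reports the λ and K it used alongside its results.

```python
import math


def bit_score(raw_score, lam, k):
    """Bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, lam, k, m, n):
    """E = m * n * 2^(-S'), with m = query length and n = database length."""
    return m * n * 2.0 ** (-bit_score(raw_score, lam, k))


# Illustrative parameters only (assumed, not taken from the presentation).
lam, k = 0.3176, 0.134
m, n = 250, 1_000_000
for s in (40, 60, 80):
    print(s, round(bit_score(s, lam, k), 1), f"{e_value(s, lam, k, m, n):.2e}")
```

Substituting the bit score into the second formula gives the equivalent form E = K m n e^(-λS), which shows directly that the E-value falls exponentially as the raw score grows and rises linearly with the database size.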

BLAST Algorithm E-Value Interpretation

  • E-values of 10⁻⁴ and lower indicate significant homology.
  • E-values between 10⁻⁴ and 10⁻² should be checked (the hits may share similar domains but may not be homologous).
  • E-values between 10⁻² and 1 do not indicate good homology.
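The rule of thumb above can be written as a small lookup. Assigning the boundary values 10⁻² and 1 to the lower category, and the label for E-values above 1, are assumptions, since the slide does not say how to treat them.

```python
def interpret_e_value(e):
    """Map an E-value to the rule-of-thumb categories listed above."""
    if e <= 1e-4:
        return "significant homology"
    if e <= 1e-2:
        return "check: similar domains, maybe non-homologous"
    if e <= 1:
        return "not a good homology"
    return "outside the listed ranges"


for e in (1e-30, 1e-3, 0.5, 5.0):
    print(e, interpret_e_value(e))
```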

Gapped BLAST

The gapped BLAST algorithm allows gaps to be introduced into the alignments, so similar regions are not broken into several separate segments. This reflects biological relationships much better. It also results in different values of the parameters λ and K used when calculating the E-value.

BLAST programs

Name       Description
Blastp     Amino acid query sequence against a protein database
Blastn     Nucleotide query sequence against a nucleotide sequence database
Blastx     Nucleotide query sequence translated in all reading frames against a protein database
Tblastn    Protein query sequence against a nucleotide sequence database dynamically translated in all reading frames
Tblastx    Six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database

BLAST programs: common word sizes

Name       Common word size
Blastp     3
Blastn     11
Blastx     3
Tblastn    3
Tblastx    3

BLAST Suggestions

  • Where possible, use the translated (protein) sequence.
  • Split a large query sequence (longer than 1000 bases for DNA, or 200 residues for protein) into smaller ones.
  • If the query has low-complexity regions or repeated segments, remove them and repeat the search. For example, the query IVLKVALRPVLRPVLRPVWQARNGS contains the segment LRPV repeated three times.

Repeated segments can confuse the program and keep it from finding the truly significant matches in a database.
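The last two suggestions can be sketched as small preprocessing helpers. In practice BLAST can mask low-complexity regions itself (SEG for proteins, DUST for nucleotides); the toy masker below only catches exact tandem repeats of a given period, and both function names, the optional overlap and the "X" mask character are assumptions for illustration.

```python
def split_query(seq, max_len, overlap=0):
    """Split seq into pieces of at most max_len; neighbours share `overlap` residues."""
    if not 0 <= overlap < max_len:
        raise ValueError("need 0 <= overlap < max_len")
    step = max_len - overlap
    pieces = []
    for start in range(0, len(seq), step):
        pieces.append(seq[start:start + max_len])
        if start + max_len >= len(seq):
            break  # the last piece already reaches the end of seq
    return pieces


def mask_tandem_repeats(seq, period, min_copies=2, mask="X"):
    """Replace exact tandem repeats (a unit of `period` residues repeated
    at least `min_copies` times in a row) with the mask character."""
    if period < 1:
        raise ValueError("period must be >= 1")
    out = list(seq)
    i = 0
    while i + period <= len(seq):
        unit = seq[i:i + period]
        copies = 1
        while seq[i + copies * period:i + (copies + 1) * period] == unit:
            copies += 1
        if copies >= min_copies:
            out[i:i + copies * period] = mask * (copies * period)
            i += copies * period  # skip past the masked repeat
        else:
            i += 1
    return "".join(out)


print([len(p) for p in split_query("M" * 450, 200)])  # [200, 200, 50]
print(mask_tandem_repeats("IVLKVALRPVLRPVLRPVWQARNGS", period=4))
```

On the example query, the three copies of LRPV are replaced by X, leaving IVLKVA and WQARNGS unchanged for the search.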