36
Tutorial 3 BLAST 1

Tutorial 3 B LAST

  • Upload
    hawa

  • View
    63

  • Download
    0

Embed Size (px)

DESCRIPTION

Tutorial 3 B LAST. BLAST tutorial. How to use BLAST Score vs. E-value Exercise Cool story of the day: How Alzheimer is studied in yeast. BLAST. What is BLAST? B asic L ocal A lignment S earch T ool Set of similarity search programs for exploring sequence databases. . Database. - PowerPoint PPT Presentation

Citation preview

Page 1: Tutorial 3  B LAST

Tutorial 3

BLAST

1

Page 2: Tutorial 3  B LAST

BLAST tutorial

• How to use BLAST• Score vs. E-value• Exercise

• Cool story of the day: How Alzheimer is studied in yeast

2

Page 3: Tutorial 3  B LAST

BLAST program DatabaseQuery

BLAST

What is BLAST?• Basic Local Alignment Search Tool• Set of similarity search programs for exploring sequence

databases.

3

Page 4: Tutorial 3  B LAST

Why perform a similarity search?

• Find genes/proteins with possibly similar function

• Find the origin of a sequence (what organism it is taken form)

• Different degrees of similarity can be found in database search

4

Page 5: Tutorial 3  B LAST

Query type Database typeblastn Genomic Genomicblastp Proteomic Proteomicblastx Translated genomic Proteomictblastn Proteomic Translated genomic

tblastx Translated genomic Translated genomic

BLAST Databases

5

Genomic: A T G CProteomic: G A S T C V L I M P F Y W D E N Q H K R

Translated genomic: The query is genomic, translated to protein using 6 possible reading frames

ATGCCGTTC -> MPF , CR, AV

Page 6: Tutorial 3  B LAST

http://blast.ncbi.nlm.nih.gov/Blast.cgi 6

Page 7: Tutorial 3  B LAST

Place Query

Choose Database

?

7

Job title – helpful when running multiple runs

In case you want to restrict to a specific organism

In case you want to eliminate specific sequences

Query and DB parameters

Page 8: Tutorial 3  B LAST

How to choose the database?

A good place to start if you don’t know what you’re looking for

nr/nt : non-redundant nucleotide

8

Depends on what you’re looking for…

Page 9: Tutorial 3  B LAST

Alignment parameters

9

Optimize similarity level of the search

Page 10: Tutorial 3  B LAST

10

Alignment parameters

Threshold for results significance

Primary word match (16-64 nt)

Scores of matching and mismatching bases

Cost to create and extend a gap

Page 11: Tutorial 3  B LAST

11

How to interpret BLAST results?

Page 12: Tutorial 3  B LAST

Search for homologous to chick “olfactory receptor 6” gene

12

Page 13: Tutorial 3  B LAST

Search results

13

Page 14: Tutorial 3  B LAST

14

Query sequence

Matched sequences from DBs

Graphic Summary

Page 15: Tutorial 3  B LAST

15

Descriptions

Sequence Identifier

+ link

Sequence description

Score(bits)

%Coverage

%Identity

E value

Page 16: Tutorial 3  B LAST

16

AlignmentsQuery info

Alignment info

Alignment

Page 17: Tutorial 3  B LAST

It is possible to get multiple hits

per sequence

17

Page 18: Tutorial 3  B LAST

E-values and scores

18

Page 19: Tutorial 3  B LAST

Score vs. E-value• The score is a measure of the similarity of the query

to the a sequence from the database. • The E-value is a measure of the reliability of the

score.

The definition of the E-value is: The number of expected alignments with this score or higher due to chance.

19

Page 20: Tutorial 3  B LAST

Score vs. E-value

Score (S) = (identities + mismatches) - gaps

Depends on search space

Query length(bp) Effective length (total number of bases) of the database(bp)

Depends on scoring system

Score

Bit Score (S’):

20

• E-values cannot be compared across different DBs, even if the score is the same.

Page 21: Tutorial 3  B LAST

Intuition for “significance”• Think of the query as a ball, each color represents a part

of the sequence.• The DB is a pool of colored balls.

• If the ball has many colors (longer query) – there is a higher probability to see the same color in the pool by chance.

• If the pool of balls is very big, there is a higher probability to see one of the balls colors in the pool.

21

Page 22: Tutorial 3  B LAST

The typical threshold for a good E-value from a BLAST search is E=10-6≈e-6 or lower. This does not mean that higher E-values are given for queries with no biological significance.

22

E-value Threshold

http://www.youtube.com/watch?v=Z7ek7UoP7Bg&src_vid=nO0wJgZRZJs&feature=iv&annotation_id=annotation_234259

Page 23: Tutorial 3  B LAST

E-value vs. P-valueP-value is the probability that an event will happen by chanceE-value is correction of the P-value considering the DB size.

So if the probability to find a sequence is 0.001 in a 1,000,000 entries DB the number of expected alignments we will find is 1,000!

23http://homepages.ulb.ac.be/~dgonze/TEACHING/stat_scores.pdfhttp://www.ncbi.nlm.nih.gov/BLAST/tutorial/

Page 24: Tutorial 3  B LAST

Exercise

24

Page 25: Tutorial 3  B LAST

Find homologs for CFTR gene in human

25

You can put the gene ID rather than

the sequence

Human DB only

We’ll start with high similarity

Page 26: Tutorial 3  B LAST

26

Page 27: Tutorial 3  B LAST

27

Now change to more distinct

sequences

Page 28: Tutorial 3  B LAST

28

We get more results

Page 29: Tutorial 3  B LAST

Find homologs for CFTR gene in other organisms

29

Not only human

sequences

Page 30: Tutorial 3  B LAST

30

Page 31: Tutorial 3  B LAST

31

Where to run a nucleotide sequence - blastn or blastx ?

blastn (genomic vs. genomics)

blastx(translated genomics vs. proteomic)

ncRNA

If you know your sequence is a protein – blastx is better, since you will get more reliable results.

Page 32: Tutorial 3  B LAST

Cool Story of the day

How Alzheimer is studied in yeast

Page 33: Tutorial 3  B LAST

Alzheimer's disease (AD)• Alzheimer's disease leads to nerve cell

death and tissue loss throughout the brain. • Symptoms can include confusion,

aggression, trouble with language, and long term memory loss. Gradually, bodily functions are lost, ultimately leading to death.

• There are no available treatments that stop or reverse the progression of the disease.

• The disease is associated with plaques and tangles in the brain.

33http://www.alz.org/braintour/alzheimers_changes.asphttp://en.wikipedia.org/wiki/Alzheimer's_disease

Page 34: Tutorial 3  B LAST

How can AD be studied in yeast?

Yeast cells lack the specialized processes of neuronal cells and the cell-cell communications that modulate neuropathology. However, the most fundamental features of eukaryotic cell biology evolved before the split between yeast and metazoans.

34Treusch et al. Science (2011)http://lindquistlab.wi.mit.edu/

Page 35: Tutorial 3  B LAST

• Beta-amyloid () peptide is one of the hypothesized causes of AD.

• Susan Linquist’s lab showed it was toxic when expressed in yeast. Later they tested the affect of this protein on rat neuron cells and in C.elegans neurons.

35Treusch et al. Science (2011)http://lindquistlab.wi.mit.edu/

Page 36: Tutorial 3  B LAST

• The researchers looked for suppressor genes that had homologs in Human and C.elegans

36Treusch et al. Science (2011)