Sequence Similarity

Sequence Similarity · •Homologues: similar sequences –homology –homologous •Orthologs: a similar gene appears in two different organisms where –several other such similarities

Why study sequence similarity?

• Possible indication of common ancestry

• Similarity of structure often implies similar biological function – even among apparently distant organisms

• Example context: establishing possible causal relationship between wide use of antibiotics in agriculture and spread of antibiotic resistant bacteria

Antibiotic resistant bacteria

• have evolved rapidly

• can thrive when antibiotics kill non-resistant bugs

• horizontal gene transfer can speed development of antibiotic resistance

Source: http://textbookofbacteriology.net/themicrobialworld/bactresanti.html

Figure 3.2: Vertical and horizontal gene transfer

Figure 3.3: How exposure to antibiotics selects for the survival of resistant cells in a population of bacteria

Figure 3.4: A plasmid carrying an antibiotic-resistance gene can be transferred to a new cell by conjugation

Antibiotic resistant bacteria

• Widespread use of antibiotics means non-resistant strains die, leaving resistant strains to survive and multiply; phenomenon observed in hospitals, care centers, etc.

• Once some bacteria in environment are resistant, HGT can occur & spread resistance faster than would otherwise occur (through mutation)

Antibiotic resistant bacteria

• Use of antibiotics common in agriculture

• Presence in human pathogens of resistance genes that are highly similar to genes found in animals would provide evidence that HGT has occurred

Gene similarity

• Homologues: similar sequences

– homology

– homologous

• Orthologs: a similar gene appears in two different organisms where

– several other such similarities occur

– organisms have common evolutionary ancestry

• Xenologs: similar gene found in organisms that have little else in common – evidence of HGT

Similarity: how close is close?

• Proteins considered homologous if at least 25% of residues are identical

• DNA homologous with at least 70% identity

• Threshold level for HGT: 95% identity
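
The thresholds above can be checked with a small calculation. The following is a minimal Python sketch, assuming the two sequences are already aligned (equal length, with "-" marking a gap) and that percent identity is counted over all aligned columns; other definitions of the denominator exist. The names `percent_identity`, `THRESHOLDS` and `meets_threshold` are made up for this illustration.

```python
# Rule-of-thumb cutoffs taken from the slide (percent identity).
THRESHOLDS = {"protein": 25.0, "dna": 70.0, "hgt": 95.0}


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue.

    Assumes the inputs are already aligned: equal length, '-' marks a gap.
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(identity: float, kind: str) -> bool:
    """True if the identity reaches the slide's cutoff for 'protein', 'dna' or 'hgt'."""
    return identity >= THRESHOLDS[kind]


if __name__ == "__main__":
    identity = percent_identity("ACGT-A", "ACCTTA")  # 4 matching columns out of 6
    print(round(identity, 1), meets_threshold(identity, "dna"))
```

In this toy pair, four of the six columns match, so the identity falls just short of the 70% DNA cutoff.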

Establishing homology: alignment

• Match sequences in meaningful way

• Account for differences in sequence length due to indels:

– insertions

– deletions

• Scoring system based on closeness of match
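
To make the idea of an alignment score concrete, here is a minimal Python sketch of a global alignment score computed with the Needleman-Wunsch dynamic programming recurrence. This is not how BLAST works internally (BLAST uses a faster heuristic local alignment); it is only meant to show how matches, mismatches and indels feed into one number. The scoring values (+1 match, -1 mismatch, -2 per gap position) are arbitrary assumptions chosen for the example.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Best global alignment score of a and b (Needleman-Wunsch, linear gap cost)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            # Either align a[i-1] with b[j-1] (match or mismatch) ...
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # ... or leave one of them opposite a gap (an indel).
            gap_in_b = score[i - 1][j] + gap
            gap_in_a = score[i][j - 1] + gap
            score[i][j] = max(diagonal, gap_in_b, gap_in_a)

    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "ACGT"))  # four matches: 4
    print(global_alignment_score("ACGT", "AGT"))   # three matches, one gap: 1
```

The second call shows how a deletion is handled: the shorter sequence is padded with one gap, which costs 2, and the three remaining columns match.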

BLAST: Basic Local Alignment Search Tool

• Versions exist to compare

– protein – protein

• blastp: use when you want to learn about function of protein

– protein – nucleotide

• tblastn: used to compare protein with DNA to discover new genes encoding simple proteins

– nucleotide – nucleotide

• blastn: we’ll use this to look for HGT evidence
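
The choice among the three versions listed above depends only on the type of the query and the type of the database. A rough Python sketch of that lookup follows; the helper names and the 90% cutoff used to guess whether a sequence is nucleotide are assumptions made for this illustration, not part of BLAST, and only the three programs named on the slide are covered.

```python
# (query type, database type) -> program, as listed on the slide.
BLAST_PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("protein", "nucleotide"): "tblastn",
    ("nucleotide", "nucleotide"): "blastn",
}


def guess_sequence_type(seq: str) -> str:
    """Crude guess: a sequence made almost entirely of A, C, G, T, U, N is nucleotide."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    nucleotide_like = sum(1 for c in letters if c in "ACGTUN")
    # The 0.9 cutoff is an arbitrary assumption for this sketch.
    return "nucleotide" if nucleotide_like / len(letters) >= 0.9 else "protein"


def pick_program(query: str, database_type: str) -> str:
    """Return the BLAST program for this query and database type, if the slide lists one."""
    key = (guess_sequence_type(query), database_type)
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"no program listed here for {key}")
    return BLAST_PROGRAMS[key]


if __name__ == "__main__":
    print(pick_program("ATGGCGTACGTTAGC", "nucleotide"))      # blastn
    print(pick_program("MKTAYIAKQRQISFVKSHFSRQ", "protein"))  # blastp
```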

BLAST servers

• Home server at NCBI

• Other servers available worldwide

– BLAST servers very popular (and busy)

– Japan is sleeping when it’s morning in the USA

– Europe is sleeping when it’s afternoon in the USA

Using blastn

• Start with query sequence – nucleotide sequence you want to investigate

• BLAST compares query with every GenBank sequence

– performs alignment

– reports matches with high degree of similarity
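
The compare, align and report loop can be imitated on a toy scale. The Python sketch below slides a query along each sequence in a small made-up "database", scores the best ungapped window by percent identity, and reports only the hits above a cutoff. Real BLAST does not scan exhaustively like this; it uses seeded heuristics and gapped local alignment. The sequence names, the toy data and the 70% cutoff are assumptions for illustration.

```python
def best_ungapped_identity(query: str, subject: str) -> float:
    """Highest percent identity of the query against any same-length window of the subject."""
    q, s = query.upper(), subject.upper()
    if not q or len(s) < len(q):
        return 0.0
    best = 0
    for start in range(len(s) - len(q) + 1):
        window = s[start:start + len(q)]
        matches = sum(1 for a, b in zip(q, window) if a == b)
        best = max(best, matches)
    return 100.0 * best / len(q)


def search(query: str, database: dict, min_identity: float = 70.0) -> list:
    """Compare the query with every database entry; return (name, identity) hits, best first."""
    hits = [(name, best_ungapped_identity(query, seq)) for name, seq in database.items()]
    hits = [(name, identity) for name, identity in hits if identity >= min_identity]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Made-up sequences standing in for database entries.
    toy_db = {
        "seq_A": "TTTACGTACGTTT",
        "seq_B": "GGGACGTTCGTGG",
        "seq_C": "CCCCCCCCCCCCC",
    }
    print(search("ACGTACGT", toy_db))  # [('seq_A', 100.0), ('seq_B', 87.5)]
```

Here seq_A contains the query exactly, seq_B differs at one position, and seq_C falls below the cutoff and is not reported.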

Using blastn

• Point browser to NCBI website

– choose BLAST on home page

– scroll down to Basic BLAST and choose nucleotide

Using blastn

• Paste your query sequence in the window, as shown:

Using blastn

• Scroll down to the next box on the page, and select the database to be searched (Nucleotide, in this case)

Using blastn

• Scroll down to the BLAST button and click it

• Then wait …

• Eventually, you’ll see a screen like this:

BLAST results

• Graphical summary

– query sequence at top

– each bar represents portion of another sequence similar to query

• red: most similar – homologous to query

• pink: not as good

• green: borderline

• blue/black: “twilight zone”
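
The colors correspond to ranges of alignment score. As a rough Python sketch, assuming the score bins of the classic NCBI color key (below 40, 40 to 50, 50 to 80, 80 to 200, 200 and above), which are not stated on the slide and may differ in the current web interface:

```python
def color_for_score(score: float) -> str:
    """Map an alignment score to the bar color of the graphical summary.

    The bin edges are assumed from the classic NCBI color key.
    """
    if score >= 200:
        return "red"    # most similar
    if score >= 80:
        return "pink"   # not as good
    if score >= 50:
        return "green"  # borderline
    if score >= 40:
        return "blue"   # twilight zone
    return "black"      # twilight zone


if __name__ == "__main__":
    for s in (250, 120, 60, 45, 10):
        print(s, color_for_score(s))
```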

BLAST results: graphics section

BLAST results: description section

BLAST results: description section

• Accession: database entry’s GenBank accession number

• Description: usually identifies organism, some characteristics of sequence

• Scores: based on number of matches in alignment

• E-value: statistical significance of score

E-value

• Estimate of the number of matches scoring this well that would be expected by chance

• The lower the e-value, the greater the significance:

– greater similarity between query & target

– greater confidence of homology

– identical sequences have e-value of 0; anything above 0.001 is considered insignificant

• E-values are written in scientific notation, e.g. 2e-50 means 2 × 10^-50
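
For readers who want to see where such numbers come from, the Python sketch below uses the standard relation between a bit score S' and the expected number of chance hits, E = m × n × 2^(-S'), where m is the query length and n the total database length. Real BLAST applies corrections (effective lengths) that this sketch leaves out, so its numbers will not match a BLAST report exactly; the function names and the example lengths are assumptions for illustration. The 0.001 cutoff is the one from the slide.

```python
def expected_chance_hits(bit_score: float, query_length: int, database_length: int) -> float:
    """E = m * n * 2**(-S'), without BLAST's effective-length corrections."""
    return query_length * database_length * 2.0 ** (-bit_score)


def is_significant(e_value: float, cutoff: float = 1e-3) -> bool:
    """Apply the slide's rule of thumb: anything above 0.001 is treated as insignificant."""
    return e_value <= cutoff


if __name__ == "__main__":
    # Hypothetical search: 100-base query against a database of one million bases.
    weak = expected_chance_hits(20, 100, 1_000_000)
    strong = expected_chance_hits(60, 100, 1_000_000)
    print(f"{weak:.3g}", is_significant(weak))
    print(f"{strong:.3g}", is_significant(strong))

    # E-values in a report are plain scientific notation and parse directly.
    print(is_significant(float("2e-50")))
```

A bit score of 20 in this setting would be expected dozens of times by chance, while a bit score of 60 corresponds to an e-value far below the cutoff.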

Alignment section