Sequence Similarity

Why study sequence similarity?

• Possible indication of common ancestry

• Similarity of structure implies similar biological function – even among apparently distant organisms

• Example context: establishing possible causal relationship between wide use of antibiotics in agriculture and spread of antibiotic resistant bacteria

Antibiotic resistant bacteria

• have evolved rapidly

• can thrive when antibiotics kill non-resistant bugs

• horizontal gene transfer (HGT) can speed development of antibiotic resistance

Source: http://textbookofbacteriology.net/themicrobialworld/bactresanti.html

Figure 3.2: Vertical and horizontal gene transfer

Figure 3.3: How exposure to antibiotics selects for the survival of resistant cells in a population of bacteria

Figure 3.4: A plasmid carrying an antibiotic-resistance gene can be transferred to a new cell by conjugation


• Widespread use of antibiotics means non-resistant strains die, leaving resistant strains to survive and multiply; phenomenon observed in hospitals, care centers, etc.

• Once some bacteria in environment are resistant, HGT can occur & spread resistance faster than would otherwise occur (through mutation)


• Use of antibiotics common in agriculture

• Presence in human pathogens of resistance genes that are highly similar to genes found in animals would provide evidence that HGT has occurred

Gene similarity

• Homologues: similar sequences

– homology

– homologous

• Orthologs: a similar gene appears in two different organisms where

– several other such similarities occur

– organisms have common evolutionary ancestry

• Xenologs: similar gene found in organisms that have little else in common

– evidence of HGT

Similarity: how close is close?

• Proteins generally considered homologous if at least 25% of residues are identical

• DNA considered homologous with at least 70% identity

• Threshold level for HGT: 95% identity
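
The thresholds above can be checked mechanically once two sequences have been aligned. The following is a minimal sketch, not a standard tool: the function names and the `THRESHOLDS` table are made up for illustration, and it assumes identity is measured over all aligned columns, with "-" marking a gap. Other definitions of percent identity divide by a different length, so results can differ slightly between tools.

```python
# Rule-of-thumb identity thresholds quoted in the slides (percent).
THRESHOLDS = {"protein": 25.0, "dna": 70.0, "hgt": 95.0}


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns holding identical residues.

    Both inputs must already be aligned (equal length, '-' for gaps).
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(identity: float, kind: str) -> bool:
    """True if the identity reaches the slide's threshold for this kind."""
    return identity >= THRESHOLDS[kind]


if __name__ == "__main__":
    identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
    print(identity, meets_threshold(identity, "dna"),
          meets_threshold(identity, "hgt"))
```

In this example the two DNA strings differ in one of ten positions, so they pass the 70% homology threshold but fall short of the 95% level used as evidence of HGT.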

Establishing homology: alignment

• Match sequences in meaningful way

• Account for differences in sequence length due to indels:

– insertions

– deletions

• Scoring system based on closeness of match
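
To make the idea of alignment with indels and a scoring system concrete, here is a small sketch of global alignment by dynamic programming (the Needleman-Wunsch approach). It is a teaching illustration only: BLAST itself uses a faster local-alignment heuristic, and the scores used here (match +1, mismatch -1, gap -2) are arbitrary assumptions rather than values from the slides.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2):
    """Return (aligned_a, aligned_b, score) for a global alignment."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # (mis)match
                              score[i - 1][j] + gap,            # gap in b
                              score[i][j - 1] + gap)            # gap in a

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    top, bottom, total = global_align("ACGT", "ACT")
    print(top)
    print(bottom)
    print("score:", total)
```

The "-" in the output marks an indel: a position where one sequence has a residue and the other has none.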

BLAST: Basic Local Alignment Search Tool

• Versions exist to compare

– protein – protein

• blastp: use when you want to learn about function of protein

– protein – nucleotide

• tblastn: used to compare protein with DNA to discover new genes encoding simple proteins

– nucleotide – nucleotide

• blastn: we’ll use this to look for HGT evidence

BLAST servers

• Home server at NCBI

• Other servers available worldwide

– BLAST servers very popular (and busy)

– Japan is sleeping when it’s morning in the USA

– Europe is sleeping when it’s afternoon in the USA

Using blastn

• Start with query sequence – nucleotide sequence you want to investigate

• BLAST compares query with every GenBank sequence

– performs alignment

– reports matches with high degree of similarity

Using blastn

• Point browser to NCBI website

– choose BLAST on home page

– scroll down to Basic BLAST and choose nucleotide

Using blastn

• Paste your query sequence in the window, as shown:

Using blastn

• Scroll down to the next box on the page, and select the database to be searched (Nucleotide, in this case)

Using blastn

• Scroll down to the BLAST button and click it

• Then wait …

• Eventually, you’ll see a screen like this:

BLAST results

• Graphical summary

– query sequence at top

– each bar represents portion of another sequence similar to query

• red: most similar – homologous to query

• pink: not as good

• green: borderline

• blue/black: “twilight zone”
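
The color bands can be read as ranges of alignment score. The sketch below maps a score to the colors listed above. The numeric cutoffs (200, 80, 50, 40) are an assumption taken from the color key NCBI has traditionally shown above the graphic, not from these slides, so check them against the key on your own results page. The function name `score_color` is made up for illustration.

```python
def score_color(alignment_score: float) -> str:
    """Map an alignment score to the color band of the graphical summary.

    Cutoffs are assumed from the NCBI color key, not stated in the slides.
    """
    if alignment_score >= 200:
        return "red"      # most similar
    if alignment_score >= 80:
        return "pink"     # not as good
    if alignment_score >= 50:
        return "green"    # borderline
    if alignment_score >= 40:
        return "blue"     # "twilight zone"
    return "black"        # "twilight zone"


if __name__ == "__main__":
    for s in (350, 120, 60, 45, 20):
        print(s, score_color(s))
```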

BLAST results: graphics section

BLAST results: description section

BLAST results: description section

• Accession: database entry’s GenBank accession number

• Description: usually identifies organism, some characteristics of sequence

• Scores: based on number of matches in alignment

• E-value: statistical significance of score

E-value

• Estimate of the number of matches with a score at least this good that would be expected by chance in a database of this size

• The lower the E-value, the greater the significance:

– greater similarity between query & target

– greater confidence of homology

– identical sequences typically have E-value of 0; anything above 0.001 is considered insignificant

• E-values are written in scientific notation form (e.g. 2e-50 means 2 × 10^-50)
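
As a rough illustration of how an E-value relates to a score, the sketch below uses the textbook relation E = m × n × 2^(−S'), where S' is the bit score, m the query length and n the total database length. This is a simplification: real BLAST uses corrected "effective" lengths, so reported values will differ. The function names are hypothetical, and the 0.001 cutoff is the one quoted above.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value: chance hits expected with at least this bit score."""
    return query_len * db_len * 2.0 ** (-bit_score)


def is_significant(e_value: float, cutoff: float = 1e-3) -> bool:
    """Apply the slide's rule: anything above the cutoff is insignificant."""
    return e_value <= cutoff


if __name__ == "__main__":
    # Hypothetical numbers: a 100-base query against 1,000,000 bases.
    for bits in (20, 40):
        e = expected_hits(bits, 100, 1_000_000)
        # ".1e" prints the value in scientific notation.
        print(f"bit score {bits}: E = {e:.1e}, significant: {is_significant(e)}")

    # E-values copied from a report parse directly as floats.
    print(is_significant(float("2e-50")))
```

Each extra bit of score halves the E-value, which is why strong matches are reported with very small exponents.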

Alignment section
