Sequence Similarity
Why study sequence similarity?
• Possible indication of common ancestry
• Similarity of structure implies similar biological function – even among apparently distant organisms
• Example context: establishing possible causal relationship between wide use of antibiotics in agriculture and spread of antibiotic resistant bacteria
Antibiotic resistant bacteria
• have evolved rapidly
• can thrive when antibiotics kill non-resistant bugs
• horizontal gene transfer (HGT) can speed development of antibiotic resistance
Source: http://textbookofbacteriology.net/themicrobialworld/bactresanti.html
Figure 3.2: Vertical and horizontal gene transfer
Figure 3.3: How exposure to antibiotics selects for the survival of resistant cells in a population of bacteria
Figure 3.4: A plasmid carrying an antibiotic-resistance gene can be transferred to a new cell by conjugation
Antibiotic resistant bacteria
• Widespread use of antibiotics means non-resistant strains die, leaving resistant strains to survive and multiply; phenomenon observed in hospitals, care centers, etc.
• Once some bacteria in the environment are resistant, HGT can spread resistance faster than mutation alone would
Antibiotic resistant bacteria
• Use of antibiotics common in agriculture
• Presence in human pathogens of resistance genes that are highly similar to genes found in animals would provide evidence that HGT has occurred
Gene similarity
• Homologues: similar sequences
– homology
– homologous
• Orthologs: a similar gene appears in two different organisms where
– several other such similarities occur
– organisms have common evolutionary ancestry
• Xenologs: similar gene found in organisms that have little else in common – evidence of HGT
Similarity: how close is close?
• Proteins considered homologous if at least 25% of residues are identical
• DNA considered homologous at 70% identity or more
• Threshold level for HGT: 95% identity
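The thresholds above can be applied mechanically once two sequences are aligned. The sketch below is one possible way to do that, not a standard tool: the function names are hypothetical, and it assumes identity is counted over aligned columns (gaps included), which is only one of several conventions in use.

```python
# Rule-of-thumb thresholds taken from the slide (percent identity).
HOMOLOGY_THRESHOLD = {"protein": 25.0, "dna": 70.0}
HGT_THRESHOLD = 95.0


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue.

    Assumption: both inputs are already aligned, use '-' for gaps,
    and have the same length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Skip columns where both sequences have a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def classify(identity: float, kind: str) -> str:
    """Label an identity value using the slide's thresholds; kind is 'protein' or 'dna'."""
    if identity >= HGT_THRESHOLD:
        return "HGT candidate"
    if identity >= HOMOLOGY_THRESHOLD[kind]:
        return "homologous"
    return "below threshold"


if __name__ == "__main__":
    identity = percent_identity("ACGT", "ACGA")
    print(identity, classify(identity, "dna"))
```

Keeping the thresholds in named constants makes it easy to swap in stricter cutoffs if a course or lab uses different ones.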
Establishing homology: alignment
• Match sequences in meaningful way
• Account for differences in sequence length due to indels:
– insertions
– deletions
• Scoring system based on closeness of match
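To make the idea of alignment with indels and a scoring system concrete, here is a minimal sketch of a global alignment in the Needleman-Wunsch style. Note that this is not the algorithm BLAST uses (BLAST performs heuristic local alignment), and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen only for illustration.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (aligned_a, aligned_b, score) for a global alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b (deletion relative to a)
                score[i][j - 1] + gap,       # gap in a (insertion relative to a)
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    top, bottom, total = global_align("ACGT", "ACT")
    print(top)
    print(bottom)
    print("score:", total)
```

The gap character in the output marks where an insertion or deletion was needed to line the two sequences up.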
BLAST: Basic Local Alignment & Search Tool
• Versions exist to compare
– protein – protein
• blastp: use when you want to learn about function of protein
– protein – nucleotide
• tblastn: used to compare protein with DNA to discover new genes encoding simple proteins
– nucleotide – nucleotide
• blastn: we’ll use this to look for HGT evidence
BLAST servers
• Home server at NCBI
• Other servers available worldwide
– BLAST servers very popular (and busy)
– Japan is sleeping when it’s morning in the USA
– Europe is sleeping when it’s afternoon in the USA
Using blastn
• Start with query sequence – nucleotide sequence you want to investigate
• BLAST compares query with every GenBank sequence
– performs alignment
– reports matches with high degree of similarity
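The compare-and-report loop described above can be imitated on a tiny in-memory "database". This is a toy sketch only: it slides the query along each sequence without gaps and keeps the best window, whereas BLAST uses a much faster seeded heuristic with gapped extension. The sequence names, the data, and the 70% cutoff (borrowed from the DNA rule of thumb earlier in the deck) are assumptions for illustration.

```python
def best_ungapped_identity(query: str, subject: str) -> float:
    """Best percent identity of the query against any same-length window of subject."""
    q = len(query)
    if q == 0 or len(subject) < q:
        return 0.0
    best = 0
    for start in range(len(subject) - q + 1):
        window = subject[start:start + q]
        matches = sum(1 for x, y in zip(query, window) if x == y)
        best = max(best, matches)
    return 100.0 * best / q


def search(query: str, database: dict, min_identity: float = 70.0) -> list:
    """Compare the query with every database entry; report hits, best first."""
    hits = []
    for name, sequence in database.items():
        identity = best_ungapped_identity(query, sequence)
        if identity >= min_identity:
            hits.append((name, identity))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


if __name__ == "__main__":
    # Hypothetical mini-database standing in for GenBank.
    toy_db = {
        "seqA": "TTTACGTACGTTT",
        "seqB": "GGGGGGGGGGGG",
        "seqC": "TTACGAACGTT",
    }
    for name, identity in search("ACGTACG", toy_db):
        print(f"{name}: {identity:.1f}% identity")
```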
Using blastn
• Point browser to NCBI website
– choose BLAST on home page
– scroll down to Basic BLAST and choose nucleotide
Using blastn
• Paste your query sequence in the window, as shown:
Using blastn
• Scroll down to the next box on the page, and select the database to be searched (Nucleotide, in this case)
Using blastn
• Scroll down to the BLAST button and click it
• Then wait …
• Eventually, you’ll see a screen like this:
BLAST results
• Graphical summary
– query sequence at top
– each bar represents portion of another sequence similar to query
• red: most similar – homologous to query
• pink: not as good
• green: borderline
• blue/black: “twilight zone”
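The color coding can be read as a lookup from alignment score to color. The cutoffs in this sketch are an assumption based on the color key that NCBI has typically displayed above the graphical summary (below 40, 40-50, 50-80, 80-200, 200 and above); check them against the key on your own results page before relying on them.

```python
def color_for_score(score: float) -> str:
    """Map an alignment score to the bar color used in the graphical summary.

    Cutoffs are assumed from the NCBI color key, not taken from the slides.
    """
    if score >= 200:
        return "red"    # most similar
    if score >= 80:
        return "pink"   # not as good
    if score >= 50:
        return "green"  # borderline
    if score >= 40:
        return "blue"   # "twilight zone"
    return "black"      # "twilight zone"


if __name__ == "__main__":
    for s in (350, 120, 65, 45, 22):
        print(s, color_for_score(s))
```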
BLAST results: graphics section
BLAST results: description section
• Accession: database entry’s GenBank accession number
• Description: usually identifies organism, some characteristics of sequence
• Scores: based on number of matches in alignment
• E-value: statistical significance of score
E-value
• Estimate of the number of matches scoring at least this well that would be expected by chance in a database of this size
• The lower the E-value, the greater the significance:
– greater similarity between query & target
– greater confidence of homology
– identical sequences have E-value of 0; anything above 0.001 is considered insignificant
• E-values are written in scientific notation form
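As a small illustration of reading E-values, the sketch below parses values written in scientific notation and applies the 0.001 cutoff from the slide. It also includes the commonly cited relation E = m * n * 2^(-S'), where S' is the bit score and m and n are the query and database lengths; the function names and sample values are hypothetical, and real BLAST uses effective lengths that this sketch ignores.

```python
def expected_chance_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def is_significant(e_value: float, cutoff: float = 1e-3) -> bool:
    """Apply the slide's rule: anything above the cutoff is considered insignificant."""
    return e_value <= cutoff


if __name__ == "__main__":
    # E-values as they might appear in a results table (made-up examples).
    reported = ["0.0", "3e-42", "1e-03", "0.25", "4.1"]
    for text in reported:
        value = float(text)  # float() understands scientific notation
        label = "significant" if is_significant(value) else "insignificant"
        print(f"{text:>6} -> {label}")

    # A higher bit score drives the expected number of chance hits down.
    print(expected_chance_hits(bit_score=10, query_len=100, db_len=1000))
    print(expected_chance_hits(bit_score=50, query_len=100, db_len=1000))
```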
Alignment section