BLAST in practice
BINF350, Tutorial 4
Karen Marshall
Aim
► Examine how BLAST parameters (e.g. scoring scheme, word length) affect the alignment outcome
► Optimise BLAST parameters for alignments with different levels of sequence homology
Practical: Part 1
► Start with an ~200 bp original DNA sequence
► Simulate mutation events over time and collect sequences
► BLAST the original sequence against the mutated sequences
► Repeat the BLASTs using different parameters
[Figure labels: original sequence; mutated sequences]
Simulation of mutated sequences
► Point accepted mutation (PAM) model of molecular evolution
► 1 PAM = 1 mutation per 100 bases on average
    1 PAM    99.0% sequence homology
    10 PAM   90.6% sequence homology
    50 PAM   63.5% sequence homology
► Concept of forward and backward mutation
    for each successive PAM
        for each nucleotide
            if (rand > 0.01) do not mutate
            else mutate by random selection from the non-identical bases
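The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the script used in the tutorial: the function names, the `seed` argument and the uniform choice among the three other bases are assumptions. The `expected_identity` helper is an added closed form under the same assumptions (it allows for back mutation); it gives about 99.0%, 90.6% and 63.3% for 1, 10 and 50 PAM, close to the figures quoted on the slide.

```python
import random

BASES = "ACGT"

def mutate_one_pam(seq, rate=0.01, rng=random):
    """Apply one PAM step: each base mutates with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() <= rate:
            # mutate by random selection from the non-identical bases
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def simulate(seq, n_pam, rate=0.01, seed=None):
    """Return [PAM 0, PAM 1, ..., PAM n_pam]; each step mutates the previous one."""
    rng = random.Random(seed)
    history = [seq]
    for _ in range(n_pam):
        history.append(mutate_one_pam(history[-1], rate, rng))
    return history

def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def expected_identity(n_pam, rate=0.01):
    """Expected identity to the original after n_pam steps (assumed model)."""
    return 0.25 + 0.75 * (1 - 4 * rate / 3) ** n_pam

history = simulate("ACGT" * 50, 30, seed=1)  # hypothetical 200 bp start sequence
```

Because a mutated site can mutate again (possibly back to the original base), `identity(history[0], history[n])` falls more slowly than 1% per PAM step.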
Simulated sequences at successive PAM steps (PAM 0 is the original sequence; the slide also showed base pairs 1 to 40 of each sequence in separate numbered columns)
PAM  Sequence
0    AGATTCACTGGTGTGGCAAGTTGTCTCTCAGACTGTACATGCATTAAAATTTTGCTTGGCATTACTCAAAAGCAAAAGAAAAGTAAAAGGAAGAAACAAGAACAAGAAAAAAGATTATATTGATTTTAAAATCATGCAAAAACTGCAACTCTGTGTTTATATTTACCTGTTTATGCTGATTGTTGCTGGTCCAGTGGATC
1    AGAGTCAGTGGTGTGGCAAGTTGTCTCTCAGACTGTACATGCATTAAAATTTTGCTTGGCATTACTCAAAACCAAAAGAAAAGTAAAAGGAAGAAACAAGAACAAGAAAAAAGATTATATTGATTTTAAAATCATGCAAAAACTGCAAGTCTGTGTTTATATTTACCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATC
2    AGAGTCAGTGGTGTGGCACGTTGTCTCTCAGACTGTACATGCATTTAAATTTTGCTTGGCATTACTCAAAACCAAAAGAAAAGTAAAAGGAAGAAACAAGAACAAGAAAAAAGATTATATTGATTTTAAAATCATGCAAAAACTGCAAGTCTGTGTTTATATTTACCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATC
3    AGAGTCAGTGGTGTGGCACGTTGTCTCTCAGACTGTACATGCATTTAAATTTTGCTTGGCATTACTCAAAACCAAAAGAAAAGTAAAAGGAAGAAACAATAACAAGAAAAAAGATTATATTGATTTTAAAATCATGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATC
4    AGTGTCAGTGGTGTGGCACGTTGTCTCTCAGACTGTACATGCATTTAAATTTCGCTTGGCATTAATCAAAACCAAAAGAAAAGGAAAAGGAAGAAACAATAACCAGAAAAAAGATTATATTGATTTTAAAATCATGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATC
5    AGTGTCAGTGGTGTGGCACGTTGTCTCTCAGACTGTACATGCATTTAAATTTCGCTTGGCATTAATCAAAACCAAAAGAAAAGGAAAAGGAAGAAACAATAACCAGAATAAAGATTATTTTGATTTTAAAATCATGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATC
6    AGTGTCAGTGGTGTGGCACGTTGTCTCTCAGACGGTACATGCATTTAAATTTCGCTTGGCATTAATCAAAACCATAAGAAAAGGAAAAGGAAGAAACAATAACCAGAATAAAGATTATTTTGATTTTAAAATCATGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATC
7    AGTGTCAGTGGTGTGGCACGTTGTCTCTCAGACGGTACATGCATTTAAATTTCGCTTGGCATTAATCAAAACCATAAGAAAAGGAAAAGGAAGAAACAATAACCAGAATAAAGATTATTTTGATTTTAAAATCATGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATC
8    AGTGTCAGTGGTGTGGCACGTTGTCTCTCAGACGGTACATGCATTTAAATTTCGCTCGGCATTAATCAAAACCATAAGAAAAGGAAAAGGAAGAAACAATAACCAGAATAAAGATTATTTTGATTTTAAAATCATGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATC
9    AGTGTCAGTGGTGTGGCACGTTGTCTCTCAAACGGTACATGCATTTAAATTTCGCTCGGCATTAATCAAAACCATAAGAAAAGGAAGAGGAAGAAACAATAACCAGAATAAAGATTATTTTGATTTTAAAATCATGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATC
10   AGTGTCAGTGGTGTGGCACGTTGTCTCTCAAACGGCACGTGCATTTAAATTTAGCTCGGCATTAATCAAAACCATAAGAAAAGTAAGAGGAAGAAACAATAACCAGAATAAAGATTATTTTGCTTTTAAAATCTTGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATC
11   AGTGTCAGTGGTGTGGCACGTTGTCTCTCAAACGGCACGTGCATTTAAATTTAGCTCGGCATTAATCAAAACCAAAAGAAAAGTAAGAGGAAGAAACAATAACCAGAATAAAGATTATTTTGCTTTTAAAATCTTGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATC
12   AGTGTCAGTGGTGTTGCACGTTGTCTCTCAAACGGCACGTGCATTTCAATTTAGCTCGGCATTAATCAAAACCAAAAGAATAGTAAGAGGAAGTAACAATAACCAGAATAAAGATTATTTTGCTTTTAAAATCTTGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATC
13   AGTGTCAGTGGTGTTGCACGTTGTCTCTCAAACGGCACGTGCATTACAATTTAGCGCGGCATTAATCAAAACCAAAAGAATAGTAAGAGGAAGTAACAATAACCAGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATC
14   AGTGTCAGTGGTGTTGCACGTTGTCTCTCAAACGGCACGTGCATTACAATTTAGCGCGGCATTAATCAAAACCAAAAGAATAGTAAGAGGAAGTAACAATAACCAGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCCGATTGTTGCTGGTCCGGTGGATC
15   AGTGTCAGTGGTGTTGTACGTTGTCTCTCAAACGGCACGTGCATTACGATTTAGCGCGGCATTAATAAAAACCAAAAGAATAGTAAGAGGAAGTAACAATAACCAGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCCGATTGTTGCTGGTCCGGTGGACC
16   AGTGTCAGTGGTGTTGTACGTTGTCTCTCAAACGGCACGTGCATTACGATTTAGCGCGGCATTAATAAAAACCAAAAGAATAGTAAGAGGAAGTAACAATAACCAGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCAAGTCTGTGTTTATATTCTCCTGTTTATGCCGATTGTTGCTGGTCCGGTGGACC
17   AGTGTCAGTGGTGTTGTACGTTGTCTCTCAAACGGCACGTGCATTACGATTTAGCGCGGCATTAATAAAAACCAAAAGAATAGGAAGAGGAAGTAACAATAACCAGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCAAGTCTGTGTTTATATTCTCCTGTTTATGCCGATTGTTGCTGATCCGGTGGACC
18   AGTGTCAGTGGTGTTGTACGTTGTCTCTCAAACGGCACGTGCATTACGATTTAGCGCGGCATTAGTAAAAACCAAAAGAATAGGAAGAGGAAGTAACAATAACCAGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCAAGTCTGTGTTTATATTCTCCTGTTTATGCCGATTGTTGCTGATCCGGTGGACC
19   AGTGTCAGTGGTGTTGTAAGTTTTCTCTCAAACGGCACGTGCATTACGATTTAGCGCGGCATTAGTAAAAACCAAAAGAATAGGAAGAGGAAGTAACAATAACCAGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCAAGTCTGTGTTTATATTCTCCTGTTTATGCCGATTGTTGCTGATCCGGTGGACC
20   AGTGTCAGTGGTGTTGTAAGTTTTCTCTCAAACGGCACGTGCATTACGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACAATCACCGGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCAAGTCTGTGTTTATATTCTCCTGTTTATGCCGATTGTTGCTGATCCGGTGGACC
21   AGTGTCAGTGGTGTTGTAAGTTTTCTCTCAAACGGCACGTGCATTACGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACAATCACCGGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCTAGTCTGTGTTTATATTCTCCTGTTTATGCCGATTGTTGCTGATCCGGTGGACC
22   AGTGTCAGTGGTGTTGTAAGTTTTCTCTCAAACGGCACGTGCATTACGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACAGTCACCGGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCTAGTCTGTATTTATATTCTCCTGTTTATGCCGATTGTTGCTGATCCGGTGGACC
23   AGTGACAGTGGTGTTGTAAGTTTTCTCTCGAACGGCACGTGCATTACGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACAGTCACCGGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCTAGTCTGTATTTATATTCTCCTGTTTATGCCGATTGTTGCTGATCCGGTGGACC
24   AGTGACAGTGGTGTTGTAAGTTTTCTCTCGAACGGCACGTGCATTACGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACAGTCACCGGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCTAGTCTGTATTTATATTCTCCTGTTTATGCCGATTGTTGCTGATCCGGTGGACC
25   AGTGACAGTGGTGTTGTAAGTTTTCTCTCGAACGGCACGTGCATTACGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACAGTCACCGGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCTAGTCTGTATTTATATTCTCCTGTTTATGCCGATTGTTGCTGATCCGGTGGACC
26   AGTGACAGTGGTGTTGTAAGTTTTCTCTCGAACGGCACGTGCATTATGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACAGTCACCGGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCTAGTCTGTATTTATATTCTCCTGTTTATGCCGATTTTTGCTGATCCGGTGGACC
27   AGTGACAGTGGTGTTGTAAGTTTTCTCTCGAACGGCACGTGCATTATGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACAGTCACCGGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCTAGTCTGTATTTATATTCTCCTGTTTATGCCGATTTTTGCTGATCCGGTGGGCC
28   AGTGACAGTGGTGTTGTAAGTTTTCTCTCGAACGGCACGTGCATTATGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACTGTCACCGGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCTAGTCTGTATTTATATTCTCCTGTTTATGCCGATTTTTGCTGATCCGGTGGGCC
29   AGTGACAGTGGTGTTGTAAGTTTTCTCTCGAACGGCACGTGCATTATGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACTGTCACCGGAAAAAAGATTATTTTGCTTTTAAAATGTTGCAAATACTGCTAGTCTGTATTTATATTCTCCTGTTTATGCCGATTTTTGCTGATCCGGTGGGCC
30   AGTGACAGTGGTGTTGTAAGTTTTCTCTCGAACGGCACGTGCATTATGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACTGTCACCGGAAAAAAGATTATTTTGCTTTTAAAATGTTGCAAATACTGCTAGTCTGTATTTATATTCTCCTGTTTATGCCGATTTTTGCTGATCCGGTGGGCC
BLAST - Heuristic
[Diagram: steps 1, 2, 3; suffix tree / lookup table]
• Words/seeds
• Location
• Threshold T
• Larger seq file
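The lookup-table idea listed above (words/seeds and their locations in the larger sequence file) can be sketched as follows. This is a simplified illustration rather than BLAST's actual implementation: it indexes exact words only, does not model the threshold T, and the function names are hypothetical.

```python
from collections import defaultdict

def build_lookup(db_seq, w=11):
    """Map every word of length w in the database sequence to its start positions."""
    table = defaultdict(list)
    for i in range(len(db_seq) - w + 1):
        table[db_seq[i:i + w]].append(i)
    return table

def seed_hits(query, table, w=11):
    """Return (query_pos, db_pos) pairs where a query word occurs in the database."""
    hits = []
    for q in range(len(query) - w + 1):
        for d in table.get(query[q:q + w], []):
            hits.append((q, d))
    return hits

# Toy example with a short word length so the hits are easy to check by eye.
table = build_lookup("AAACGTACGTTT", w=4)
hits = seed_hits("ACGT", table, w=4)
```

Each seed hit is a candidate starting point for extension into a longer alignment; a shorter word length finds more seeds in diverged sequences at the cost of more chance hits.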
BLAST
February 10, 2004: BLAST 2.2.8 released
BLAST 2.2.8 release notes
• Correction to tblastx alignment computation
• ia32-linux now requires glibc 2.2.5
Source code can be obtained from: ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools/old/20040204/ncbi.tar.gz . Binaries can be obtained from: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.8/ .
February 2, 2004: BLAST 2.2.7 released
BLAST 2.2.7 release notes
• Standalone BLAST is now available for amd64-linux.
• formatdb now restricts volume sizes to 1G on 32-bit platforms for performance reasons.
• The -A option has been removed from formatdb, that is, all databases will be created with ASN.1 deflines.
• tblastn query concatenation now works correctly on 64-bit platforms.
• The wwwblast source code has been merged into the C toolkit tree and is no longer distributed with the binaries.
Source code can be obtained from: ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools/old/20040202/ncbi.tar.gz . Binaries can be obtained from: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.7/ .
http://www.ncbi.nih.gov/BLAST/blast_whatsnew.shtml
BLAST on your own machine
► Allows you to BLAST multiple sequences
    most web versions are single sequence only
► Steps
    Sequence files in FASTA format
        Can have multiple sequences in each file but no duplicates
    Format the larger sequence file into a database
        formatdb -i dbfile.txt -p F -o T
    Perform BLAST using appropriate switches
        blastall -p blastn -d dbfile.txt -i comp.txt -o out.txt
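Before formatting a database it can help to confirm that a FASTA file has no duplicate entries. The sketch below is one way to do that; it assumes "duplicates" means repeated sequence identifiers (the first word after ">"), and the function name is hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA text lines into {identifier: sequence}; reject duplicate identifiers."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # identifier = first whitespace-delimited word of the header line
            name = line[1:].split()[0]
            if name in records:
                raise ValueError("duplicate identifier: " + name)
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before any header line")
        else:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}

example = [">0_0 original", "AGATTCACTG", "GTGTGGCAAG", ">1_10", "AGAGTCAGTG"]
records = read_fasta(example)
```

For a file on disk, `read_fasta(open("dbfile.txt"))` would work the same way, since an open text file yields its lines one at a time.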
BLAST 2.2.8
► Arguments (see appendix of handout)
    -W seed word length (default = 11)
    -r reward for a match (default = 1)
    -q penalty for a mismatch (default = -3)
    -G cost to open a gap
    -E cost to extend a gap
    -F filter query sequence
    -e to set threshold expectation (threshold for HSP before gaps are included)
    -m to specify different output options
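Repeating a BLAST with different parameters is easier if the command line is assembled programmatically. The sketch below only builds argument lists from the switches listed above; it does not run anything. The helper name, the output file names and the particular values of -W and -q in the sweep are illustrative assumptions.

```python
ALLOWED_FLAGS = {"W", "r", "q", "G", "E", "F", "e", "m"}  # switches listed on the slide

def blastall_command(db, query, out, **options):
    """Build a blastall (blastn) argument list; options are single-letter switches."""
    cmd = ["blastall", "-p", "blastn", "-d", db, "-i", query, "-o", out]
    for flag, value in options.items():
        if flag not in ALLOWED_FLAGS:
            raise ValueError("unexpected switch: -" + flag)
        cmd += ["-" + flag, str(value)]
    return cmd

# Hypothetical sweep over word length and mismatch penalty.
sweep = [
    blastall_command("dbfile.txt", "comp.txt", f"out_W{w}_q{q}.txt", W=w, r=1, q=q)
    for w in (7, 11)
    for q in (-1, -3)
]
```

Each list can then be passed to `subprocess.run(cmd, check=True)` on a machine where the standalone blastall binary is installed.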
Example of BLAST output: -m 3

                                              Score    E
Sequences producing significant alignments:   (bits)  Value

1_10    170   3e-046
0_0     170   3e-046
4_10    115   2e-029
2_10    107   4e-027
5_10     96   2e-023
3_10     96   2e-023
4_20     68   3e-015
2_20     68   3e-015
5_20     56   1e-011

QUERY  1 agattcactggtgtggcaagttgtctctcagactgtacatgcattaaaattttgcttggc 60
1_10   1 ............................................................ 60
0_0    1 ............................................................ 60
4_10   3   ....t.....c......ag..................a.................... 60
2_10   1 ............a..c....a...........a................g.......... 60
5_10   2  ........c......a.........g............................c.... 60
3_10   1 .................g........t.....................c.....a..... 60
4_20   3   ....t.....c......ag....a.....g.......a.................... 60
2_20   1 ............a..c...ta...........aa......c..a.....g.....      55
5_20   4    ......c..c...a....g....g..............a......c......c.... 60
Substitution scores
► Optimal substitution scores were derived for different PAM distances / sequence homologies (States et al., 1991)
► Of importance is the match to mismatch score ratio
► ‘Better’ substitution matrices exist, but are not yet implemented in most BLAST software
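To see why the match to mismatch ratio matters, the sketch below computes log-odds scores for a target level of identity. It assumes equal base frequencies (0.25 each) and equal rates for all substitutions, and it uses the standard log-odds form; it is an illustration only and is not taken from States et al. (1991). Under these assumptions, 99% identity gives a mismatch to match ratio of roughly -3.1, in line with the default reward of 1 and penalty of -3.

```python
import math

def log_odds_scores(identity):
    """Return (match, mismatch) log-odds scores for a target fraction of identical sites."""
    # Observed frequency of an aligned pair relative to chance (0.25 * 0.25 per pair).
    match = math.log(identity / 0.25)
    mismatch = math.log(((1.0 - identity) / 3.0) / 0.25)
    return match, mismatch

def mismatch_to_match_ratio(identity):
    """Mismatch score divided by match score (negative when mismatches are penalised)."""
    match, mismatch = log_odds_scores(identity)
    return mismatch / match

ratio_99 = mismatch_to_match_ratio(0.99)
ratio_85 = mismatch_to_match_ratio(0.85)
```

As the expected identity falls, the ratio shrinks in magnitude, so more diverged sequences call for a milder mismatch penalty relative to the match reward.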
Practical: Part 2
► Apply concepts from Part 1 to ‘real sequences’
► BLAST the mRNA sequences for human and cattle IFNG against an ~1/2 Mb sequence of human DNA
► Use optimal BLAST parameters for the expected homology
[Figure labels: human DNA; human IFNG mRNA; cattle IFNG mRNA]
Expected levels of sequence homology
► Varies for sequences being considered and genomic region
Human to mouse comparison, from …
Efficiency of BLAST
► Human to cattle coding sequence ~85% homology (~PAM 15)
IFNG mRNA sequences
► Extracted from NCBI website using Batch Entrez
>gi|10835170|ref|NM_000619.1| Homo sapiens interferon, gamma (IFNG), mRNA
TGAAGATCAGCTATTAGAAGAGAAAGATCAGTTAAGTCCTTTGGACCTGATCAGCTTGATACAAGAACTA
CTGATTTCAACTTCTTTGGCTTAATTCTCTCGGAAACGATGAAATATACAAGTTATATCTTGGCTTTTCA
GCTCTGCATCGTTTTGGGTTCTCTTGGCTGTTACTGCCAGGACCCATATGTAAAAGAAGCAGAAAACCTT
AAGAAATATTTTAATGCAGGTCATTCAGATGTAGCGGATAATGGAACTCTTTTCTTAGGCATTTTGAAGA
ATTGGAAAGAGGAGAGTGACAGAAAAATAATGCAGAGCCAAATTGTCTCCTTTTACTTCAAACTTTTTAA
AAACTTTAAAGATGACCAGAGCATCCAAAAGAGTGTGGAGACCATCAAGGAAGACATGAATGTCAAGTTT
TTCAATAGCAACAAAAAGAAACGAGATGACTTCGAAAAGCTGACTAATTATTCGGTAACTGACTTGAATG
TCCAACGCAAAGCAATACATGAACTCATCCAAGTGATGGCTGAACTGTCGCCAGCAGCTAAAACAGGGAA
GCGAAAAAGGAGTCAGATGCTGTTTCAAGGTCGAAGAGCATCCCAGTAATGGTTGTCCTGCCTGCAATAT
TTGAATTTTAAATCTAAATCTATTTATTAATATTTAACATTATTTATATGGGGAATATATTTTTAGACTC
ATCAATCAAATAAGTATTTATAATAGCAACTTTTGTGTAATGAAAATGAATATCTATTAATATATGTATT
ATTTATAATTCCTATATCCTGTGACTGTCTCACTTAATCCTTTGTTTTCTGACTAATTAGGCAAGGCTAT
GTGATTACAAGGCTTTATCTCAGGGGCCAACTAGGCAGCCAACCTAAGCAAGATCCCATGGGTTGTGTGT
TTATTTCACTTGATGATACAATGAACACTTATAAGTGAAGTGATACTATCCAGTTACTGCCGGTTTGAAA
ATATGCCTGCAATCTGAGCCAGTGCTTTAATGGCATGTCAGACAGAACTTGAATGTGTCAGGTGACCCTG
ATGAAAACATAGCATCTCAGGAGATTTCATGCCTGGTGCTTCCAAATATTGTTGACAACTGTGACTGTAC
CCAAATGGAAAGTAACTCATTTGTTAAAATTATCAATATCTAATATATATGAATAAAGTGTAAGTTCACA
ACT
>gi|31982948|ref|NM_174086.1| Bos taurus interferon, gamma or immune type [interferon gamma type 2] (IFNG), mRNA
ATTAGAAAAGAAAGATCAGCTACCTCCTTGGGACCTGATCATAACACAGGAGCTACCGATTTCAACTACT
CCGGCCTAACTCTCTCCTAAACAATGAAATATACAAGCTATTTCTTAGCTTTACTGCTCTGTGGGCTTTT
GGGTTTTTCTGGTTCTTATGGCCAGGGCCAATTTTTTAGAGAAATAGAAAACTTAAAGGAGTATTTTAAT
GCAAGTAGCCCAGATGTAGCTAAGGGTGGGCCTCTCTTCTCAGAAATTTTGAAGAATTGGAAAGATGAAA
INFG_refseq.txt
Human Chr12 sub-sequence
► Extracted from UCSC ‘Golden Path’ website
► chr12:66,589,493-67,085,092 ~ ½ Mb
    does contain the IFNG gene
► Repeats masked to lower case
>hg16_dna range=chr12:66589493-67085092 5'pad=0 3'pad=0 revComp=FALSE strand=? repeatMasking=lower
CATTCATTACTTTTATAAGGTTTCTCTCTGGTATGCATCTGACTTACATCATGGGAAAGCTAGTTTCATGACTCCTTTGGAATAGTTGTGGTCCTGAATATGGAAAATCAATTAATGAATAGCTTAAAGCACAATAGTCAACAAATAGATGTGAAAATTCTTTGTGAACTTTAAAGTCTTACTTAAACGTGAGATATTATATACAGTGTTTTATGTtagactgtgagcttgttaaagaaagaactatgccttctttttctttctaccagttccagtgcctcgtacaacatagaaaccataagtgtttttgaaagagcaaatGAATATTGGAAGGAGTAAGGTGATAGCTAAAGCTAAAACAATGTTTAGGGAGAACAACTGAAACAAAAGCAGCATTTGTGTCTTAAACTCATGGCCTCTGAAACAGCCTTGATAGATAGTAGAGAGGGTCAGATAGAGAGAGCCTGACTCAGAGATTGGGAAGCCCTATATGGTTGGAAGAGAAAGTAAGAGGAGACCCAAAGTATTAGACCACAGAAAGAAGTTCTAATAGTCAGTGTCAAGAGATTCAGCAGGAGGTTGTGTATCAGGATTTGGGTTTGGGAGTGGTATGGAGCTTACCTATCTCTAAAACGAGCAGGAGGGCAAAAATGAATCCCAGTCCCAAAGAATTCACTAATGGCCAGCAAACCAACACAGGAACCCCAGCACAGACACACAAGATAGGAAACCAGTTGTTGAAACTACAATGTAACGGGGCTGATTTAATAAAAACCTGTTACATGAGTTATAGGttttttttttttttttttttttttAATGTATGTGCCCCACCTTAGGAAAGCCAGAAATAATGGCAACGAAGAAATATTCATTCACAGTGAGAAAGCCATTAGAACGTTGGCTGGAACCTAGGGGCATATCGAGGGCCCACGTGGGAAGGACAATGACAACTTGTTTAGTCCTCACTGGTTTCCCAGTCTGTGGATCTTATTTGAAT
hs_chr12_subseq.txt
Human IFNG gene
The exon / intron report from NCBI ‘AceView’ for NM_000619 is as follows:

In variant        Length & DNA   Coordinates on gene   Supporting clone(s)
Exon 1            243bp          1 to 243              M29383
Intron [gt-ag]    1242bp         244 to 1485           M29383 and 32 others
Exon 2            69bp           1486 to 1554          NM_000619 and 32 others
Intron [gt-ag]    95bp           1555 to 1649          NM_000619 and 33 others
Exon 3            183bp          1650 to 1832          NM_000619 and 24 others
Intron [gt-ag]    2425bp         1833 to 4257          NM_000619 and 24 others
Exon 4            725bp          4258 to 4982
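The coordinates in the AceView report can be checked with a few lines of arithmetic. The sketch below is only a consistency check on the table as given here; the variable names are illustrative. It confirms that each length matches its coordinates and that the four exons sum to 1220 bp within a 4982 bp gene, which is why an mRNA BLASTed against genomic DNA would be expected to align as several separate exon-sized hits rather than one continuous match.

```python
# (feature, length in bp, start, end) as listed in the exon / intron report
features = [
    ("Exon 1", 243, 1, 243),
    ("Intron 1", 1242, 244, 1485),
    ("Exon 2", 69, 1486, 1554),
    ("Intron 2", 95, 1555, 1649),
    ("Exon 3", 183, 1650, 1832),
    ("Intron 3", 2425, 1833, 4257),
    ("Exon 4", 725, 4258, 4982),
]

# Each length should equal end - start + 1, and features should be contiguous.
lengths_ok = all(end - start + 1 == length for _, length, start, end in features)
contiguous = all(
    features[i + 1][2] == features[i][3] + 1 for i in range(len(features) - 1)
)

exon_total = sum(length for name, length, _, _ in features if name.startswith("Exon"))
intron_total = sum(length for name, length, _, _ in features if name.startswith("Intron"))
gene_length = features[-1][3]
```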
Human IFNG gene
From UCSC ‘Golden Path’ website genome browser
IFNG against ~1/2 Mb region of Chr 12
Assessment
► Submit
    for either Part 1 or Part 2, the BLAST output, concatenated into one file and annotated
    a short summary / discussion of the concepts covered in this practical (< 500 words)
References
► Strongly recommend BLAST tutorial on NCBI site: http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
► Further: “Bioinformatics for quantitative geneticists course notes”, J. McEwan, http://www-personal.une.edu.au/~jvanderw/aabc_materials2004.htm#ModuleC