Regular Meeting
December 22, 2008
Mark Borodovsky, Ivan Antonov
11/6/2008 GATech 2
Topics
1. What has been done
2. FSMark HMM implementation
3. Answers to the previous meeting's questions
4. Future work
What has been done
• HMM implementation in FSMark has been changed
• Some questions from the previous meeting have been answered
FSMark HMM implementation
Current HMM implementation
• For a given position i, we now look backward 2 nucleotides instead of looking forward
• FSMark starts examining the sequence at the 3rd position (i=2), so a complete emission string is always available (starting at the 1st position gives strange results)
• Because FSMark starts at i=2, a gene without a frame shift will have state 2
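The backward-looking window described above can be sketched as follows. This is only an illustration and not FSMark's actual code: the function name `backward_contexts` is hypothetical, and it assumes that the emission string at position i is the nucleotide at i together with the 2 nucleotides before it.

```python
def backward_contexts(seq):
    """Yield (i, emission_string) for every position with a full backward window.

    Assumption (not confirmed by the slides): the emission string at 0-based
    position i is the nucleotide at i plus the 2 nucleotides preceding it.
    """
    # Positions 0 and 1 do not have two preceding nucleotides,
    # so the scan starts at the 3rd position (i = 2).
    for i in range(2, len(seq)):
        yield i, seq[i - 2:i + 1]


# Made-up toy sequence, used only to show the windows.
contexts = list(backward_contexts("ATGCA"))
```

Starting the loop at i=2 is what guarantees that every emission string is complete; a sequence shorter than 3 nucleotides yields no windows at all.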
FSMark prediction depends on FS letter
• A test was done on a sample gene by inserting different letters in the middle of the gene. The FSMark-GM hmm_def file was used.
FS letter    FSMark prediction
A            Gene overlap
C            Frame shift
G            Frame shift
T            Frame shift
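A minimal sketch of how the four test inputs could be generated, assuming "the middle of the gene" means the midpoint index of the sequence. The helper `insert_in_middle` and the sample sequence are hypothetical and are not the gene used in the test.

```python
def insert_in_middle(gene, letter):
    """Return the gene with one letter inserted at its midpoint."""
    mid = len(gene) // 2
    return gene[:mid] + letter + gene[mid:]


# Made-up sequence for illustration only (not the sample gene from the test).
sample_gene = "ATGAAACCCGGGTTTTAA"

# One frame-shifted variant per inserted letter.
variants = {letter: insert_in_middle(sample_gene, letter) for letter in "ACGT"}
```

Each variant would then be passed to FSMark separately, so that the predictions can be compared per inserted letter.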
Answers to the previous meeting questions
Control
Genome without frame shifts
GeneMark: 417 overlaps
FSMark-GM: 118 frame shifts
True Positive: 0
False Positive: 118
False Negative: 0
Experiment
Genome with frame shifts in 400 genes
GeneMark: 599 overlaps, 171 of them caused by frame shift
FSMark-GM: 325 frame shifts
True Positive: 113
False Positive: 212
False Negative: 287
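The counts above can be turned into precision and sensitivity using the standard definitions. These rates are not reported on the slides; they are derived here from the TP/FP/FN counts only, and the function name is hypothetical.

```python
def precision_and_sensitivity(tp, fp, fn):
    """Return (precision, sensitivity); None where the denominator is zero."""
    predicted = tp + fp   # everything the tool called a frame shift
    actual = tp + fn      # every frame shift that is really there
    precision = tp / predicted if predicted else None
    sensitivity = tp / actual if actual else None
    return precision, sensitivity


# Counts taken from the Experiment and Control slides.
experiment = precision_and_sensitivity(tp=113, fp=212, fn=287)
control = precision_and_sensitivity(tp=0, fp=118, fn=0)
```

For the experiment this gives a precision of 113/325 (about 0.35) and a sensitivity of 113/400 (about 0.28). For the control, sensitivity is undefined because the genome contains no frame shifts.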
Questions to answer
• Take a look at the distribution of overlap lengths in GeneMark output
• Understand why GeneMark predicts a gene overlap for less than 50% of genes with frame shifts. There are two possible reasons:
  – A short part is missing, i.e. GeneMark predicts only one gene
  – GeneMark predicts two genes, but they don't overlap
• Try to understand why we got more False Positives in the experiment than in the control
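For the first question, a distribution of overlap lengths could be computed along these lines. This is a sketch under stated assumptions: gene coordinates are taken as inclusive (start, end) pairs already parsed from the GeneMark output, only neighbouring genes are compared, strand is ignored, and the coordinates shown are made up.

```python
from collections import Counter


def overlap_lengths(genes):
    """Return overlap lengths between consecutive genes, sorted by start.

    genes: list of (start, end) pairs with inclusive coordinates.
    """
    ordered = sorted(genes)
    lengths = []
    for (_, end1), (start2, end2) in zip(ordered, ordered[1:]):
        # Number of positions shared by the two neighbouring genes.
        overlap = min(end1, end2) - start2 + 1
        if overlap > 0:
            lengths.append(overlap)
    return lengths


# Hypothetical coordinates, for illustration only.
genes = [(1, 100), (97, 200), (250, 300), (290, 400)]
histogram = Counter(overlap_lengths(genes))
```

The resulting `histogram` maps each overlap length to the number of gene pairs with that length, which is the quantity plotted in an overlap-length histogram.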
All overlaps length (genome without FS)
[Histogram; x-axis: overlap length, 4 to 56; y-axis: 0 to 300]
Overlaps caused by frame shift
[Histogram; x-axis: overlap length, 8 to 140; y-axis: 0 to 25]
GeneMark analysis
• Why does GeneMark barely predict overlaps for genes with frame shifts?
• In my GeneMark output there are 357 typical genes (out of 400).
• Am I perhaps using the wrong GeneMark option?
GeneMark output statistics
Genome with frame shifts in 400 genes
• 4,388 genes
• 599 gene overlaps
• 335 genes with fs
• 171 overlaps caused by fs
• 22 genes with fs are missing
• fs in 164 genes didn't cause overlap
• 4 fs caused a new gene downstream of the initial gene
• 163 decreased their lengths
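The counts on this slide can be cross-checked with simple arithmetic. The grouping below is one possible reading of the diagram, not something the slides state explicitly, and the variable names are hypothetical.

```python
# Counts copied from the slide.
genes_with_fs_found = 335
overlaps_caused_by_fs = 171
fs_without_overlap = 164
genes_with_fs_missing = 22

# Assumed reading: found genes split into "overlap" and "no overlap".
found_split_ok = overlaps_caused_by_fs + fs_without_overlap == genes_with_fs_found

# Assumed reading: found plus missing genes give the typical genes seen.
typical_total = genes_with_fs_found + genes_with_fs_missing

# The two sub-counts listed for the "no overlap" group.
no_overlap_split = 4 + 163
```

Under this reading, 171 + 164 = 335 and 335 + 22 = 357, which matches the 357 typical genes found in the GeneMark output. The last two counts sum to 167 rather than 164, so those categories may overlap or be counted differently; the slides do not resolve this.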
Conclusions
• I need to check how to run GeneMark in order to get the same 400 typical genes
• It seems that the small chunk in the shifted frame is not enough for GeneMark to predict a new gene
Time Table
Date           TODO
Dec 24, Wed    Insensitive zone length analysis for FSMark to determine the lengths of zones 1 and 3
2009           Apply FSMark-GM to 3 typical genomes using the zone 1 and 3 lengths found