Detecting Inversions in Human Genome
Phillip Tao
Advisor: Eleazar Eskin
Polymorphism
Structural abnormality in a chromosome:
- Deletion
- Duplication
- Translocation
- Inversion
Inversion
- A portion of the chromosome is flipped
- Usually no major adverse effects
- SNPs in the inverted section tend to be in strong linkage disequilibrium (LD)
- Small inversions are very hard to detect
Bafna’s Method
- Define an inversion by its two breakpoints
- Find two SNPs on each side of each breakpoint
- If there is an inversion, the SNP outside one breakpoint should correlate more strongly with the SNP inside the other breakpoint than with the SNP inside its own breakpoint
... A ... ... T ... ... C ... ... C ...
... A ... ... G ... ... C ... ... G ...
... C ... ... T ... ... G ... ... C ...
... C ... ... G ... ... G ... ... G ...
... A ... ... G ... ... C ... ... G ...
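The following is a minimal Python sketch of this test on the five haplotypes above. It makes three assumptions:
- The four columns are, from left to right: the SNP outside the left breakpoint, the SNP inside it, the SNP inside the right breakpoint, and the SNP outside it.
- Each SNP is 0/1-encoded by whether its allele matches the first haplotype's allele.
- The helper names (`encode`, `pearson_r`, `fits_bafna_pattern`) are illustrative only. They are not taken from Bafna's work.

```python
from math import sqrt

# The five example haplotypes, one character per SNP (columns 1-4 above).
HAPLOTYPES = ["ATCC", "AGCG", "CTGC", "CGGG", "AGCG"]

def encode(haplotypes, k):
    """0/1-encode SNP k (0-based): 1 where the allele equals the first haplotype's allele."""
    ref = haplotypes[0][k]
    return [1 if h[k] == ref else 0 for h in haplotypes]

def pearson_r(x, y):
    """Pearson correlation; assumes neither SNP is monomorphic (nonzero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def r(haplotypes, a, b):
    """Signed r between SNP columns a and b (0-based)."""
    return pearson_r(encode(haplotypes, a), encode(haplotypes, b))

def fits_bafna_pattern(haplotypes):
    """Each outer SNP must correlate more with the far inner SNP than with the near one."""
    outer_l, inner_l, inner_r, outer_r = 0, 1, 2, 3  # assumed column order
    left_ok = r(haplotypes, outer_l, inner_r) > r(haplotypes, outer_l, inner_l)
    right_ok = r(haplotypes, outer_r, inner_l) > r(haplotypes, outer_r, inner_r)
    return left_ok and right_ok

print(round(r(HAPLOTYPES, 0, 2), 3), round(r(HAPLOTYPES, 0, 1), 3))  # 1.0 -0.167
print(fits_bafna_pattern(HAPLOTYPES))  # True
```

In this example the outer-left SNP has the same allele pattern as the inner-right SNP. The outer-right SNP has the same pattern as the inner-left SNP. Both comparisons therefore favour the inversion pattern.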
My Goal
- Simplify Bafna’s method
- Use r (the correlation coefficient)
- Use single SNPs instead of finding multi-SNP markers
My Method
- Calculate the correlation r between all pairs of SNPs
- For each SNP, calculate the difference between its correlations with every pair of other SNPs
- Find sets of four SNPs that fit the pattern described earlier
- Organize the sets into groups based on position
Example
SNP  1 2 3 4 5 6 7
     A T C A G C G
     A G A A G T C
     T G C G G C C
     A T C A G C G
     T T C G A C G
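The first step of the method, computing r for every pair of SNPs, can be sketched in Python for the seven-SNP haplotypes above. This is an illustration under three assumptions:
- Each SNP is 0/1-encoded by whether its allele matches the first haplotype's allele.
- r is signed.
- Every SNP is polymorphic.

The sketch is not meant to reproduce every entry of the example r table. SNPs 1 and 4, 2 and 7, and 3 and 6 have identical allele patterns, so they come out as r = 1.0, as in that table.

```python
from itertools import combinations
from math import sqrt

# Example haplotypes (rows) over SNPs 1-7, one character per SNP.
HAPLOTYPES = ["ATCAGCG", "AGAAGTC", "TGCGGCC", "ATCAGCG", "TTCGACG"]

def encode(haplotypes, k):
    """0/1-encode SNP k (0-based): 1 where the allele equals the first haplotype's allele."""
    ref = haplotypes[0][k]
    return [1 if h[k] == ref else 0 for h in haplotypes]

def pearson_r(x, y):
    """Pearson correlation; assumes both SNPs are polymorphic (nonzero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def r_matrix(haplotypes):
    """Signed r for every SNP pair, keyed by 1-based indices (i, j) with i < j."""
    cols = [encode(haplotypes, k) for k in range(len(haplotypes[0]))]
    return {(i + 1, j + 1): pearson_r(cols[i], cols[j])
            for i, j in combinations(range(len(cols)), 2)}

R = r_matrix(HAPLOTYPES)
for pair in [(1, 4), (2, 7), (3, 6), (1, 2)]:
    print(pair, round(R[pair], 2))
```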
Example r table
     1    2    3    4    5    6    7
1  1.0
2  0.2  1.0
3  0.4  0.6  1.0
4  1.0  0.2  0.4  1.0
5  0.6  0.4  0.3  0.4  1.0
6  0.4  0.6  1.0  0.4  0.3  1.0
7  0.2  1.0  0.6  0.2  0.4  0.6  1.0
Example diff table (SNP 1)
      1     2     3     4     5     6     7
1
2         0.0
3         0.2   0.0
4         0.8   0.6   0.0
5         0.4   0.2  -0.4   0.0
6         0.2   0.0  -0.6  -0.2   0.0
7         0.0  -0.2  -0.8  -0.4  -0.2   0.0
(1, 2, 4)  (1, 2, 5)  (1, 3, 4)  (1, 3, 5)
(1, 2, 3)  (1, 2, 6)
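Here is a Python sketch of the SNP 1 step, working from the example r table. Reading the table, entry (i, j) is r(1, i) - r(1, j). A positive entry therefore means SNP 1 correlates more with the farther SNP i than with the nearer SNP j.

The slides do not state a cutoff. `min_diff = 0.15` is an assumption, chosen because it reproduces the triples listed here and for SNP 6 (the 0.1 entries in the SNP 6 table are not listed). The names `diff_table_left` and `left_triples` are hypothetical.

```python
# Lower triangle of the example r table (SNPs 1-7), copied from the slide.
R_ROWS = [
    [1.0],
    [0.2, 1.0],
    [0.4, 0.6, 1.0],
    [1.0, 0.2, 0.4, 1.0],
    [0.6, 0.4, 0.3, 0.4, 1.0],
    [0.4, 0.6, 1.0, 0.4, 0.3, 1.0],
    [0.2, 1.0, 0.6, 0.2, 0.4, 0.6, 1.0],
]
N_SNPS = len(R_ROWS)

def r(a, b):
    """r for 1-based SNP indices a and b; the table is symmetric."""
    return R_ROWS[max(a, b) - 1][min(a, b) - 1]

def diff_table_left(k):
    """Entry (i, j) = r(k, i) - r(k, j) for SNPs to the right of k, with j <= i."""
    return {(i, j): r(k, i) - r(k, j)
            for i in range(k + 1, N_SNPS + 1) for j in range(k + 1, i + 1)}

def left_triples(k, min_diff=0.15):
    """Triples (k, j, i), j < i, where SNP k prefers the farther SNP i by at least min_diff."""
    table = diff_table_left(k)
    return sorted((k, j, i) for (i, j), d in table.items() if j < i and d >= min_diff)

print(left_triples(1))  # [(1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 2, 6), (1, 3, 4), (1, 3, 5)]
```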
Example diff table (SNP 6)
      1     2     3     4     5     6     7
1   0.0
2  -0.2   0.0
3  -0.6  -0.4   0.0
4   0.0   0.2   0.6   0.0
5   0.1   0.3   0.7   0.1   0.0
6
7                                       0.0
(2, 4, 6)  (2, 5, 6)  (3, 4, 6)  (3, 5, 6)
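The mirror-image step for SNP 6 can be sketched the same way in Python, again over the example r table. Here entry (i, j) appears to be r(6, j) - r(6, i). It is positive when SNP 6 correlates more with the farther SNP j than with the nearer SNP i.

The same assumed cutoff `min_diff = 0.15` keeps the four listed triples and drops the two 0.1 entries. The names `diff_table_right` and `right_triples` are hypothetical.

```python
# Lower triangle of the example r table (SNPs 1-7), copied from the slide.
R_ROWS = [
    [1.0],
    [0.2, 1.0],
    [0.4, 0.6, 1.0],
    [1.0, 0.2, 0.4, 1.0],
    [0.6, 0.4, 0.3, 0.4, 1.0],
    [0.4, 0.6, 1.0, 0.4, 0.3, 1.0],
    [0.2, 1.0, 0.6, 0.2, 0.4, 0.6, 1.0],
]

def r(a, b):
    """r for 1-based SNP indices a and b; the table is symmetric."""
    return R_ROWS[max(a, b) - 1][min(a, b) - 1]

def diff_table_right(k):
    """Entry (i, j) = r(k, j) - r(k, i) for SNPs to the left of k, with j <= i."""
    return {(i, j): r(k, j) - r(k, i)
            for i in range(1, k) for j in range(1, i + 1)}

def right_triples(k, min_diff=0.15):
    """Triples (j, i, k), j < i, where SNP k prefers the farther SNP j by at least min_diff."""
    table = diff_table_right(k)
    return sorted((j, i, k) for (i, j), d in table.items() if j < i and d >= min_diff)

print(right_triples(6))  # [(2, 4, 6), (2, 5, 6), (3, 4, 6), (3, 5, 6)]
```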
Example cont.
(1, 2, 4)  (1, 2, 5)  (1, 3, 4)  (1, 3, 5)
(2, 4, 6)  (2, 5, 6)  (3, 4, 6)  (3, 5, 6)
(2, 4, 7)  (2, 5, 7)  (3, 4, 7)  (3, 5, 7)
(1, 2, 4, 6)  (1, 2, 5, 6)  (1, 3, 4, 6)  (1, 3, 5, 6)
(1, 2, 4, 7)  (1, 2, 5, 7)  (1, 3, 4, 7)  (1, 3, 5, 7)
(1, 2, 3)  (1, 2, 6)
[1 – 1] [2 – 3] [4 – 5] [6 – 7]
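Joining triples into quartets and summarizing a group can be sketched in Python as follows. One rule is an assumption, read off the example rather than stated on the slides: a left triple (a, b, c) and a right triple (b, c, d) combine into the quartet (a, b, c, d) when they share the inner pair (b, c).

The slides do not describe how quartets are assigned to groups. The hypothetical `group_span` therefore only summarizes a group that has already been formed, as the [min - max] range at each position.

```python
# Triples as listed on the example slides.
LEFT = [(1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5), (1, 2, 3), (1, 2, 6)]
RIGHT = [(2, 4, 6), (2, 5, 6), (3, 4, 6), (3, 5, 6),
         (2, 4, 7), (2, 5, 7), (3, 4, 7), (3, 5, 7)]

def join_triples(left, right):
    """Quartets (a, b, c, d) from left (a, b, c) and right (b, c, d) sharing the inner pair."""
    return sorted((a, b, c, d)
                  for (a, b, c) in left
                  for (b2, c2, d) in right
                  if (b, c) == (b2, c2))

def unmatched(left, right):
    """Left triples whose inner pair (b, c) occurs in no right triple."""
    inner_pairs = {(b, c) for (b, c, _) in right}
    return [t for t in left if (t[1], t[2]) not in inner_pairs]

def group_span(quartets):
    """Per position (outer-left, inner-left, inner-right, outer-right): (min, max) SNP index."""
    return [(min(q[p] for q in quartets), max(q[p] for q in quartets)) for p in range(4)]

quartets = join_triples(LEFT, RIGHT)
print(quartets)
print(unmatched(LEFT, RIGHT))  # [(1, 2, 3), (1, 2, 6)]
print(group_span(quartets))    # [(1, 1), (2, 3), (4, 5), (6, 7)]
```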
Results
- Results for 8 ENCODE regions
- Each ENCODE region has about one “big” inversion and 3 or 4 smaller possible inversions
- Inversion candidates range from about 20 kb to 250 kb
Encode 1 CEU
length 138206:
26933775 26961947 27061501 27080620 (x1152)
[26933311 - 26935400] [26935778 - 27001979]
[27061501 - 27073984] [27074652 - 27115799]
length 24723:
27229393 27243243 27265414 27269500 (x549)
[27222615 - 27242896] [27243243 - 27247682]
[27264662 - 27267966] [27269500 - 27290893]
Encode 1 JPTCHB
length 112765:
26925087 26961569 27038413 27095921 (x696)
[26925087 - 26936161] [26936185 - 26984395]
[27018432 - 27048950] [27053451 - 27098098]
length 16797:
27286339 27297153 27308501 27317801 (x430)
[27282442 - 27291838] [27292455 - 27297184]
[27308501 - 27309252] [27309746 - 27318505]
Encode 2 CEU
length 146580:
89679961 89740881 89846316 89856918 (x10169)
[89629528 - 89702509] [89703442 - 89751478]
[89842982 - 89850022] [89851175 - 89971133]
length 103202:
89984366 90038027 90141147 90162545 (x4464)
[89960639 - 90037168] [90037945 - 90074697]
[90125136 - 90141147] [90143267 - 90244055]
Encode 2 JPTCHB
length 61931:
89740469 89777036 89815696 89844587 (x7363)
[89740469 - 89753274] [89754595 - 89783950]
[89807767 - 89816526] [89817163 - 89869295]
length 241177:
90147369 90237945 90461335 90485128 (x5137)
[90071367 - 90186818] [90223524 - 90325391]
[90457540 - 90464701] [90468056 - 90493804]
Encode 3 CEU
length 53311:
126434362 126444935 126484991 126520444 (x6392)
[126430928 - 126434467] [126435292 - 126461428]
[126483937 - 126488603] [126489707 - 126537051]
length 79164:
126717787 126750681 126810226 126838912 (x4294)
[126653273 - 126730160] [126731062 - 126753794]
[126810226 - 126810226] [126811293 - 126868969]
Encode 3 JPTCHB
length 53311:
126434155 126435292 126484017 126489707 (x8664)
[126434155 - 126434467] [126435292 - 126461428]
[126483937 - 126488603] [126489707 - 126534298]
length 56719:
126499913 126517706 126563455 126598442 (x2480)
[126461428 - 126509693] [126510624 - 126536076]
[126558033 - 126567343] [126567738 - 126622425]
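A quick arithmetic check on the results: for every candidate listed above, the reported length equals the end of the third interval minus the start of the second. In other words, it is the span covered by the two inner intervals. This is an inference from the numbers, not a definition given on the slides. The small Python helper below (hypothetical name `candidate_length`) makes that explicit.

```python
def candidate_length(intervals):
    """Length of a candidate from its four (start, end) intervals in bp, ordered
    outer-left, inner-left, inner-right, outer-right: inner-right end minus inner-left start."""
    _, inner_left, inner_right, _ = intervals
    return inner_right[1] - inner_left[0]

# First ENCODE 1 CEU candidate from the results above.
encode1_ceu = [(26933311, 26935400), (26935778, 27001979),
               (27061501, 27073984), (27074652, 27115799)]
print(candidate_length(encode1_ceu))  # 138206
```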
Problems
- Grouping algorithm is not very good
- Many redundant groups
- Sets are not weighted
- Some candidate inversions overlap others
- Seems to detect too many inversions
- Very slow and inefficient
Extensions
- Improve the grouping algorithm
- Add weighting of sets
- Combine similar groups
- Filter out sets that are likely outliers
- Use other inversion detection techniques
- Use length constraints to filter out sets and groups
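Two of these extensions, length constraints and combining similar groups, could look like the Python sketch below. Everything here is hypothetical:
- The function names.
- The 20 kb to 250 kb bounds, borrowed from the observed range of candidates in the results rather than from any rule on the slides.
- The merge rule: candidates whose inner spans overlap are combined by taking, position by position, the union of their intervals.

```python
MIN_LEN, MAX_LEN = 20_000, 250_000  # hypothetical bounds in bp

def inner_span(intervals):
    """(start, end) from the inner-left interval start to the inner-right interval end."""
    return intervals[1][0], intervals[2][1]

def filter_by_length(candidates, min_len=MIN_LEN, max_len=MAX_LEN):
    """Keep candidates whose inner span length lies within [min_len, max_len]."""
    kept = []
    for c in candidates:
        start, end = inner_span(c)
        if min_len <= end - start <= max_len:
            kept.append(c)
    return kept

def merge_overlapping(candidates):
    """Combine candidates whose inner spans overlap: union of intervals per position."""
    merged = []
    for c in sorted(candidates, key=inner_span):
        if merged and inner_span(c)[0] <= inner_span(merged[-1])[1]:
            merged[-1] = [(min(p[0], q[0]), max(p[1], q[1]))
                          for p, q in zip(merged[-1], c)]
        else:
            merged.append([tuple(iv) for iv in c])
    return merged

# The two ENCODE 1 JPTCHB candidates from the results (112765 bp and 16797 bp).
jpt_a = [(26925087, 26936161), (26936185, 26984395), (27018432, 27048950), (27053451, 27098098)]
jpt_b = [(27282442, 27291838), (27292455, 27297184), (27308501, 27309252), (27309746, 27318505)]
print(len(filter_by_length([jpt_a, jpt_b])))  # 1
```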