sean-davis
DESCRIPTION
This is an overview of using Bioconductor tools for defining structural variation from next-generation sequencing data.
2. Why use R and Bioconductor?
3. Phenotype: gene copy number; sequence variation; chromatin structure and function; gene expression; transcriptional regulation; DNA methylation; patient and population characteristics
4. Why structural variation?
5. Overview
6. What is paired-end sequencing?
7. How can paired-end sequencing be used for finding structural variants?
8. How can the existing tools within R and Bioconductor be leveraged to find structural variants in the genome?
13. What is a structural variation?
14. Deletions
15. Translocations
16. Interchromosomal Inversions
17. [copy number variation]
18. Importance of Structural Variation
19. Translocation that alters regulatory environment
20. Can place two distant functional elements in proximity to each other (gene fusion events are an example). Possibly change chromatin structure.
21. [Figure: normal karyotype vs. tumor karyotype]
22. Redon et al., Nature 2006
24. A Genome View of Copy Number
31. [Figure: paired-end fragment; labels: Read, Insert, Read]
36. Medvedev et al., Nature Methods 2009
38. Paired-end reads and SV
39. Find read pairs that show an unusually high or low insert size (outside mean +/- 3 SD, for example)
40. Cluster these abnormal related pairs
41. Where there is significant clustering, there may be evidence for a structural variant
42. The type of structural variant can be determined from the relationships between clusters
47. Experimental Setup
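The four steps listed under "Paired-end reads and SV" can be sketched in a few lines. This is a minimal illustration in plain Python, not the R and Bioconductor workflow from the talk; the function names, the (chrom, start, end) representation of a mapped pair, and the min_support threshold are assumptions made for the example. Clusters are formed here by merging overlapping abnormal pairs, which is only one of several possible clustering rules.

```python
from statistics import mean, stdev


def abnormal_pairs(pairs, n_sd=3.0):
    """Return the pairs whose insert size lies outside mean +/- n_sd * sd.

    `pairs` is a list of (chrom, start, end) tuples, one per mapped read pair.
    Assumption of this sketch: end - start is used as the insert size, and the
    mean and sd are estimated from all pairs, outliers included.
    """
    sizes = [end - start for _, start, end in pairs]
    mu, sd = mean(sizes), stdev(sizes)
    lo, hi = mu - n_sd * sd, mu + n_sd * sd
    return [p for p in pairs if not lo <= p[2] - p[1] <= hi]


def cluster(pairs, min_support=2):
    """Merge overlapping pairs on the same chromosome into clusters.

    Only clusters supported by at least `min_support` pairs are returned
    (a hypothetical threshold standing in for "significant clustering").
    """
    clusters = []
    for chrom, start, end in sorted(pairs):
        last = clusters[-1] if clusters else None
        if last and last["chrom"] == chrom and start <= last["end"]:
            # Overlaps the current cluster: extend it and count the pair.
            last["end"] = max(last["end"], end)
            last["n"] += 1
        else:
            clusters.append({"chrom": chrom, "start": start, "end": end, "n": 1})
    return [c for c in clusters if c["n"] >= min_support]
```

Calling cluster(abnormal_pairs(pairs)) gives candidate regions; deciding the variant type from the relationships between clusters (the last step on the slide) is not attempted in this sketch.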
48. Make a new sequence that contains several structural variants and use it as the basis for our sequencing
49. Simulate 100k paired-end reads using MAQ (maq simulate)
50. 35 bp reads
51. Allow errors according to an error model from real data
52. Experimental Setup, continued
53. Convert output of BWA to sorted and indexed BAM
54. Use R and Bioconductor tools to try to rediscover the structural variants in the simulation
55. The Sample Sequence
56. The Sample Sequence, continued
57. Segment between 40.1 and 40.11 tandemly duplicated five times (a copy number variation)
58. Bioconductor and R Tools Used
59. IRanges
60. Calculate coverage on abnormally mapped pairs
R graphics for making plots
61. Get data into R
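As a language-neutral illustration of this step, the sketch below uses plain Python rather than the R and Bioconductor code from the talk. It parses the columns of a SAM alignment line that matter for structural-variant work (flag, position, and template length) and tests individual flag bits with a bitwise AND. The column order and flag values follow the SAM format specification; the names Alignment, parse_sam_line, has_flag, and both_mates_mapped are hypothetical helpers introduced here, not part of any library.

```python
from typing import NamedTuple

# Flag bits as defined in the SAM format specification.
PAIRED = 0x1          # read is part of a pair
PROPER_PAIR = 0x2     # pair aligned as expected by the aligner
UNMAPPED = 0x4        # this read is unmapped
MATE_UNMAPPED = 0x8   # the mate is unmapped
REVERSE = 0x10        # this read maps to the reverse strand
MATE_REVERSE = 0x20   # the mate maps to the reverse strand


class Alignment(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int
    isize: int  # TLEN column: signed observed template length


def parse_sam_line(line: str) -> Alignment:
    """Parse the columns used here from one tab-separated SAM alignment line."""
    f = line.rstrip("\n").split("\t")
    return Alignment(f[0], int(f[1]), f[2], int(f[3]), int(f[8]))


def has_flag(aln: Alignment, bit: int) -> bool:
    """True if `bit` is set in the alignment's flag field."""
    return (aln.flag & bit) != 0


def both_mates_mapped(aln: Alignment) -> bool:
    """True for paired reads where neither the read nor its mate is unmapped."""
    return (
        has_flag(aln, PAIRED)
        and not has_flag(aln, UNMAPPED)
        and not has_flag(aln, MATE_UNMAPPED)
    )
```

For example, a record with flag 41 has the bits 1 (paired), 8 (mate unmapped), and 32 (mate on reverse strand) set, so both_mates_mapped returns False for it and its insert size should not be used.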
63. Using the SAM Flag Field: 41 = 1 0 1 0 0 1 in binary and 8 = 0 0 1 0 0 0, so 41 & 8 = 0 0 1 0 0 0 = 8 (the bit with value 8 is set in flag 41)
64. Insert Size Distribution
69. Future Work
70. Refine workflow for finding and clustering abnormal related pairs
71. Define or implement algorithms for taking raw clustering results and converting them to biologically meaningful descriptions of the structural variants
72. Lots, lots, lots more
74. A couple of final thoughts
75. SRA (NCBI short-read sequencing archive)
76. NCBI GEO: not just for microarrays
Interactive visualization
77. Integrated Genome Browser (IGB, available from Affymetrix and at SourceForge)
78. Integrative Genomics Viewer (IGV, available from the Broad Institute)