
Autism Exome sequencing


1. Autism exome sequencing: design, data processing and analysis
Benjamin Neale
Analytic and Translational Genetics Unit, MGH
Medical & Population Genetics Program, Broad Institute

2. Direct Sequencing has Enormous Potential
Ng, Shendure: Miller syndrome, 4 cases
Exome sequencing reveals causal mutations in DHODH
Lifton: undiagnosed congenital chloride diarrhoea (consanguineous)
Exome seq reveals homozygous SLC26A3 chloride ion transporter mutation
Returned a diagnosis of CLD (GI) rather than the suspected Bartter syndrome (renal)
Worthey, Dimmock: 4-year-old with severe, unusual IBD
Exome seq reveals XIAP mutation (at a highly conserved amino acid)
Pro-immune dysregulation: opted for bone marrow transplant over chemo
Jones, Marra: Secondary lung carcinoma unresponsive to erlotinib
Genome and transcriptome sequencing reveals defects that direct alternative sunitinib therapy
Mardis, Wilson: acute myelocytic leukaemia without the classical translocation
Genome sequencing (1 week + analysis) reveals PML-RARA translocation
Directs ATRA (all-trans retinoic acid) treatment decision
3. ...and tremendous challenges
Managing and processing vast quantities of data into variant calls
Interpreting millions of variants per individual
An individual's genome harbors
~80 point nonsense mutations
~100-200 frameshift mutations
Tens of splice-site mutations and CNV-induced gene disruptions
For very few of these do we have any conclusive understanding of their medical impact in the population
4. Successes to date rely on factors that may not apply generally to common endpoints
Mendelian disorders
Single-family rare autosomal recessive disorders (linkage may narrow the search to 1% of the genome, and 2 hits in the same gene are very unlikely by chance)
Single (or near single) gene disorders where nearly all families carry mutations in the same underlying gene
Somatic or de novo mutations
Extremely low background mutation rate
5. Autism exome sequencing
In progress; ARRA-funded, supported by NIMH & NHGRI
Collaboration between sequencing centers (Baylor & Broad) and year-2 follow-up in autism genetics labs (Buxbaum, Daly, Devlin, Schellenberg, Sutcliffe)
Target: production of 1000 cases and 1000 controls by year's end (500/500 from each site)
6. Exome production plan
Baylor: 1000 samples (Nimblegen capture, SOLiD sequencing)
Broad: 1000 samples (Agilent capture, Illumina sequencing)
Predominantly cases and controls pairwise matched with GWAS data (one batch of 50 trios currently being run)
All samples are available from the NIMH repository
7. Broad Exome Production
~700 exomes completely sequenced, with variant calling recently completed
~300 completed earlier in the summer and fully analyzed (the basis of the later analysis slides)
Main production conducted with matched case-control pairs traveling together through the sequencing lab and through the computational variant-calling runs
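One plausible rationale for keeping matched pairs together (the slide does not state it) is that any batch effect then affects a case and its matched control equally. A minimal Python sketch of that batching rule follows; the `Sample` record, the `batch_matched_pairs` and `is_balanced` helpers, and the batch size are hypothetical illustrations, not the Broad's production tracking system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    sample_id: str
    status: str  # "case" or "control" (hypothetical labels)


def batch_matched_pairs(pairs, pairs_per_batch):
    """Split matched (case, control) pairs into batches without ever separating a pair."""
    batches = []
    for start in range(0, len(pairs), pairs_per_batch):
        batch = []
        for case, control in pairs[start:start + pairs_per_batch]:
            batch.extend([case, control])  # the pair always lands in the same batch
        batches.append(batch)
    return batches


def is_balanced(batch):
    """True when a batch holds exactly as many cases as controls."""
    cases = sum(1 for s in batch if s.status == "case")
    return cases * 2 == len(batch)


pairs = [(Sample(f"case{i}", "case"), Sample(f"ctrl{i}", "control")) for i in range(5)]
batches = batch_matched_pairs(pairs, pairs_per_batch=2)
print([len(b) for b in batches])  # [4, 4, 2]
print(all(is_balanced(b) for b in batches))  # True
```

Because pairs are never split, every batch, including a short final one, stays case/control balanced by construction.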
8. From unmapped reads to true genetic variation in next-generation sequencing data
Mapping and alignment: the origin of each read in the human reference genome sequence is found
A single run of a sequencer generates ~50M short reads of ~75 bp each for analysis
Quality calibration and annotation: the quality of each read is calibrated and additional information is annotated for downstream analyses
Identifying genetic variation: SNPs and indels relative to the reference are found where the reads collectively provide evidence of a variant
[Figure: raw short reads placed on Region 1 and Region 2 of the human reference genome at each stage]
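The last step, finding sites where the reads collectively support a variant, can be illustrated with a deliberately naive pileup counter. This Python sketch assumes ungapped reads with known 0-based start positions. It ignores base and mapping qualities and uses hypothetical thresholds (`min_alt_reads`, `min_alt_fraction`). Production callers model all of these and also handle indels.

```python
from collections import Counter


def pileup(reads, ref_len):
    """Count observed bases at each reference position from ungapped aligned reads.

    reads: list of (start, sequence) with 0-based start positions on the reference.
    """
    counts = [Counter() for _ in range(ref_len)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_len:
                counts[pos][base] += 1
    return counts


def candidate_snps(reference, reads, min_alt_reads=2, min_alt_fraction=0.2):
    """Report positions where enough reads agree on the same non-reference base.

    At most one alternate allele is reported per position: the best-supported
    one that passes both (hypothetical) thresholds.
    """
    calls = []
    for pos, counter in enumerate(pileup(reads, len(reference))):
        depth = sum(counter.values())
        for base, n in counter.most_common():
            if base != reference[pos] and n >= min_alt_reads and n / depth >= min_alt_fraction:
                calls.append((pos, reference[pos], base, n, depth))
                break
    return calls


reference = "ACGTACGTAC"
reads = [(0, "ACGTTCG"), (2, "GTTCGTA"), (3, "TTCGTAC"), (1, "CGTACGT")]
print(candidate_snps(reference, reads))  # [(4, 'A', 'T', 3, 4)]
```

Three of the four reads covering position 4 show T where the reference has A, so that site is reported. A single discordant read would fall below `min_alt_reads` and be treated as noise.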
9. Partnership: Genome Sequencing and Analysis (GSA) team @ Broad
Group Leader
Mark A. DePristo
Analysis Team
Kiran Garimella [Lead]
Chris Hartl
Corin Boyko
Development Team
Eric Banks [Lead]
Guillermo del Angel
Menachem Fromer
Ryan Poplin
Software Engineering
Matthew Hanna
Khalid Shakir
Aaron McKenna
Genome Sequencing and Analysis (GSA) develops core capabilities for genetic analysis
Data processing and analysis methods
Technology development
High-end software engineering
High-throughput data processing for MPG exome projects with MPG-Firehose
Staffed by full-time research scientists in MPG
PhDs and BAs in biochemistry, engineering, computer science, mathematics, and genetics
10. Developing cutting-edge data processing and analysis methods
Variation discovery and genotyping
Local realignment
Novel SNPs found
Base quality score recalibration
Adaptive error modeling
Read-backed phasing
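Base quality score recalibration can be sketched as comparing each reported Phred quality with the mismatch rate actually observed for bases carrying that quality. The Python sketch below assumes every mismatch at known non-variant sites is a sequencing error and applies add-one smoothing. Both are simplifying assumptions. A real recalibrator also conditions on further covariates such as machine cycle and sequence context, and the function name here is hypothetical.

```python
import math
from collections import defaultdict


def recalibrate(observations):
    """Map each reported quality to an empirical Phred quality.

    observations: iterable of (reported_q, is_mismatch) pairs for bases at sites
    assumed to be non-variant, so every mismatch is counted as a sequencing error.
    """
    totals = defaultdict(lambda: [0, 0])  # reported_q -> [bases, mismatches]
    for reported_q, is_mismatch in observations:
        totals[reported_q][0] += 1
        totals[reported_q][1] += int(is_mismatch)
    table = {}
    for reported_q, (bases, mismatches) in totals.items():
        # Add-one smoothing keeps the error rate away from 0 when no mismatches are seen.
        error_rate = (mismatches + 1) / (bases + 2)
        table[reported_q] = round(-10 * math.log10(error_rate))
    return table


# Hypothetical data: 1000 bases reported as Q30, of which 10 mismatch the reference.
obs = [(30, i < 10) for i in range(1000)]
print(recalibrate(obs))  # {30: 20}
```

Here bases labelled Q30 (1 error in 1000) actually err about 1 time in 100, so they are recalibrated down to roughly Q20.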
11. Challenges
Quality score recalibration
Calling variants
Evaluating set of variant sites
Estimating genotypes
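Genotype estimation is commonly framed as computing, for each candidate genotype, the likelihood of the observed bases. The Python sketch below uses a simplified diploid model with these assumptions:

- Bases are independent.
- A base with Phred quality q is wrong with probability e = 10^(-q/10).
- Errors are spread evenly over the other three bases.
- Each read samples either chromosome copy with probability 1/2.

It omits mapping quality, genotype priors and indels, and the function names are hypothetical.

```python
import math


def base_prob(base, allele, error):
    """P(observed base | true allele), with errors spread evenly over the 3 other bases."""
    return 1 - error if base == allele else error / 3


def genotype_log10_likelihoods(bases, quals, ref, alt):
    """log10 P(observed bases | genotype) for the three diploid genotypes at one site."""
    genotypes = {"hom_ref": (ref, ref), "het": (ref, alt), "hom_alt": (alt, alt)}
    result = {}
    for name, (a1, a2) in genotypes.items():
        total = 0.0
        for base, q in zip(bases, quals):
            error = 10 ** (-q / 10)  # Phred quality -> error probability
            # Each read samples one of the two chromosome copies with probability 1/2.
            total += math.log10(0.5 * base_prob(base, a1, error) + 0.5 * base_prob(base, a2, error))
        result[name] = total
    return result


likelihoods = genotype_log10_likelihoods("AAGGAG", [30] * 6, ref="A", alt="G")
print(max(likelihoods, key=likelihoods.get))  # het
```

With three A and three G reads at Q30, the heterozygous genotype is many orders of magnitude more likely than either homozygote. A full caller would combine these likelihoods with a prior before choosing a genotype.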
12. Finding the true origin of each read is a computationally demanding and important first step
[Figure: an enormous pile of short reads from NGS is placed by a mapping and alignment algorithm onto Regions 1, 2 and 3 of the reference genome]
The aligner detects the correct origin of reads and flags them with high certainty
It detects ambiguity in the origin of reads and flags them as uncertain
Solexa: BWA
454: SSAHA
SOLiD: Corona
BWA: robust, accurate gold-standard aligner for NGS
Developed by Li and Durbin
Recently replaced MAQ (also by Li and Durbin), which had been used for the last 2 years
SSAHA: hash-based aligner with high sensitivity and specificity for longer reads
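The hash-based idea, and the distinction between confidently placed and ambiguous reads, can be sketched with a k-mer index over the reference. This is a toy Python illustration, not SSAHA's or BWA's actual algorithm. It makes three simplifications:

- It seeds only on the read's first k-mer.
- It allows a hypothetical `max_mismatches`.
- It reports a read as ambiguous when several positions fit equally well.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Hash every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Seed with the read's first k-mer, verify each hit, and flag unique vs ambiguous placements."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # the read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    if not hits:
        return ("unmapped", None)
    best = min(m for _, m in hits)
    best_positions = [p for p, m in hits if m == best]
    if len(best_positions) == 1:
        return ("unique", best_positions[0])  # confident placement
    return ("ambiguous", best_positions)  # equally good placements: origin uncertain


reference = "ACGTTGCAAGGCTTACGTTGCA"  # "ACGTTGCA" occurs twice (a repeat)
index = build_kmer_index(reference, 4)
print(map_read("GCAAGGCT", reference, index, 4))  # ('unique', 5)
print(map_read("ACGTTG", reference, index, 4))  # ('ambiguous', [0, 14])
print(map_read("CCCCCC", reference, index, 4))  # ('unmapped', None)
```

The repeated segment shows why an aligner must report uncertainty: a read drawn from either copy matches both equally well, so no single origin can be asserted.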

Corona: ABI-designed tool for aligning in color space
13. SAM/BAM files
14. The SAM file format
Data sharing was a major issue with the 1000 Genomes Project
15. Each center, technology and analysis tool used its own idiosyncratic file formats, so no one could exchange data
16. The Sequence Alignment/Map (SAM) file format was designed to capture all of the critical information about NGS data in a single indexed and compressed file
17. SAM is becoming a standard and is now used by production informatics, MPG, and cancer analysis groups at the Broad
18. It has enabled sharing of data across centers and the development of tools that work across platforms
19. More info at
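A SAM alignment line is plain tab-separated text with 11 mandatory fields. The FLAG field is a bit set: 0x4 marks an unmapped read and 0x10 a read aligned to the reverse strand. The Python sketch below parses only those mandatory fields of a single text line. It ignores any optional tags and does not handle header lines (those starting with `@`), and the helper names are hypothetical. In practice a library such as pysam or htslib would be used, which also reads the compressed binary BAM form.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int  # 1-based leftmost mapping position (0 if unavailable)
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str


def parse_sam_line(line):
    """Parse the 11 mandatory tab-separated fields of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("SAM alignment lines need at least 11 fields")
    f = fields[:11]  # optional TAG:TYPE:VALUE fields, if any, are ignored
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5],
                     f[6], int(f[7]), int(f[8]), f[9], f[10])


def is_unmapped(rec):
    return bool(rec.flag & 0x4)


def is_reverse(rec):
    return bool(rec.flag & 0x10)


line = "read1\t16\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
rec = parse_sam_line(line)
print(rec.rname, rec.pos, rec.cigar, is_unmapped(rec), is_reverse(rec))  # chr1 100 8M False True
```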
