NUI Maynooth 20th April, 2012
Mouse genomic variation and its effect on phenotypes and gene regulation
Thomas Keane
Vertebrate Resequencing Informatics
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
Mouse genomic variation and its effect on phenotypes and gene regulation
Mouse Genomes Project
RNA-Editing
Sequencing Technologies over past 30 years
MR Stratton et al. Nature 458, 719-724 (2009)
Sanger total sequence (2007-2009), Gbp
Sanger total sequence to date, Gbp
HiSeq 2000
The Laboratory Mouse
Mouse Genome Project (2002)
International Knockout Mouse Consortium
Large Outbred Crosses
Take a founder set of inbred strains and cross them randomly
  Heterogeneous stock
  Collaborative Cross
Large numbers of resulting mice
  Comprehensively phenotyped
  Recurrent phenotypes assessed
  Identify QTL regions
Knowing the origin of haplotype blocks
  Full sequence variation of founder mice required to find potential causative mutations
Collaborative Cross Consortium (2009) Genetics
Mouse Genomes Project
Sequencing 18 laboratory mouse strains
  Largest effort to date to sequence genomes of laboratory mouse strains
Primary goals
  Deep sequencing of each strain (>25x)
  Comprehensive catalog of sequence variation
What sort of variation?
  SNPs – single base changes (A->G etc.)
  Indels – insertions or deletions of a few bases
  Structural variation – larger structural differences
Illumina sequencing platform
  Raw data was generated in 2009
  Approx. 1.2 Tbp
What does the data look like?
Whole-genome shotgun (WGS)
Sequence, in parallel, the ends of millions of fragments 300-500bp in size
Read 100bp of sequence from either end
Reference
Variation Catalog
Keane et al (2011) Nature
Results of the project
Genomic variation and its effect on phenotypes
Jonathan Flint & Richard Mott
Keane et al (2011) Nature
Results of the project
Genomic variation and its effect on phenotypes
Structural variation catalog
Binnaz Yalcin, Kim Wong, Thomas Keane, Jonathan Flint
Yalcin et al. (2011) Nature
Results of the project
Genomic variation and its effect on phenotypes
Structural variation catalog
Structural variation methods
Wong, Keane, Stalker, Adams (2010) Gen Biol
SVMerge
Results of the project
Genomic variation and its effect on phenotypes
Structural variation catalog
Structural variation methods
Novel structural variation types
Binnaz Yalcin & Kim Wong
Yalcin et al. (2012) Gen Biol
Results of the project
Genomic variation and its effect on phenotypes
Structural variation catalog
Structural variation methods
Novel structural variation types
Transposable elements
Nellaker, Keane, Wong et al., under review
Results of the project
Genomic variation and its effect on phenotypes
Structural variation catalog
Structural variation methods
Novel structural variation types
Transposable elements
RNA-Editing…….
Mouse genomic variation and its effect on phenotypes and gene regulation
RNA-Editing
Mouse Genomes Project
RNA-Editing
Site-selective post-transcriptional alteration of double-stranded RNA
Adenosine deaminase acting on RNA (ADAR) family of enzymes
  Convert adenosine residues to inosines
  Inosine is read as guanosine, so edits are observed as A-to-G SNPs in cDNA
ADARs
  Bind to double-stranded regions of RNA
  Modify multiple neighbouring adenosines
Apobec-1 mediated C-to-U RNA editing
Novel source of protein isoform diversity
  HTR2C gene: five edit sites lead to 28 mRNAs
Wulff and Nishikura (2009) WIREs RNA
HTR2C gene
Wahlstedt et al (2009) Gen Res
RNA-Seq
Isolate RNA and reverse transcribe to cDNA
  Fragment cDNA and directly sequence
  No reference bias and huge dynamic range
Uses
  Gene expression analysis
  Transcript discovery and annotation of new genomes
  Alternative splicing
RNA-editing
  Align the RNA-seq reads to the reference genome
  If the bases disagree with the genomic sequence data at the corresponding position...
McIntyre et al (2011) BMC Gen
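The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's pipeline: the function name `candidate_edit_sites`, the dictionary inputs, and the `min_fraction` threshold are all assumptions made for the example.

```python
from collections import Counter

def candidate_edit_sites(genomic_base, rna_pileup, min_fraction=0.1):
    """Report positions where RNA-seq bases disagree with the genomic base.

    genomic_base: dict of position -> base called from the strain's gDNA
    rna_pileup:   dict of position -> string of bases seen in RNA-seq reads
    min_fraction: hypothetical minimum fraction of reads carrying the mismatch
    """
    sites = []
    for pos, bases in sorted(rna_pileup.items()):
        ref = genomic_base.get(pos)
        if ref is None or not bases:
            continue  # no genomic call or no RNA coverage at this position
        counts = Counter(bases.upper())
        for alt, n in counts.most_common():
            if alt == ref:
                continue
            # Only the most frequent non-genomic base is considered.
            fraction = n / len(bases)
            if fraction >= min_fraction:
                sites.append((pos, ref, alt, fraction))
            break
    return sites

# Toy data: position 100 shows a strong A-to-G signal, 102 a single stray G.
genomic = {100: "A", 101: "C", 102: "A"}
pileup = {100: "AAGGGAAAGG", 101: "CCCCCCCCCC", 102: "AAAAAAAAAG"}
print(candidate_edit_sites(genomic, pileup, min_fraction=0.2))
```

Sites found this way are only candidates; sequencing and alignment errors produce the same signal, which is why further filtering is needed.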
RNA-Editing?
RNA-seq Replicate 1
RNA-seq Replicate 2
DNA
Human RNA-Editing
Li et al. (2011) Science
RNA-Seq is not the same as genomic sequencing
Alignment of RNA-Seq reads is not trivial
Most genomic short read aligners are not splice aware
RNA-seq Replicate 1
RNA-seq Replicate 2
DNA
RNA-Seq is not the same as genomic sequencing
What about processed pseudo-genes?
cDNA fragment
Exon 1 Exon 2
Pseudogene
Exon 1 Exon 2
Functioning gene
Pink et al (2011) RNA
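A toy Python example may help show why both points matter. The sequences, the helper names `ungapped_hits` and `spliced_hit`, and the `min_anchor` parameter are invented for illustration. A cDNA read that spans an exon junction has no ungapped match in the functioning gene, but it matches an intron-less processed pseudogene perfectly, so a splice-unaware aligner would place it there.

```python
# Hypothetical toy sequences, not real mouse loci.
exon1 = "ATGGCCAAGT"
intron = "GTAAG" + "T" * 8 + "CAG"
exon2 = "CCGATTGACA"

gene = exon1 + intron + exon2     # functioning gene, with intron
pseudogene = exon1 + exon2        # processed pseudogene: intron-less copy
genome = gene + "NNNNN" + pseudogene

read = exon1[-5:] + exon2[:5]     # cDNA read spanning the exon junction

def ungapped_hits(read, sequence):
    """All start positions where the read matches without any gap."""
    hits = []
    start = sequence.find(read)
    while start != -1:
        hits.append(start)
        start = sequence.find(read, start + 1)
    return hits

def spliced_hit(read, sequence, min_anchor=4):
    """Try to split the read into two exact anchors separated by a gap.

    Returns (left_start, left_end, right_start), or None if no split works.
    """
    for k in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:k], read[k:]
        i = sequence.find(left)
        if i == -1:
            continue
        j = sequence.find(right, i + k + 1)  # right anchor must lie past a gap
        if j != -1:
            return (i, i + k, j)
    return None

print(ungapped_hits(read, gene))    # no ungapped placement in the real gene
print(ungapped_hits(read, genome))  # only the pseudogene copy matches
print(spliced_hit(read, gene))      # a split alignment recovers the junction
```

Any genomic difference between the pseudogene and its parent gene would then show up as a false RNA-DNA mismatch.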
What about in mouse?
Mouse Genomes Project
  RNA-Seq of 15 mouse strains
  Whole-brain tissue
  2-4 biological replicates per strain
  ~5Gbp per replicate
Previous catalogs
  Neeman et al.
  Zaranek et al. - several tens of gigabases of human and mouse cDNA sequence
  Rosenberg et al. - RNA-seq for C57BL/6J strain
  Hindered by lack of corresponding genomic sequencing
We generated
  Deep whole genome sequencing
  Corresponding RNA-Seq from whole-brain tissue across 15 strains
  2-4 biological replicates
Our Pipeline
gDNA SNVs
cDNA SNVs
304,817 candidate sites
98,061 unambiguous sites
Splice-aware realignment
Filtering
  Minimum Depth 10x: 31,923 sites
  Replicate Consistency: 62,889 sites
  End Distance Bias: 59,775 sites
  Strand Bias: 42,238 sites
  Variant Distance Bias: 36,213 sites
5,579 filtered sites
Cluster extension
  One-type mismatch clusters added
No assumptions about the nature of editing made
Assumed editing by ADARs which usually occurs in clusters
7,133 sites
7,389 final sites
Estimated FDR 2.9%
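As a rough sketch of how a filter cascade like this can be expressed, the Python below applies four of the named filters in sequence and records how many sites survive each step. Only the 10x minimum depth comes from the slide. The `Site` fields, the other thresholds, and the order of application are assumptions for illustration, and the variant distance bias filter is not modelled.

```python
from dataclasses import dataclass

@dataclass
class Site:
    depth: int                 # RNA-seq read depth at the site
    replicates_with_alt: int   # replicates in which the mismatch is seen
    replicates_total: int      # replicates sequenced for the strain
    alt_forward: int           # mismatch-supporting reads on the forward strand
    alt_reverse: int           # mismatch-supporting reads on the reverse strand
    mean_end_distance: float   # mean distance of the mismatch from a read end

def strand_balanced(s, max_fraction=0.9):
    """Reject sites whose supporting reads come almost entirely from one strand."""
    total = s.alt_forward + s.alt_reverse
    return total > 0 and max(s.alt_forward, s.alt_reverse) / total <= max_fraction

# (name, predicate) pairs; thresholds other than 10x depth are hypothetical.
FILTERS = [
    ("minimum depth", lambda s: s.depth >= 10),
    ("replicate consistency", lambda s: s.replicates_with_alt == s.replicates_total),
    ("end distance bias", lambda s: s.mean_end_distance >= 5.0),
    ("strand bias", strand_balanced),
]

def apply_filters(sites, filters=FILTERS):
    """Apply filters one after another, recording survivors after each step."""
    counts = []
    for name, keep in filters:
        sites = [s for s in sites if keep(s)]
        counts.append((name, len(sites)))
    return sites, counts

toy_sites = [
    Site(30, 3, 3, 6, 5, 40.0),   # passes everything
    Site(8, 3, 3, 4, 4, 40.0),    # too shallow
    Site(25, 1, 3, 6, 6, 40.0),   # seen in one replicate only
    Site(25, 3, 3, 12, 0, 40.0),  # all support on one strand
    Site(25, 3, 3, 6, 6, 2.0),    # mismatch always near a read end
]
kept, counts = apply_filters(toy_sites)
print(counts)
```

Recording the count after every step makes it easy to see which filter removes the most candidates.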
Effect of Filtering Strategy
Validation
Sequenom validation
  Random set of 611 calls from both the filtered set of 5,579 RNA editing sites
  19 non A-to-G editing sites raw calls -> all confirmed false positives
  Discrepancy rate of 10.5%
  Enriched at positions where editing level is <20%
T-to-C editing
  Novel form of RNA-editing?
  Uncertainties in strand assignment of transcripts
  Result of calls made in antisense transcripts, mis-annotations
Assuming all non A-to-G edits are false
  False-discovery rate of our call set is 2.9%
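The strand point and the false-discovery assumption can be illustrated together. In this hedged Python sketch, a T-to-C mismatch on a minus-strand transcript is rewritten as A-to-G before anything is counted, and the fraction of remaining non A-to-G calls is taken as a crude false-discovery estimate. The function names and toy calls are hypothetical, and this is not necessarily how the 2.9% figure was derived.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def transcript_change(ref, alt, strand):
    """Express a genome-strand mismatch in the orientation of the transcript."""
    if strand == "-":
        return COMPLEMENT[ref], COMPLEMENT[alt]
    return ref, alt

def non_a_to_g_fraction(calls):
    """Fraction of calls that are not A-to-G once strand is accounted for.

    calls: list of (ref, alt, strand) tuples on the genome's forward strand.
    Treating every non A-to-G call as false is the assumption stated above;
    it also counts genuine C-to-U edits as false.
    """
    changes = [transcript_change(ref, alt, strand) for ref, alt, strand in calls]
    false_calls = sum(1 for change in changes if change != ("A", "G"))
    return false_calls / len(changes)

toy_calls = [
    ("A", "G", "+"),
    ("T", "C", "-"),  # A-to-G on an antisense transcript
    ("A", "G", "+"),
    ("C", "T", "+"),  # counted as false under the assumption
]
print(non_a_to_g_fraction(toy_calls))
```

If the strand of a transcript is mis-annotated, a true A-to-G edit is reported as T-to-C, which is one explanation offered above for apparent T-to-C editing.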
Striking Conservation
Editing Levels
Genomic Context
Protein Coding Edits
23 previously known non-synonymous coding edits
Extended this by a further 30 sites
  24 were confirmed by Sanger sequencing of cDNA
Cacna1d gene
  Encodes the Cav1.3 voltage-gated calcium channel
  Known to undergo extensive alternative splicing
  Two novel non-synonymous edits
  Capillary sequencing validation
  Observed 3 different transcripts
Cacna1d
Rare C-to-U Edit: Mfn1
Cds2 - UTR
RNA-editing appears to revert genomic sequence back to ancestral state
Mice homozygous for disruptions in this gene display a lethal phenotype
Several known across-species examples
  RNA-editing maintaining conservation at the protein level despite genomic sequence divergence
[Figure: bases at the edited site. Row D is mostly a, with g in the last three columns; row R is g throughout; rat is g.]
Human Follow-up Studies
Ramaswami et al. (2012) Nat Meth
Bahn et al (2011) Gen Res Peng et al (2011) Nat Bio
Li et al (2011) Science
To do
First phase of the project was cataloging variation
Full de novo assemblies of the strains
  Generating higher quality sequencing data for the 18 strains
  Long fragment end sequencing – 3, 6, 10, 40kb fragments
De novo assembly
  Discover novel haplotypes
  Novel gene structures in the divergent strains
Mouse pan-genome
  Reference bias
  New mouse reference genome graph
  Including novel non-reference haplotypes shared amongst subsets of the strains
Acknowledgements and Questions
David Adams
Mouse Genomes Project
  Sanger Institute: David Adams, Petr Danecek, Kim Wong, Guy Slater, Sendu Bala et al.
  Wellcome Trust Centre for Human Genetics: Jonathan Flint, Binnaz Yalcin, Richard Mott, Leo Goodstadt et al.
  EBI: Ewan Birney
  University of Oxford: Chris Ponting, Chris Nellaker, Andres Heger, Grant Belgard
RNA-Editing
  Petr Danecek, David Adams, Chris Nellaker
Jonathan Flint
Email: thomas.keane@sanger.ac.uk
WTSI PhD Programme