NUI Maynooth, 20th April 2012. Mouse genomic variation and its effect on phenotypes and gene regulation. Thomas Keane, Vertebrate Resequencing Informatics, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

Mouse Genomes Project + RNA-Editing

Page 1: Mouse Genomes Project + RNA-Editing

NUI Maynooth 20th April, 2012

Mouse genomic variation and its effect on phenotypes and gene regulation

Thomas Keane Vertebrate Resequencing Informatics

Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK

Page 2: Mouse Genomes Project + RNA-Editing

Mouse genomic variation and its effect on phenotypes and gene regulation

- Mouse Genomes Project
- RNA-Editing

Page 3: Mouse Genomes Project + RNA-Editing

Sequencing Technologies over past 30 years

MR Stratton et al. Nature 458, 719-724 (2009)

Page 4: Mouse Genomes Project + RNA-Editing

Sanger total sequence (2007-2009), Gbp

Page 5: Mouse Genomes Project + RNA-Editing

Sanger total sequence to date, Gbp

HiSeq 2000

Page 6: Mouse Genomes Project + RNA-Editing

The Laboratory Mouse

Page 7: Mouse Genomes Project + RNA-Editing

Mouse Genome Project (2002)

Page 8: Mouse Genomes Project + RNA-Editing

International Knockout Mouse Consortium

Page 9: Mouse Genomes Project + RNA-Editing

Large Outbred Crosses

Take a founder set of inbred strains and cross them randomly
- Heterogeneous stock
- Collaborative Cross

Large numbers of resulting mice
- Comprehensively phenotyped
- Recurrent phenotypes assessed
- Identify QTL regions
  - Knowing the origin of haplotype blocks

Full sequence variation of the founder mice is required to find potential causative mutations

Collaborative Cross Consortium (2009) Genetics

Page 10: Mouse Genomes Project + RNA-Editing

Mouse Genomes Project

Sequencing 18 laboratory mouse strains
- Largest effort to date to sequence genomes of laboratory mouse strains

Primary goals
- Deep sequencing of each strain (>25x)
- Comprehensive catalog of sequence variation

What sort of variation?
- SNPs: single base changes (A->G etc.)
- Indels: insertions or deletions of a few bases
- Structural variation: larger structural differences

Illumina sequencing platform
- Raw data was generated in 2009
- Approx. 1.2 Tbp

Page 11: Mouse Genomes Project + RNA-Editing

What does the data look like?

Whole-genome shotgun (WGS)

Sequence both ends of millions of fragments in parallel
- Fragments are 300-500 bp in size
- Read 100 bp of sequence from either end

[Figure: reads aligned to the Reference]
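As a rough illustration of paired-end reads, the sketch below takes one fragment and returns the two reads a sequencer would report, plus the unsequenced gap between them. The function names and the toy fragment are illustrative assumptions, not part of the project's software.

```python
# Minimal sketch of paired-end reads from a single fragment.
# Names and the toy fragment are hypothetical, for illustration only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment: str, read_len: int = 100):
    """Return (read1, read2, inner_distance) for one fragment.

    read1 is the first read_len bases of the fragment.
    read2 is read from the opposite strand, so it is the reverse
    complement of the last read_len bases.
    inner_distance is the number of unsequenced bases between the reads;
    it is negative when the two reads overlap.
    """
    if len(fragment) < read_len:
        raise ValueError("fragment is shorter than the read length")
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment[-read_len:])
    inner_distance = len(fragment) - 2 * read_len
    return read1, read2, inner_distance


# A toy 400 bp fragment: 100 A, 200 C, 100 G.
fragment = "A" * 100 + "C" * 200 + "G" * 100
read1, read2, inner = paired_end_reads(fragment)
print(len(read1), len(read2), inner)
```

With 100 bp reads, a 300-500 bp fragment leaves roughly 100-300 bp unsequenced in the middle; the known fragment size is what lets an aligner use the pair as a distance constraint.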

Page 12: Mouse Genomes Project + RNA-Editing

Variation Catalog

Keane et al (2011) Nature

Page 13: Mouse Genomes Project + RNA-Editing

Variation Catalog

Page 14: Mouse Genomes Project + RNA-Editing

Results of the project

Genomic variation and its effect on phenotypes

Jonathan Flint & Richard Mott

Keane et al (2011) Nature

Page 15: Mouse Genomes Project + RNA-Editing

Results of the project

Genomic variation and its effect on phenotypes

Structural variation catalog

Binnaz Yalcin, Kim Wong, Thomas Keane, Jonathan Flint

Yalcin et al. (2010) Nature

Page 16: Mouse Genomes Project + RNA-Editing

Results of the project

Genomic variation and its effect on phenotypes

Structural variation catalog

Structural variation methods

Wong, Keane, Stalker, Adams (2010) Gen Biol

SVMerge

Page 17: Mouse Genomes Project + RNA-Editing

Results of the project

Genomic variation and its effect on phenotypes

Structural variation catalog

Structural variation methods

Novel structural variation types

Binnaz Yalcin & Kim Wong

Yalcin et al. (2012) Gen Biol

Page 18: Mouse Genomes Project + RNA-Editing

Results of the project

Genomic variation and its effect on phenotypes

Structural variation catalog

Structural variation methods

Novel structural variation types

Transposable elements

Nellaker, Keane, Wong et al., under review

Page 19: Mouse Genomes Project + RNA-Editing

Results of the project

Genomic variation and its effect on phenotypes

Structural variation catalog

Structural variation methods

Novel structural variation types

Transposable elements

RNA-Editing...

Page 20: Mouse Genomes Project + RNA-Editing

Mouse genomic variation and its effect on phenotypes and gene regulation

- RNA-Editing
- Mouse Genomes Project

Page 21: Mouse Genomes Project + RNA-Editing

RNA-Editing

Site-selective post-transcriptional alteration of double-stranded RNA

Adenosine deaminase acting on RNA (ADAR) family of enzymes
- Convert adenosine residues to inosines
- Inosine is read as guanosine, so edits are observed as A-to-G SNPs in cDNA

ADARs
- Bind to double-stranded regions of RNA
- Modify multiple neighbouring adenosines

Apobec-1 mediated C-to-U RNA editing

Novel source of protein isoform diversity
- HTR2C gene: five edit sites lead to 28 mRNAs

Wulff and Nishikura (2009) WIREs RNA

Page 22: Mouse Genomes Project + RNA-Editing

HTR2C gene

Wahlstedt et al (2009) Gen Res

Page 23: Mouse Genomes Project + RNA-Editing

RNA-Seq

Isolate RNA and reverse transcribe to cDNA
- Fragment cDNA and directly sequence
- No reference bias and huge dynamic range

Uses
- Gene expression analysis
- Transcript discovery and annotation of new genomes
- Alternative splicing

RNA-editing
- Align the RNA-seq reads to the reference genome
- If the bases disagree with the genomic sequence data at the corresponding position...

McIntyre et al (2011) BMC Gen
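The comparison described on this slide can be sketched as a per-site check of RNA-seq base calls against the genomic base calls. This is a simplified illustration: the function name, the input format (one string of base calls per site) and the 10% minimum level are assumptions, not the thresholds used in the project.

```python
# Minimal sketch: flag a site as a candidate RNA edit when the cDNA
# shows a base that the genomic DNA does not. All names and the
# threshold are hypothetical.
from collections import Counter


def candidate_edit(gdna_bases: str, cdna_bases: str, min_level: float = 0.1):
    """Return (genomic_base, rna_base, editing_level) or None.

    gdna_bases: base calls from genomic reads covering the site.
    cdna_bases: base calls from RNA-seq reads covering the same site.
    """
    genomic = Counter(gdna_bases)
    # Skip sites where the genome itself is variable or uncovered:
    # a mismatch there cannot be told apart from a genomic SNP.
    if len(genomic) != 1:
        return None
    genomic_base = next(iter(genomic))

    mismatches = Counter(b for b in cdna_bases if b != genomic_base)
    if not mismatches:
        return None
    rna_base, n_alt = mismatches.most_common(1)[0]

    # Editing level: fraction of RNA-seq reads carrying the new base.
    level = n_alt / len(cdna_bases)
    if level < min_level:
        return None
    return genomic_base, rna_base, level


print(candidate_edit("AAAAAAAAAA", "AAAAAAGGGG"))
```

A hit from a check like this is only a candidate. The following slides explain why most such differences turn out to be alignment or annotation artefacts.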

Page 24: Mouse Genomes Project + RNA-Editing

RNA-Editing?

[Figure: alignment tracks for RNA-seq Replicate 1, RNA-seq Replicate 2, and DNA]

Page 25: Mouse Genomes Project + RNA-Editing

Human RNA-Editing

Li et al. (2011) Science

Page 26: Mouse Genomes Project + RNA-Editing

Human RNA-Editing

Li et al. (2011) Science

Page 27: Mouse Genomes Project + RNA-Editing

RNA-Seq is not the same as genomic sequencing

Alignment of RNA-Seq reads is not trivial
- Most genomic short read aligners are not splice aware

[Figure: alignment tracks for RNA-seq Replicate 1, RNA-seq Replicate 2, and DNA]
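A toy example of why splice awareness matters: a read that spans an exon-exon junction does not occur as one contiguous stretch in the genome, but it does align once it is allowed to split across an intron. The sequences and function names below are invented for illustration. Real splice-aware aligners use indexed search and splice-site models rather than this brute-force scan.

```python
# Toy illustration of contiguous versus spliced alignment.
# Sequences and function names are hypothetical.
def find_contiguous(genome: str, read: str) -> int:
    """Return the start of an exact contiguous match, or -1 if none."""
    return genome.find(read)


def find_spliced(genome: str, read: str, min_anchor: int = 3):
    """Return (left_start, right_start, intron_length) or None.

    Tries every split of the read into two anchors of at least
    min_anchor bases and looks for the left anchor followed, after a
    gap of at least one base, by the right anchor.
    """
    for cut in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:cut], read[cut:]
        left_start = genome.find(left)
        while left_start != -1:
            left_end = left_start + cut
            right_start = genome.find(right, left_end + 1)
            if right_start != -1:
                return left_start, right_start, right_start - left_end
            left_start = genome.find(left, left_start + 1)
    return None


exon1 = "ACGTACGGAT"
intron = "GTAAGTTTTTCCCCAG"
exon2 = "TTGCAGCCTA"
genome = exon1 + intron + exon2

# A cDNA read covering the last 5 bases of exon 1 and the first 5 of exon 2.
read = exon1[-5:] + exon2[:5]

print(find_contiguous(genome, read))
print(find_spliced(genome, read))
```

An aligner limited to contiguous matches would either drop this read or force it onto the genome with mismatches at the junction, and those mismatches can then be mistaken for edits.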

Page 28: Mouse Genomes Project + RNA-Editing

RNA-Seq is not the same as genomic sequencing

What about processed pseudogenes?

[Figure: a cDNA fragment (Exon 1, Exon 2) shown against a pseudogene (Exon 1, Exon 2) and the functioning gene]

Pink et al (2011) RNA

Page 29: Mouse Genomes Project + RNA-Editing

What about in mouse?

Mouse Genomes Project
- RNA-Seq of 15 mouse strains
- Whole-brain tissue
  - 2-4 biological replicates per strain
  - ~5 Gbp per replicate

Previous catalogs
- Neeman et al.
- Zaranek et al.: several tens of gigabases of human and mouse cDNA sequence
- Rosenberg et al.: RNA-seq for the C57BL/6J strain

Hindered by lack of corresponding genomic sequencing

We generated
- Deep whole genome sequencing
- Corresponding RNA-Seq from whole-brain tissue across 15 strains
  - 2-4 biological replicates

Page 30: Mouse Genomes Project + RNA-Editing

Our Pipeline

Inputs: gDNA SNVs and cDNA SNVs

304,817 candidate sites -> 98,061 unambiguous sites

Splice-aware realignment

Filtering (no assumptions about the nature of editing made)

  Filter                   Sites
  Minimum Depth 10x        31,923
  Replicate Consistency    62,889
  End Distance Bias        59,775
  Strand Bias              42,238
  Variant Distance Bias    36,213

5,579 filtered sites

Cluster extension (assumed editing by ADARs, which usually occurs in clusters)
- One-type mismatch clusters added
- 7,133 sites

7,389 final sites

Estimated FDR 2.9%
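The filtering stage can be sketched as a set of independent per-site checks, tallied separately and then combined. Only the 10x minimum depth comes from the slide. The record fields, the other thresholds and the exact form of each check are assumptions made for illustration, and the variant distance filter is left out.

```python
# Minimal sketch of per-site filtering. Field names and all thresholds
# except the 10x depth are hypothetical.
from dataclasses import dataclass


@dataclass
class Site:
    depth: int                 # RNA-seq reads covering the site
    replicates_with_edit: int  # replicates in which the mismatch is seen
    replicates_total: int      # replicates sequenced for the strain
    alt_forward: int           # mismatching reads on the forward strand
    alt_reverse: int           # mismatching reads on the reverse strand
    mean_end_distance: float   # mean distance of the mismatch from a read end


def minimum_depth(site: Site, threshold: int = 10) -> bool:
    return site.depth >= threshold


def replicate_consistency(site: Site) -> bool:
    # Assumption: the mismatch must appear in every replicate.
    return site.replicates_with_edit == site.replicates_total


def end_distance_bias(site: Site, min_distance: float = 5.0) -> bool:
    # Mismatches piled up at read ends are typical alignment artefacts.
    return site.mean_end_distance >= min_distance


def strand_bias(site: Site, min_minor_fraction: float = 0.1) -> bool:
    # Require support from both strands.
    total = site.alt_forward + site.alt_reverse
    if total == 0:
        return False
    return min(site.alt_forward, site.alt_reverse) / total >= min_minor_fraction


FILTERS = {
    "Minimum Depth 10x": minimum_depth,
    "Replicate Consistency": replicate_consistency,
    "End Distance Bias": end_distance_bias,
    "Strand Bias": strand_bias,
}


def apply_filters(sites):
    """Return (sites passing each filter on its own, sites passing all)."""
    passing_each = {
        name: sum(1 for s in sites if check(s)) for name, check in FILTERS.items()
    }
    kept = [s for s in sites if all(check(s) for check in FILTERS.values())]
    return passing_each, kept


sites = [
    Site(40, 3, 3, 8, 7, 30.0),   # passes every filter
    Site(6, 3, 3, 2, 2, 25.0),    # too shallow
    Site(50, 1, 3, 10, 0, 2.0),   # one replicate, one strand, near read ends
]
passing_each, kept = apply_filters(sites)
print(passing_each, len(kept))
```

Tallying each filter separately before taking the intersection makes it easy to see which check removes the most sites.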

Page 31: Mouse Genomes Project + RNA-Editing

Effect of Filtering Strategy

Page 32: Mouse Genomes Project + RNA-Editing

Validation

Sequenom validation
- Random set of 611 calls from both the filtered set of 5,579 RNA editing sites
- 19 non A-to-G editing sites raw calls -> all confirmed false positives
- Discrepancy rate of 10.5%
  - Enriched at positions where editing level is <20%

T-to-C editing
- Novel form of RNA-editing?
  - Uncertainties in strand assignment of transcripts
  - Result of calls made in antisense transcripts, mis-annotations

Assuming all non A-to-G edits are false
- False-discovery rate of our call set is 2.9%
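Under the stated assumption that every non A-to-G call is false, one simple way to estimate the false-discovery rate is the share of non A-to-G calls in the call set. The sketch below shows that arithmetic on made-up counts. It is not necessarily the exact calculation behind the 2.9% figure, and the counts are not the project's data.

```python
# Minimal sketch of an FDR estimate from mismatch-type counts.
# The counts are invented for illustration.
def estimate_fdr(calls_by_type: dict, true_type: str = "A>G") -> float:
    """Fraction of calls that are not of the trusted mismatch type."""
    total = sum(calls_by_type.values())
    if total == 0:
        raise ValueError("no calls to evaluate")
    assumed_false = total - calls_by_type.get(true_type, 0)
    return assumed_false / total


example_calls = {"A>G": 970, "T>C": 20, "C>T": 10}  # hypothetical counts
fdr = estimate_fdr(example_calls)
print(f"{fdr:.1%}")
```

The estimate treats real non A-to-G edits, such as C-to-U editing, as false calls, so it depends fully on the assumption stated on the slide.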

Page 33: Mouse Genomes Project + RNA-Editing

Striking Conservation

Page 34: Mouse Genomes Project + RNA-Editing

Editing Levels

Page 35: Mouse Genomes Project + RNA-Editing

Genomic Context

Page 36: Mouse Genomes Project + RNA-Editing

Protein Coding Edits

23 previously known non-synonymous coding edits

Extended this by a further 30 sites
- 24 were confirmed by Sanger sequencing of cDNA

Cacna1d gene
- Encodes the Cav1.3 voltage-gated calcium channel
- Known to undergo extensive alternative splicing
- Two novel non-synonymous edits
- Capillary sequencing validation
  - Observed 3 different transcripts

Page 37: Mouse Genomes Project + RNA-Editing

Cacna1d

Page 38: Mouse Genomes Project + RNA-Editing

Rare C-to-U Edit: Mfn1

Page 39: Mouse Genomes Project + RNA-Editing

Rare C-to-U Edit: Mfn1

Page 40: Mouse Genomes Project + RNA-Editing

Cds2 - UTR

RNA-editing appears to revert genomic sequence back to ancestral state

Mice homozygous for disruptions in this gene display a lethal phenotype

Several known across-species examples
- RNA-editing maintaining conservation at the protein level despite genomic sequence divergence

D a a a a a a a a a a a a a a a g g g

R g g g g g g g g g g g g g g g g g g

Rat g g

Page 41: Mouse Genomes Project + RNA-Editing

Human Follow-up Studies

Ramaswami et al. (2012) Nat Meth

Bahn et al (2011) Gen Res

Peng et al (2011) Nat Bio

Li et al (2011) Science

Page 42: Mouse Genomes Project + RNA-Editing

To do

First phase of the project was cataloging variation

Full de novo assemblies of the strains
- Generating higher quality sequencing data for the 18 strains
- Long fragment end sequencing: 3, 6, 10, 40 kb fragments

De novo assembly
- Discover novel haplotypes
- Novel gene structures in the divergent strains

Mouse pan-genome
- Reference bias
- New mouse reference genome graph
  - Including novel non-reference haplotypes shared amongst subsets of the strains

Page 43: Mouse Genomes Project + RNA-Editing

Acknowledgements and Questions

David Adams

Mouse Genomes Project
- Sanger Institute
  - David Adams, Petr Danecek, Kim Wong, Guy Slater, Sendu Bala et al.
- Wellcome Trust Centre for Human Genetics
  - Jonathan Flint, Binnaz Yalcin, Richard Mott, Leo Goodstadt et al.
- EBI
  - Ewan Birney
- University of Oxford
  - Chris Ponting, Chris Nellaker, Andreas Heger, Grant Belgard

RNA-Editing
- Petr Danecek, David Adams, Chris Nellaker

Jonathan Flint

Email: [email protected]

Page 44: Mouse Genomes Project + RNA-Editing

WTSI PhD Programme