J. B. Cole
Animal Improvement Programs Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350, USA
Use of NGS to identify the causal variant associated with a complex phenotype
Animal Sciences Group, Wageningen UR Livestock Research, The Netherlands, 29 May 2013 (2) Cole
Overview
Why are we sequencing?
How did we select the animals to sequence?
What are the steps involved in the process?
What do you do with the reads once you have them?
Where are we now?
Introduction
Several studies (Kuhn et al., 2003; Cole et al., 2007; Seidenspinner et al., 2009) have reported QTL on BTA 18 associated with dystocia
Bioinformatic analysis using SNP data has not identified the causal variant
Next generation sequencing (NGS) has recently been used to find causal variants for novel recessive disorders
Chromosome 18 is different
Markers on chromosome 18 have large effects on several traits:
Dystocia and stillbirth: sire and daughter calving ease and sire stillbirth
Conformation: rump width, stature, strength, and body depth
Efficiency: longevity and net merit
Large calves contribute to reduced lifetimes and decreased profitability
Marker effects for dystocia complex
ARS-BFGL-NGS-109285
Cole et al., 2009 (J. Dairy Sci. 92:2931–2946)
Correlations in dystocia complex
The QTL also affects gestation length
Maltecca et al. 2011. Animal Genetics, 42:6, 585-591.
Overview of the dystocia complex
The key marker is ARS-BFGL-NGS-109285 (rs109478645) at 57,585,121 bp on BTA18
Intronic to SIGLEC12 (sialic acid binding Ig-like lectin 12)
Recent results indicate effects on gestation length (Maltecca et al., 2011) and calf birth weight (Cole et al., unpublished data)
This is a gene-rich region
http://useast.ensembl.org/Bos_taurus/Location/View?r=18%3A57583000-57587000
http://www.ncbi.nlm.nih.gov/gene?cmd=Retrieve&dopt=Graphics&list_uids=618463
Copy number variants are present
ARS-BFGL-NGS-109285 is flanked by CNV
There’s a loss and a gain to the left (8 SNP region)
There’s a gain to the right (10 SNP region)
This can result in assembly problems
Hou et al. 2011. Genomic characteristics of cattle copy number variations. BMC Genomics. 12:127. http://www.biomedcentral.com/1471-2164/12/127
Where did this problem come from?
http://aipl.arsusda.gov/CF-queries/Bull_Chromosomal_EBV/bull_chromosomal_ebv.cfm?
40,803 daughters
What if we look at a different trait?
Cole et al. (2007) proposed the following mechanism:
SIGLEC12 may sequester circulating leptin
This increases gestation length
Calf birth weight (BW) is higher because of increased gestation length
Higher BW is associated with dystocia
We don’t have birth weight data
Birth weights are not routinely recorded in the US
Collaborated with Hermann Swalve’s group to develop a selection index prediction of BW PTA
Performed GWAS and gene set enrichment analysis to search for interesting associations
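The slides do not describe the model behind the GWAS, so the following is only a generic illustration of what a single-marker association test estimates: the least-squares slope of a phenotype on allele dosage (0, 1 or 2 copies). The function name `snp_effect` and the six data points are made up for this sketch and are not taken from the study.

```python
from statistics import mean

def snp_effect(dosages, phenotypes):
    """Least-squares slope of phenotype on allele dosage (0, 1 or 2 copies)."""
    x_bar, y_bar = mean(dosages), mean(phenotypes)
    sxx = sum((x - x_bar) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dosages, phenotypes))
    return sxy / sxx

# Hypothetical data: six animals, birth weight PTA in arbitrary units.
dosages = [0, 0, 1, 1, 2, 2]
bw_pta = [-1.0, -0.8, 0.1, -0.1, 1.0, 0.8]
effect = snp_effect(dosages, bw_pta)
print(f"Estimated allele substitution effect: {effect:.2f}")
```

A real analysis would fit many markers, account for relationships among animals, and correct for multiple testing; this sketch only shows the quantity being estimated for one SNP.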
GWAS for birth weight PTA
Cole et al. (2013), unpublished data
Are we measuring anything new?
Identified a SNP intronic to LHX4, which is associated with cow body weight and length (Ren et al., 2010, Mol. Biol. Rep., 37:417-422).
Four SNP in the QTL region on BTA 18 had large effects
Several other SNP with large effects intronic or adjacent to genes with unknown functions
KEGG pathways for birth weight
What does regulation of the actin cytoskeleton have to do with birth weight in cattle?
That is, do these results make sense?
Maybe…these pathways may be involved in establishment & maintenance of pregnancy, as well as coordination of growth and development.
Cole et al. (2013), unpublished data
Sequencing is becoming very affordable
Sequencing successes at AIPL/BFGL
Simple loss-of-function mutations
APAF1 – Spontaneous abortions in Holstein cattle (Adams et al., 2012)
CWC15 – Early embryonic death in Jersey cattle (Sonstegard et al., 2013)
Weaver syndrome – Neurological degeneration and death in Brown Swiss cattle (McClure et al., 2013)
Original pedigree-based design
Bull A (1968): AA, SCE: 8
Bull B (1962): AA, SCE: 7
MGS
Bull H (1989): Aa, SCE: 14
Bull I (1994): Aa, SCE: 18
Bull E (1982): Aa, SCE: 8
Bull F (1987): Aa, SCE: 15
Bull C (1975): AA, SCE: 8
δ = 10
Bull D (1968): ??, SCE: 7
MGS
Bull E (1974): Aa, SCE: 10
MGS
Modified pedigree & haplotype design
Bull A (1968): AA, SCE: 8
Bull B (1962): AA, SCE: 7
MGS
Bull H (1989): Aa, SCE: 14
Bull I (1994): Aa, SCE: 18
Bull E (1982): Aa, SCE: 8
Bull F (1987): Aa, SCE: 15
Bull C (1975): AA, SCE: 8
δ = 10
Bull E (1974): Aa, SCE: 10
MGS
These bulls carry the haplotype with the largest negative effect on SCE:
Bull J (2002): Aa, SCE: 6
Bull K (2002): Aa, SCE: 15
Bull J (2002): aa, SCE: 15
Couldn’t obtain DNA:
Bull D (1968): ??, SCE: 7
Molecular prep
Sample Collection
DNA Extraction
DNA Quality Control
Library Construction
Library Quality Control
Sample preparation time is substantial
DNA Extraction: ~12 hours (30 mins)
DNA QC: ~1-2 hours (1-2 hours)
Library Construction: 48 hours (12 hours)
Library QC: ~2-4 hours (1 hour)
Total: 3-4 days (15.5 hours)
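The stage times above can be added up as a quick check. This is a minimal sketch that assumes the first figure on each line is elapsed time and the figure in parentheses is hands-on time; the slide does not label the two columns, so that reading is an assumption. The names `STAGES` and `totals` are illustrative.

```python
# (elapsed_low, elapsed_high, hands_on_low, hands_on_high) in hours, from the slide.
# Assumption: the figure in parentheses is hands-on time; the slide does not say so.
STAGES = {
    "DNA extraction": (12, 12, 0.5, 0.5),
    "DNA QC": (1, 2, 1, 2),
    "Library construction": (48, 48, 12, 12),
    "Library QC": (2, 4, 1, 1),
}

def totals(stages):
    """Sum each of the four columns across all stages."""
    return tuple(sum(column) for column in zip(*stages.values()))

elapsed_lo, elapsed_hi, hands_lo, hands_hi = totals(STAGES)
print(f"Elapsed: {elapsed_lo}-{elapsed_hi} h "
      f"({elapsed_lo / 24:.1f}-{elapsed_hi / 24:.1f} days)")
print(f"Hands-on: {hands_lo}-{hands_hi} h")
```

Under that reading, the upper hands-on bound reproduces the 15.5 hours on the slide, and the elapsed sum of 63-66 hours is consistent with "3-4 days" once rounded up to whole working days.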
DNA quality
Library quality
Sequencing stage
• Illumina cBot:
  − Preps DNA for sequencing
  − Takes 4-5 hours
  − Must be done 48 hours before
• Illumina HiSeq 2000:
  − Does the sequencing
  − Takes ~10-14 days for 100 x 100
  − Minimal hands-on time
Anatomy of a flow cell
8 lanes per flow cell
3 columns per lane
− 96 tiles per column
Each tile imaged 8 times
1 from upper surface, 1 from lower
Approximately 300Gb of sequence per flow cell
http://www.qbi.uq.edu.au/images/genomics/genomics1.jpg
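The flow cell figures translate into a rough per-lane yield and sequencing depth. The sketch below uses the 8 lanes and approximately 300 Gb per flow cell from the slide; the genome size of 2.7 Gb is an assumption (an approximate figure for cattle, not stated in the slides), and the function names are illustrative.

```python
FLOW_CELL_YIELD_GB = 300   # from the slide: approximately 300 Gb per flow cell
LANES_PER_FLOW_CELL = 8    # from the slide
GENOME_SIZE_GB = 2.7       # assumption: approximate bovine genome size, not from the slides

def lane_yield_gb(flow_cell_gb=FLOW_CELL_YIELD_GB, lanes=LANES_PER_FLOW_CELL):
    """Average yield of a single lane, in gigabases."""
    return flow_cell_gb / lanes

def expected_coverage(yield_gb, genome_gb=GENOME_SIZE_GB):
    """Mean depth = sequenced bases / genome size (ignores duplicates and unmapped reads)."""
    return yield_gb / genome_gb

per_lane = lane_yield_gb()
print(f"{per_lane:.1f} Gb per lane, "
      f"about {expected_coverage(per_lane):.0f}x mean depth per lane")
```

This is an upper bound on usable depth, since some reads fail quality filters or do not map to the reference.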
Sequencing by synthesis
https://www.broadinstitute.org/files/shared/illuminavids/sequencingSlides.pdf
How many scientists does it take…
Flowcell 1: Cluster densities
Cluster densities from current HiSeq run finished 30 April 2013 (unpublished data):
Flowcell 2: Cluster densities
Cluster densities from current HiSeq run started 22 May 2013 (unpublished data):
The Aftermath
Total Time (sample to sequence):
3 weeks
That’s assuming nothing went wrong!
More realistic: months
Resulting Data
Large text files
~300 gigabytes compressed
Analysis
Often underestimated
Can take months as well
Variant detection
• Alignment against a reference genome
• Analysis is very disk I/O-intensive.
Raw Sequencer Output
Alignment to the Genome
Variant Detection
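To make the idea of variant detection concrete, here is a toy pileup-based SNP caller. It is not the software used for this project (the slides do not name it), and it assumes reads have already been aligned without gaps. The function `call_snps`, its thresholds, and the example reads are all hypothetical.

```python
from collections import Counter, defaultdict

def call_snps(reference, aligned_reads, min_depth=4, min_alt_fraction=0.25):
    """Naive pileup SNP caller.

    reference: reference sequence as a string.
    aligned_reads: iterable of (start, sequence) pairs, 0-based, ungapped.
    Returns a list of (position, ref_base, alt_base, depth, alt_fraction).
    """
    # Count the bases observed at every reference position.
    pileup = defaultdict(Counter)
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        ref_base = reference[pos]
        alt_counts = {b: n for b, n in counts.items() if b != ref_base}
        if not alt_counts:
            continue  # every read agrees with the reference
        alt_base, alt_n = max(alt_counts.items(), key=lambda kv: kv[1])
        if alt_n / depth >= min_alt_fraction:
            calls.append((pos, ref_base, alt_base, depth, alt_n / depth))
    return calls

# Made-up example: three of five reads carry a T at position 4 (reference A).
reference = "ACGTACGTAC"
reads = [(0, "ACGTAC"), (2, "GTTCGT"), (3, "TTCGTA"), (4, "TCGTAC"), (1, "CGTACG")]
calls = call_snps(reference, reads)
print(calls)
```

Production callers also model base and mapping quality, indels, and genotype likelihoods, which is part of why this stage is so demanding of memory and disk I/O.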
Computational Logistics
Desktop computers
Viable for single lanes
Long computation time
Servers are better
>100 GB RAM and >16 processor cores
Cloud
Amazon Web Services
iAnimal/iPlant
Storage considerations
What to save?
Raw data?
Processed results?
How much workspace?
Suggestions:
Workspace: 10x the size of the compressed files
Save alignments
Backup REGULARLY!!!
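The workspace rule of thumb can be turned into a quick disk estimate. This sketch combines the 10x suggestion with the figure of roughly 300 gigabytes of compressed data quoted earlier in the talk; the function name `storage_plan_gb` is illustrative, and backup copies are not included.

```python
def storage_plan_gb(compressed_gb, workspace_factor=10):
    """Return workspace and total disk needs, in GB, for one sequencing run."""
    workspace = compressed_gb * workspace_factor
    return {
        "raw_compressed": compressed_gb,
        "workspace": workspace,
        "total": compressed_gb + workspace,  # excludes any backup copies
    }

plan = storage_plan_gb(300)
print(f"Workspace: {plan['workspace'] / 1000:.1f} TB, "
      f"total: {plan['total'] / 1000:.1f} TB")
```

For a 300 GB run this comes to about 3 TB of workspace before any backups are counted.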
Why should you use a pipeline?
• Automates analysis
• Maximizes resource utilization
• Because post-docs aren’t cheap
Many options for analysis pipelines
Galaxy server
NextGene
Custom pipeline
Scripting languages
Open-source tools
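As an illustration of the custom-pipeline option, the sketch below runs a list of steps in order and skips any step whose output file already exists, so an interrupted run can be resumed. It is not the pipeline used by the authors. The step functions are placeholders that write text files; real steps would call an aligner and a variant caller. All names here are hypothetical.

```python
from pathlib import Path
import tempfile

def run_pipeline(steps, workdir):
    """Run steps in order; skip any step whose output file already exists.

    steps: list of (name, output_filename, function), where
    function(workdir, out_path) must write out_path.
    Returns the names of the steps that actually ran.
    """
    workdir = Path(workdir)
    ran = []
    for name, output_name, func in steps:
        out_path = workdir / output_name
        if out_path.exists():
            continue  # finished in an earlier run, so do not repeat it
        func(workdir, out_path)
        if not out_path.exists():
            raise RuntimeError(f"step {name!r} did not produce {out_path}")
        ran.append(name)
    return ran

# Placeholder steps standing in for real alignment and variant calling.
def fake_align(workdir, out_path):
    out_path.write_text("alignments\n")

def fake_call_variants(workdir, out_path):
    aligned = (workdir / "sample.aligned.txt").read_text()
    out_path.write_text(f"variants from {aligned}")

STEPS = [
    ("align", "sample.aligned.txt", fake_align),
    ("call_variants", "sample.variants.txt", fake_call_variants),
]

with tempfile.TemporaryDirectory() as tmp:
    first_run = run_pipeline(STEPS, tmp)
    second_run = run_pipeline(STEPS, tmp)  # nothing left to do

print(first_run, second_run)
```

Skipping completed steps matters when a single stage can take days: a failure late in the run does not force the alignment to be repeated.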
Challenges
Annotation
This is a mess in the cow
The reference assembly may not be representative of all taurine cows
Validation
Doing functional genomics with large mammals is expensive – who pays?
When have we proven something?
Conclusions
Sequencing is powerful, but presents many challenges
Computational requirements are substantial
We’re learning how much we don’t know about functional genomics in the cow
Validation remains a problem
Acknowledgments
AIPL: Derek Bickhart, Dan Null, Paul VanRaden
BFGL: Reuben Anderson, Steve Schroeder, Tad Sonstegard, Curt Van Tassell
Questions?
http://gigaom.com/2012/05/31/t-mobile-pits-its-math-against-verizons-the-loser-common-sense/shutterstock_76826245/