Sheep Parentage
- a comparison of technologies
Shannon Clarke, Ken G Dodds, Rudiger Brauning, Tracey Van Stijn, Rayna Anderson, Suzanne Rowe and John McEwan
AgResearch Ltd, Invermay Agricultural Centre, Mosgiel, New Zealand
Accurate pedigree information
• critical to animal breeding and livestock production systems
• rate of genetic gain; estimate breeding values accurately
• management of inbreeding
• Traditionally
• pedigree information: breeder records
• Single sire matings: require dedicated infrastructure (e.g. fences)
• Mothering up to construct dam pedigree: labour intensive (costly); accuracy?
• However, in mob/syndicate matings: no information about paternity in large management groups; inability to track sire merit through progeny performance
• Microsatellites (simple sequence repeats/short tandem repeats)
• Other benefits
• Feed saved
• not spreading out ewes saves around $5 of feed costs through better control of feed in autumn
• animals can also be farmed in a commercial environment
• Pregscan
• technology is very dependent on the use of pregscanners for NLB (number of lambs born) and also for fetal ageing
• this plus management (lambing by expected NLB and birth date while mixing up mating groups) allows proper contemporary group formation
• allows estimation of ram fertility/mating success; work is underway, but it is a heritable and repeatable trait
• Abundance of available genomic data + development of high throughput genotyping platforms
• SNPs are the DNA marker of choice for pedigree (and genomic selection) studies
• SNPs allow for standardisation between laboratories; a property that is crucial for developing an international set of markers for traceability studies.
SNP Panels for Parentage assignment
International Sheep Genomics Consortium (ISGC)
• Development of genomic tools (genome assemblies, various SNP chip arrays … FAANG)
• Majority of parentage SNP panels developed by various groups are a result of initial ISGC tool development
Kijas et al
SNP Panels for Parentage assignment
Core set + country specific SNPs
[Figure: overlap of parentage SNP panels from CSIRO, USDA, ISGC and AgR; counts shown: 263, 48, 35, 1, 2, 2, 80, 33]
ISGC Core Panel 88 SNPs + 1 oY1
• Selected using population allele frequencies
• 70 breeds
• 5 continents
• Bias to high MAFs
• use across numerous breeds
• Further information and SNP list for download available at www.sheephapmap.org
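The preference for high-MAF SNPs can be illustrated with a small calculation. This is a minimal sketch that assumes Hardy-Weinberg proportions, independent SNPs and no genotyping error (none of which the slides state), and uses opposing homozygotes as the exclusion rule. The function names are illustrative only.

```python
def opposing_homozygote_prob(maf: float) -> float:
    """P(two unrelated animals are opposing homozygotes at one SNP), assuming HWE."""
    p, q = maf, 1.0 - maf
    # AA paired with BB, or BB paired with AA
    return 2.0 * (p * p) * (q * q)


def prob_not_excluded(maf: float, n_snps: int) -> float:
    """P(an unrelated candidate parent shows no opposing homozygotes at n_snps SNPs)."""
    return (1.0 - opposing_homozygote_prob(maf)) ** n_snps


if __name__ == "__main__":
    # 88 is the size of the ISGC core panel; the MAF values are arbitrary examples
    for maf in (0.1, 0.3, 0.5):
        print(maf, prob_not_excluded(maf, 88))
```

Under these assumptions the per-SNP exclusion probability peaks at a MAF of 0.5 (where it equals 0.125), so a panel of a given size excludes unrelated candidates more often when its SNPs have high MAF.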
ISGC Illumina 15K array
• Designed to enable accurate imputation to HD chip
• a combination of the current AgResearch ~7K and Australian Sheep CRC ~11K LD chips with additional content
• ~12K imputation SNPs spaced across the genome
• ~800 parentage SNPs from various platforms
• ISGC, AgR, CSIRO, USDA, INRA
• ~2000 SNPs for enhanced imputation: SNPs in regions of interest
• Literature and production SNPs
ISGC Illumina 15K array: MAF in NZ sheep
[Figure: MAF in NZ sheep; panels: All SNPs, Parentage SNPs]
Sheep parentage - a comparison of technologies
• Many genotyping platforms
• Offer different multiplex levels, throughput, call rates
• Mass spec
• Array platforms
• Targeted sequencing
• Oligo probes
• ….
• SNP selection critical
• Cost per SNP decreasing
Cost vs Utility?
SNP parentage panel + some production/specific trait SNPs (100s-1000s)
VS
SNP set that enables not only pedigree and inbreeding but also genomic selection (10,000s-100,000s)
- cost similar to current parentage panels
Genotyping-by-sequencing
• Two “categories”:
• oligo directed re-sequencing of specific regions: targeted
• reduced representation re-sequencing: restriction enzyme digests, “random”
• Both use high levels of adaptor sequence barcoding and multiplexing (48-2068… samples/lane)
• Number of SNPs interrogated: 100s to > 1 million.
Aim to have a high throughput, reproducible and cost effective GBS method.
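Multiplexing works because each sample's reads carry that sample's barcode, so pooled reads can be assigned back to animals after sequencing. This is a minimal sketch of that step, assuming inline barcodes at the start of each read and exact matching only. The barcodes, sample names and reads are hypothetical, and production pipelines also handle mismatches and adapter trimming.

```python
from collections import defaultdict

# Hypothetical barcode -> sample mapping (variable-length inline barcodes)
BARCODES = {"ACGT": "sheep_001", "TGCAA": "sheep_002", "GATCGA": "sheep_003"}


def demultiplex(reads, barcodes=BARCODES):
    """Group reads by sample using an exact match on the leading barcode.

    Longer barcodes are tried first so that a short barcode which is a
    prefix of a longer one cannot claim the longer barcode's reads.
    """
    by_sample = defaultdict(list)
    ordered = sorted(barcodes, key=len, reverse=True)
    for read in reads:
        for bc in ordered:
            if read.startswith(bc):
                by_sample[barcodes[bc]].append(read[len(bc):])  # strip the barcode
                break
        else:
            by_sample["unassigned"].append(read)
    return dict(by_sample)


reads = ["ACGTTGCAGAAT", "TGCAATGCAGCC", "GATCGATGCAGTT", "NNNNTGCAG"]
result = demultiplex(reads)
```

Counting the reads per barcode from a structure like `result` gives the reads/bar code figure used later for QC.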
• Tissue sample collection
• DNA extraction
• Library preparation
• Sequencing
• Analysis
• Parentage assignment, inbreeding
• Genomic selection (GBLUP)
• GWAS studies
• Short processing time for results to industry
Industry application
Why “random”?
• Working with a number of species (~ one method for all)
• Industry:
• ruminants, shell & fin fish, forage (clover & rye-grass)
• Research:
• Fur seal, crow, tuatara, albatross, robin, weevils, bee
• Sheep (test species)
• Genome assembly
• HD SNP chip
• High throughput/quality DNA extraction
• price competitive with existing DNA based parentage assays
• all animals in the sire breeding tier are genotyped?
Why “random”?
• little up-front development cost compared to array & targeted GBS systems
• No oligo design/purchase required
• Cost competitive: minor agricultural species & natural populations
• Good for species with high genetic diversity
• potential advantage: the elimination of ascertainment bias
• relevant for population genetics studies: allows use of the full range of allele frequency spectrum based analytical methods.
Step 1. High throughput DNA extraction from ear/fin clip tissue punch
• High molecular weight DNA
• 260/280 > 1.8
• Consistent amount of DNA extracted (CV <20%)
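The extraction targets (260/280 > 1.8, yield CV < 20%) lend themselves to an automated plate check. This is a minimal sketch, assuming per-sample records with hypothetical field names (`id`, `a260_280`, `yield_ng`) and using the sample standard deviation for the CV. The slides do not say how the CV is computed.

```python
from statistics import mean, stdev


def yield_cv_percent(yields):
    """Coefficient of variation of DNA yield, in percent (sample standard deviation)."""
    return 100.0 * stdev(yields) / mean(yields)


def plate_qc(samples, min_ratio=1.8, max_cv=20.0):
    """Flag low-purity samples and check yield consistency across a plate."""
    low_purity = [s["id"] for s in samples if s["a260_280"] <= min_ratio]
    cv = yield_cv_percent([s["yield_ng"] for s in samples])
    return {"low_purity": low_purity, "cv_percent": cv, "cv_ok": cv < max_cv}


# Hypothetical plate of four samples
plate = [
    {"id": "s1", "a260_280": 1.85, "yield_ng": 100.0},
    {"id": "s2", "a260_280": 1.90, "yield_ng": 110.0},
    {"id": "s3", "a260_280": 1.70, "yield_ng": 90.0},
    {"id": "s4", "a260_280": 1.95, "yield_ng": 105.0},
]
report = plate_qc(plate)
```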
Step 2. GBS library preparation and purification (restriction enzyme based)
[Figure from Elshire et al 2011]
• Utilises Elshire et al 2011 GBS method with the addition of a library purification step utilising the Pippin Prep (SAGE Science) to further size select the DNA sequencing library.
• Accurate nano-robotic systems employed throughout.
Step 3. Sequencing
• HiSeq 2500 V4 chemistry
• Single end reads (1x100)
Step 4. Bioinformatic and statistical analysis
QC
+ve/-ve controls
Species
Reads/bar code
Allflex TSU
Clarke et al., 2014 PLoS ONE
Average concordance observed between the HD SNP chip and the genotypes identified by GBS and WGS (~20x).
Not enough sequencing reads to support heterozygous genotype.
✓ Adjust genotyping method to increase sequencing read depth
• Pippin: reduce GBS library size
• Double digests
OR
✓ Develop statistical methods for GBS data
• kinship using GBS with depth adjustment (KGD)
Genotyping-by-sequencing
Genotype sampling-by-sequencing
Allele typing-by-sequencing
Sampling…
[Figure: reads (…..ACGTACTG…… / …..ACGCACTG……) sampled at a T/C SNP in nine individuals, shown first with the actual genotypes and then with the allele read counts]
Individual   Actual   Allele count ☺ (C reads/T reads)   Traditional
1            T/C      1/1                                T/C
2            T/T      0/0                                */*
3            C/C      1/0                                C/C
4            T/C      1/0                                C/C
5            T/C      2/0                                C/C
6            T/T      0/3                                T/T
7            T/C      0/1                                T/T
8            C/C      2/0                                C/C
9            T/C      2/1                                T/C
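The sampling illustration above can be reproduced in a few lines. This is a minimal sketch that makes "traditional" calls from the alleles seen and computes how often a true heterozygote shows reads of only one allele. It assumes each read samples either allele with equal probability and ignores sequencing error. Reading the counts as C reads/T reads is an interpretation of the slide, and the function names are illustrative.

```python
def naive_call(c_reads: int, t_reads: int) -> str:
    """Call a genotype from the alleles seen, ignoring read depth."""
    if c_reads == 0 and t_reads == 0:
        return "*/*"  # no reads: missing genotype
    if c_reads > 0 and t_reads > 0:
        return "T/C"
    return "C/C" if c_reads > 0 else "T/T"


def prob_het_seen_as_hom(depth: int) -> float:
    """P(all reads from a true heterozygote carry the same allele)."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 2 * 0.5 ** depth


# (actual genotype, C reads, T reads), as read from the slide (assumed order)
EXAMPLE = [
    ("T/C", 1, 1), ("T/T", 0, 0), ("C/C", 1, 0),
    ("T/C", 1, 0), ("T/C", 2, 0), ("T/T", 0, 3),
    ("T/C", 0, 1), ("C/C", 2, 0), ("T/C", 2, 1),
]
calls = [naive_call(c, t) for _, c, t in EXAMPLE]
miscalled = sum(1 for (actual, _, _), call in zip(EXAMPLE, calls)
                if call not in (actual, "*/*"))
```

In this example three of the five true heterozygotes are called homozygous, which is the bias that working with allele counts and read depth is meant to avoid.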
KGD
• unbiased estimates of relatedness via method 1 of VanRaden (2008), adjusted to account for sequence read depth at each individual SNP location, including SNPs with zero/missing reads
• allows GBS to be applied at read depths which can be chosen to optimise the information obtained.
• SNPs with excess heterozygosity, often due to (partial) polyploidy or other duplications, can be filtered based on a simple graphical method.
VanRaden 2008 J Dairy Sci. 91:4414-23
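The depth adjustment can be sketched as follows. This is a simplified reading of the idea, not the authors' implementation: it assumes known reference allele frequencies `p`, reads that sample either allele of a heterozygote with equal probability, and no sequencing error. Under those assumptions the expected observed genotype does not depend on depth, so pairs only need restricting to SNPs where both animals have reads, while self-relatedness needs an explicit correction using K = (1/2)^depth. All function names are illustrative.

```python
def observed_genotype(ref: int, alt: int):
    """Copies of the reference allele as seen in the reads (0, 1, 2); None if no reads."""
    if ref + alt == 0:
        return None
    if ref > 0 and alt > 0:
        return 1
    return 2 if ref > 0 else 0


def relatedness(reads_i, reads_j, p):
    """VanRaden method 1 style estimate over SNPs where both animals have reads."""
    num = den = 0.0
    for (ri, ai), (rj, aj), pj in zip(reads_i, reads_j, p):
        xi, xj = observed_genotype(ri, ai), observed_genotype(rj, aj)
        if xi is None or xj is None:
            continue  # zero reads in either animal: SNP not used for this pair
        num += (xi - 2 * pj) * (xj - 2 * pj)
        den += 2 * pj * (1 - pj)
    return num / den if den > 0 else float("nan")


def self_relatedness(reads_i, p):
    """Depth-adjusted self-relatedness (1 + inbreeding) for one animal."""
    num = den = 0.0
    for (ref, alt), pj in zip(reads_i, p):
        x = observed_genotype(ref, alt)
        if x is None:
            continue
        k = 0.5 ** (ref + alt)  # P(a true heterozygote shows only one given allele)
        het = 2 * pj * (1 - pj)
        num += (x - 2 * pj) ** 2 - het * (1 + 2 * k)
        den += het * (1 - 2 * k)
    return 1 + num / den if den > 0 else float("nan")


# Hypothetical read counts (ref, alt) at two SNPs with p = 0.5
p = [0.5, 0.5]
inbred = [(20, 0), (0, 20)]
all_het = [(10, 10), (10, 10)]
partly_missing = [(5, 0), (0, 0)]
```

In this sketch SNPs with a single read add nothing to the denominator of `self_relatedness`, so an animal with only depth-1 SNPs returns NaN for that term.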
“Fin plot”: relationship of Hardy-Weinberg disequilibrium, MAF & SNP depth for Atlantic salmon.
• Various filters based on this plot
• proportional closeness to lower boundary
• cut-off on level of Hardy-Weinberg disequilibrium
• best results (relatedness estimates closest to pedigree-based values): remove SNPs with HW disequilibrium below -0.05
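The cut-off on Hardy-Weinberg disequilibrium can be sketched as a simple filter. This is a minimal sketch that assumes the disequilibrium is defined as the observed frequency of reference homozygotes minus p², so that negative values indicate excess heterozygosity. That definition, the function names and the genotype counts are assumptions for illustration, and the sketch ignores the effect of read depth on observed genotype frequencies.

```python
def hw_disequilibrium(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Observed frequency of AA minus its Hardy-Weinberg expectation p^2."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed for this SNP")
    p = (2 * n_aa + n_ab) / (2 * n)
    return n_aa / n - p * p


def keep_snps(counts, threshold=-0.05):
    """Names of SNPs whose HW disequilibrium is not below the threshold."""
    return [name for name, c in counts.items()
            if hw_disequilibrium(*c) >= threshold]


# Hypothetical genotype counts (AA, AB, BB) per SNP
counts = {
    "snp_ok": (25, 50, 25),           # matches HWE
    "snp_duplicated": (0, 100, 0),    # every animal looks heterozygous
    "snp_hom_excess": (40, 20, 40),   # excess homozygotes, kept by this filter
}
kept = keep_snps(counts)
```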
Relatedness estimation uses:
• Parentage
• Animal & plant breeding (Genetic merit)
• Historically pedigree based (expected relatedness)
• Can now use genomic relatedness matrix (GRM)
• Genetic diversity
• E.g. PCA plots via GRM
• Population genetics
• Use GRM to estimate heritabilities without pedigrees
PstI method comparisons
                                       PstI       PstI       PstI/MspI   PstI/MspI-Y
                                       94/lane    376/lane               adapter
# SNPs                                 56,888     66,385     95,390      106,930
Mean co-call rate (for sample pairs)   0.82       0.42       0.64        0.57
Min co-call rate (for sample pairs)    0.72       0.02       0.17        0.30
Mean sample depth                      7.90       1.64       3.52        2.82
Mean self-relatedness (G5 diagonal)    1.05       1.16       1.07        1.09
Samples/lane                           94         376        94          94
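The co-call rate rows can be made concrete with a small calculation. This is a minimal sketch that assumes the co-call rate of a sample pair is the proportion of SNPs with at least one read in both samples. The definition is an assumption here, and the depths and sample names are hypothetical.

```python
from itertools import combinations


def co_call_rate(depth_a, depth_b):
    """Proportion of SNPs with at least one read in both samples."""
    both = sum(1 for a, b in zip(depth_a, depth_b) if a > 0 and b > 0)
    return both / len(depth_a)


def co_call_summary(depths):
    """Mean and minimum co-call rate over all sample pairs."""
    rates = [co_call_rate(depths[a], depths[b])
             for a, b in combinations(sorted(depths), 2)]
    return sum(rates) / len(rates), min(rates)


# Hypothetical read depths at four SNPs for three samples
depths = {"s1": [3, 0, 1, 2], "s2": [1, 1, 0, 4], "s3": [2, 2, 2, 2]}
mean_rate, min_rate = co_call_summary(depths)
```

Spreading a lane over more samples lowers per-sample depth, so more SNPs have zero reads in one animal of a pair. That matches the drop in co-call rate between the 94/lane and 376/lane PstI columns.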
GBS vs SNP chip
96 samples → 384 samples/lane: Sheep
[Figure: axes labelled 384 samples/lane and 96 samples/lane]
Relatedness-best sire match
5 sires
89 progeny
1 missing sire
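A best-sire match from relatedness estimates can be sketched as choosing the candidate with the highest estimated relatedness to each progeny, and reporting no match when even the best candidate is too low (as for progeny of the missing sire). This is a minimal sketch; the 0.35 threshold, the identifiers and the relatedness values are hypothetical.

```python
def best_sire(relatedness_to_sires, min_relatedness=0.35):
    """Return (sire, relatedness); sire is None if the best value is below the threshold."""
    sire, value = max(relatedness_to_sires.items(), key=lambda kv: kv[1])
    return (sire, value) if value >= min_relatedness else (None, value)


# Hypothetical relatedness estimates between two progeny and five candidate sires
progeny = {
    "lamb_01": {"sire_A": 0.48, "sire_B": 0.05, "sire_C": 0.11, "sire_D": 0.02, "sire_E": 0.20},
    "lamb_02": {"sire_A": 0.04, "sire_B": 0.09, "sire_C": 0.12, "sire_D": 0.01, "sire_E": 0.07},
}
assignments = {lamb: best_sire(values) for lamb, values in progeny.items()}
```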
96 well plate reaction
VS
384 well plate reaction
$$ reduction
Example-Dairy Sheep
• new dairy sheep flock with an East Friesian genetic base
• phenotypic records on 3000 ewes but no pedigree information.
• 300 ram lambs and 50 older rams available as selection candidates – which to breed from?
→ Genotype ewes and rams, generate GRM to estimate ram breeding values
• assess GBS by comparing GRMs from GBS and the ISGC 15K SNP chip, and the subsequent EBVs.
Example-Dairy Sheep
Genomic relationships and BVs are highly correlated
[Figure: breeding values (kg milk); points coloured by sample depth: sampdepth ≤ 0.3; 0.3 < sampdepth ≤ 1; 1 < sampdepth ≤ 2; sampdepth > 2]
GBS vs Chip
                               GBS                                      15k Chip
Genome required                No                                       Yes
Development costs              Minimal (~US$3k)                         Moderate (>US$60k)
Species/population dependent   No                                       Yes
DNA quality and quantitation   High (~4 days)                           Moderate (~2 days)
SNPs called                    Untargeted                               Fixed & reliable
Throughput                     188/376 per lane                         24 per chip
Cost per sample                US$16/25                                 US$35
Number of SNPs                 >60,000 (20-30/Mbp), ave. depth 3.4      15,000 (~5/Mbp)
                               reads @ 188 per lane
Minor allele frequency         Greater range                            Biased
Example-Dairy Sheep
• Deer, Goat, Salmon, Mussel, Forage
• Transitioning from SSR parentage to GBS
• Commercial this season
• Parentage-GRM-pedigree for BLUP
• Inbreeding
• (genomic selection)
From MS GBS and GS
Back to Sheep…
“Enhanced” parentage chip
Illumina Infinium XT
• 96-sample BeadChip
• multispecies
• ~800 Parentage SNPs from the 15K chip
• Literature / production SNPs
• Enhanced imputation SNPs
Aim is to have all animals in breeding tier genotyped
Summary
• KGD
• produces bias-free genomic relationship matrices, based on allele read depths
• can estimate:
• Breed composition
• Pedigree
• Traceability
• Inbreeding
• Co-ancestry
• Included directly in existing mixed models to estimate breeding values.
• Allows all animals in the sire breeding tier to be genotyped.
• One potential advantage: elimination of the ascertainment bias which plagues array and oligo-based technologies
• relevant for population genetics studies: allows use of allele frequency spectrum based analytical methods
Acknowledgments: the animal genomics team
John McEwan
Rudi Brauning
Ken Dodds
Tracey Van Stijn
Rayna Anderson
Suzanne Rowe
'Genomics for Production & Security in a Biological Economy' C10X1306
FIQ Systems – Plate to Pasture PGP06-09020