Whole Genome Selection
Whole Genome Selection: Theoretical Considerations
Raghavendra N. R.
Ph.D. Scholar, Plant Breeding & Genetics
Presentation Overview
1. Introduction (GS)
2. Why genomic selection?
3. Steps involved in GS
4. Factors contributing to the success of GS
5. Future directions of GS
6. Conclusion
Selection in GS is usually based on genomic estimated breeding values (GEBVs).
Method of selection: where do we come from?
Genetic gain (ΔG, also written GA) per unit time:
ΔG = (accuracy of selection × intensity of selection × genetic standard deviation) / generation interval
Selection can take place in the laboratory.
Selection has played an important role in human-plant co-evolution.
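The gain equation can be sketched numerically as follows. This is only an illustration: the function name and all input values (accuracy 0.6, intensity 1.4, genetic standard deviation 2.0, generation intervals of 5 and 2 years) are assumptions, not figures from the slides.

```python
def genetic_gain_per_year(accuracy, intensity, genetic_sd, generation_interval):
    """Expected genetic gain per unit time:
    (accuracy x intensity x genetic SD) / generation interval."""
    if generation_interval <= 0:
        raise ValueError("generation interval must be positive")
    return accuracy * intensity * genetic_sd / generation_interval


# Hypothetical inputs: identical accuracy, intensity and genetic SD,
# but two different generation intervals (in years).
slow_cycle = genetic_gain_per_year(0.6, 1.4, 2.0, generation_interval=5.0)
fast_cycle = genetic_gain_per_year(0.6, 1.4, 2.0, generation_interval=2.0)
print(slow_cycle, fast_cycle)
```

With the other three terms held equal, shortening the generation interval is the only change, and gain per year rises in proportion.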
[Figure: the breeder's route to genetic gain over time. Labels: genotype, phenotype, find population, find markers, QTL/gene, selection, genetic gain, breeder, GS, genomic information.]
Method of selection: where do we come from? (cont.)
Limitations of traditional selection (PS)
Traits with low heritability
Traits that are expressed late in an individual's life
Traits that cannot be measured easily (e.g. disease resistance and quality traits)
Time consuming, and the rate of breeding is slow
Limitations of MAS
“Picking the low-hanging fruit”: only the genes with big QTL effects are captured
Major success has been achieved only with qualitative traits
The biparental mapping populations used in most QTL studies do not readily translate to breeding applications
The term ‘GS’ was first introduced by Haley and Visscher at the 6th World Congress on Genetics Applied to Livestock Production at Armidale, Australia in 1998.
GS was first propounded by Meuwissen et al (2001). Seminal paper: Meuwissen et al (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819-29.
Dr. Theo Meuwissen
Whole Genome Selection
Genomic selection (GS): an emerging breeding methodology designed to exploit high-throughput, inexpensive DNA marker information to accurately predict the genetic value of breeding candidates for complex traits.
EBV (estimated breeding value): an estimate of the additive genetic merit for a particular trait that an individual will pass on to its descendants.
GEBV (genomic estimated breeding value): a prediction of the genetic merit of an individual based on its genome.
Genome Selection
Trace all segments of the genome with markers, so as to capture all QTL (= all genetic variance).
Predict genomic breeding values as sum of effects over all segments
Genomic selection exploits LD.
Genomic selection avoids bias in estimation of effects due to multiple testing, as all effects fitted simultaneously.
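The idea of a GEBV as a sum of effects over all genome segments can be sketched in a few lines of Python. The line names, genotype codes (0, 1 or 2 copies of one allele per marker) and marker effects below are invented for illustration; in practice the effects would be estimated from a training population.

```python
# Hypothetical allele counts (0, 1 or 2) at four markers for three candidates.
genotypes = {
    "line_A": [0, 1, 2, 1],
    "line_B": [2, 2, 0, 1],
    "line_C": [1, 0, 1, 0],
}

# Hypothetical marker effects, all fitted simultaneously in a prediction model.
marker_effects = [0.5, -0.2, 0.1, 0.3]


def gebv(allele_counts, effects):
    """GEBV = sum over markers of (allele count x estimated marker effect)."""
    return sum(count * effect for count, effect in zip(allele_counts, effects))


gebvs = {name: gebv(counts, marker_effects) for name, counts in genotypes.items()}
ranking = sorted(gebvs, key=gebvs.get, reverse=True)  # best candidate first
print(gebvs, ranking)
```

Selection candidates need only be genotyped: once marker effects are available, ranking them is a simple weighted sum.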
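How to estimate breeding value?
[Figure: a cross (X) labelled 10 litres, and four cows yielding 12, 8, 10 and 0.5 litres]
What is the breeding value of this cow (yielding 12 litres) for milk production?
Breeding value = h² × (milk production − average)
= h² × (12 − 7.625) = 4.375 × h² litres, where h² is the heritability of milk production and 7.625 litres is the mean of 12, 8, 10 and 0.5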
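The arithmetic of this example can be checked with a short Python sketch. The four yields come from the slide; the heritability value of 0.3 is an assumption made only so that a number can be printed, since the slide does not state h².

```python
from statistics import mean


def breeding_value(own_record, herd_records, h2):
    """Breeding value from a single own record: h2 x (record - herd average)."""
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("heritability must lie between 0 and 1")
    return h2 * (own_record - mean(herd_records))


herd = [12, 8, 10, 0.5]           # litres, as on the slide
herd_average = mean(herd)         # 7.625 litres
deviation = 12 - herd_average     # 4.375 litres
ebv = breeding_value(12, herd, h2=0.3)  # h2 = 0.3 is an assumed value
print(herd_average, deviation, ebv)
```

The lower the heritability, the more the cow's superiority over the herd average is discounted.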
Traditional selection
GS vs. MAS
MAS concentrates on a small number of QTLs that are tagged by markers with well-verified associations.
GS uses a genome-wide panel of dense markers so that all QTLs are in LD with at least one marker.
[Figure: major and minor QTLs; labels: 120 cm and 15 cm]
Nakaya et al 2012
Why is genomic selection important now?
Relatively slow progress via phenotypic selection
Large cost of phenotyping
Limited throughput (plot area, time, people)
Quantitative traits (QTs) with small effects
Decreasing cost of genotyping
Promising results from simulation and cross-validation of GS
Meeting the challenge of feeding 9.5 billion people by 2050
Pre-requisite for the introduction of GS
The need for adequate and affordable genotyping platforms.
Relatively simple breeding schemes in which selection of additive genetic effects will generate useful results.
Statistical methods.
[Figure: timeline of genomic information by year (1998, 2000, 2001, 2002, 2004, 2005); label: GW-MI]
How can we do that?
Inexpensive, high-density genotypes
Accurate phenotypes
Heffner et al (2009)
Prerequisites:
Training population (genotypes + phenotypes)
Selection candidates (genotypes)
Crops are Concerned
Training population: biparental vs. multi-family
Biparental
1. Population specific
2. Reduced epistasis
3. Reduced number of markers required
4. Smaller training populations required
5. Balanced allele frequencies
6. Best for introgression of exotic
Multi-family
1. Allows prediction across a broader range of adapted germplasm
2. Allows sampling of more environments (E)
3. Cycle duration is reduced because model retraining is ongoing
4. Allows larger training populations
5. Greater genetic diversity
Genomic Selection
Cardinal points for success of GS
1. Population type & size of training population
2. Genotyping Platforms & marker densities.
3. Availability of HD genome wide markers.
4. Appropriate statistical methods for accurate GEBVs.
5. Epistasis & G x E.
6. Linkage disequilibrium
7. Long term selection
Marker types & marker density
SNP
DArT
GBS
SNP chip in Genomic selection
A single marker (gene) predicts only very small differences.
Abundant in the genome: about 2 SNPs per kb.
Predicting differences in BVs.
Which sequences can we call haplotypes?
Similar haplotypes form a haplotype block, a region with high LD and little recombination.
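To make "high LD" concrete, the sketch below computes the common r² measure of LD between two biallelic loci from phased haplotypes. The haplotypes are invented toy data and the function name is an assumption; r² = D² / (pA(1 − pA) pB(1 − pB)) with D = pAB − pA·pB is the standard definition.

```python
def ld_r2(haplotypes, i, j):
    """r^2 between biallelic loci i and j, from phased haplotypes coded 0/1."""
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n          # allele frequency at locus i
    p_b = sum(h[j] for h in haplotypes) / n          # allele frequency at locus j
    p_ab = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Toy haplotypes: loci 0 and 1 always travel together, locus 2 is independent.
toy_haplotypes = [
    (1, 1, 0),
    (1, 1, 1),
    (0, 0, 0),
    (0, 0, 1),
]
r2_within_block = ld_r2(toy_haplotypes, 0, 1)
r2_outside_block = ld_r2(toy_haplotypes, 0, 2)
print(r2_within_block, r2_outside_block)
```

Loci inside one haplotype block behave like the first pair; a marker in such a block can stand in for a nearby QTL, which is what GS relies on.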
Is GBS a suitable marker platform for genomic selection?
Obviously!
GBS: Elshire et al (2010)
GBS accesses regulatory regions and allows sequence tag mapping.
Flexibility and low cost.
GBS markers led to higher genomic prediction accuracies.
Impute missing data.
Highly multiplexed
Even for a species with a genome as challenging as wheat (absence of a reference genome). Poland et al (2011)
Statistical models used:
i. RF
ii. MVN EM
GBS markers are more uniformly distributed across the genome than the DArT markers
Number and size of QTLs.
LD between marker and QTLs.
Marker density, marker type, and training population size.
As the number of lines increases, the accuracy of GEBVs increases.
GS Prediction Accuracies
Heritability of the trait.
Genetic structure of the trait.
Simulation study results.
Cross-validation; How close is the simulated data to real data?
GS Prediction Accuracies
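Cross-validation of prediction accuracy can be sketched as below. This is a minimal illustration under stated assumptions: the data are simulated (not real breeding data), the predictor is a plain ridge regression with an arbitrary penalty of 1.0, and the function names are hypothetical. It uses NumPy. The reported figure is the mean correlation between predicted and observed phenotypes in the held-out folds.

```python
import numpy as np


def ridge_effects(Z, y, lam):
    """Ridge estimate of marker effects: solve (Z'Z + lam*I) u = Z'y."""
    n_markers = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(n_markers), Z.T @ y)


def cv_accuracy(Z, y, k=5, lam=1.0, seed=0):
    """Mean correlation of predicted vs. observed phenotypes over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    correlations = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        mu = y[train_idx].mean()
        z_mean = Z[train_idx].mean(axis=0)
        # Train on the training lines only, then predict the held-out lines.
        u = ridge_effects(Z[train_idx] - z_mean, y[train_idx] - mu, lam)
        predicted = mu + (Z[test_idx] - z_mean) @ u
        correlations.append(np.corrcoef(predicted, y[test_idx])[0, 1])
    return float(np.mean(correlations))


# Simulated toy data: 200 lines, 50 markers coded 0/1/2, additive effects.
sim = np.random.default_rng(1)
Z = sim.integers(0, 3, size=(200, 50)).astype(float)
true_effects = sim.normal(0.0, 0.5, 50)
y = Z @ true_effects + sim.normal(0.0, 1.0, 200)

accuracy = cv_accuracy(Z, y)
print(round(accuracy, 3))
```

Each fold plays the role of the selection candidates in turn, so the estimate reflects prediction for lines that were not used to train the model.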
i. Stepwise regression (SR)
ii. Ridge regression BLUP (RR BLUP)
iii. Bayesian regression (BR)
Genomic selection prediction models
Meuwissen et al (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819-29.
Select the most significant markers on the basis of an arbitrary significance threshold; the effects of non-significant markers are set to zero. (Lande and Thompson, 1990)
Estimate the effects of the significant markers using multiple regression; as a result, only a portion of the genetic variance is captured.
Limitations:
Detects only large effects, which causes overestimation of the significant effects. (Goddard and Hayes, 2007; Beavis, 1998)
SR resulted in low GEBV accuracy due to limited detection of QTLs. (Meuwissen et al 2001)
Stepwise Regression (SR)
Ridge Regression BLUP (RR-BLUP)
Simultaneously estimates all marker effects rather than categorizing markers as significant or as having no effect.
Ridge regression shrinks all marker effects towards zero.
The method assumes that markers are random effects with equal variance. (Meuwissen et al 2001)
Limitations:
RR-BLUP assumes the same variance for every marker effect, which is unrealistic. (Xu et al 2003)
RR-BLUP is superior to SR.
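The shrinkage behaviour of ridge regression can be sketched with NumPy as follows. This is an illustrative sketch rather than a full mixed-model implementation: the data are simulated, the function name is hypothetical, and the penalty values 1.0 and 100.0 are arbitrary. In RR-BLUP the penalty is usually taken as the ratio of residual variance to marker-effect variance.

```python
import numpy as np


def rr_blup_effects(Z, y, lam):
    """Estimate all marker effects at once: solve (Z'Z + lam*I) u = Z'(y - mean),
    with marker columns centred. The same penalty lam applies to every marker."""
    Zc = Z - Z.mean(axis=0)
    yc = y - y.mean()
    return np.linalg.solve(Zc.T @ Zc + lam * np.eye(Z.shape[1]), Zc.T @ yc)


# Simulated toy data with more markers (100) than lines (60); only the
# first five markers have a true effect.
sim = np.random.default_rng(42)
Z = sim.integers(0, 3, size=(60, 100)).astype(float)
y = Z[:, :5] @ np.array([1.0, -1.0, 0.5, 0.5, -0.5]) + sim.normal(0.0, 1.0, 60)

effects_light = rr_blup_effects(Z, y, lam=1.0)    # weak shrinkage
effects_heavy = rr_blup_effects(Z, y, lam=100.0)  # strong shrinkage
print(np.linalg.norm(effects_light), np.linalg.norm(effects_heavy))
```

Because the penalty makes the system solvable even with more markers than lines, no marker has to be dropped by a significance test; the price is that every effect is shrunk by the same rule.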
Bayesian Regression (BR)
Marker variances are treated more realistically by assuming a specified prior distribution.
BayesA: uses an inverted chi-square prior to regress the marker variances towards zero.
All marker variances are > 0, so every marker has some effect (BayesA).
BayesB: assumes a prior mass at zero, thereby allowing for markers with no effect.
Some marker effects can be = 0 (BayesB). (Meuwissen et al 2001)
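The difference between the two priors can be illustrated by drawing marker effects from each. The sketch below samples from the priors only; it does not fit a model to data, and the parameter values (4 degrees of freedom, scale 0.01, 95% of markers with zero effect) and function names are assumptions chosen for illustration.

```python
import random


def scaled_inv_chi2(rng, nu, s2):
    """Draw a variance from a scaled inverse chi-square(nu, s2): nu*s2 / chi2(nu)."""
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square(nu) = Gamma(shape nu/2, scale 2)
    return nu * s2 / chi2


def draw_effect_bayes_a(rng, nu, s2):
    """BayesA-style prior: every marker gets its own positive variance."""
    variance = scaled_inv_chi2(rng, nu, s2)
    return rng.gauss(0.0, variance ** 0.5)


def draw_effect_bayes_b(rng, nu, s2, pi_zero):
    """BayesB-style prior: with probability pi_zero the marker has no effect."""
    if rng.random() < pi_zero:
        return 0.0
    return draw_effect_bayes_a(rng, nu, s2)


rng = random.Random(0)
effects_a = [draw_effect_bayes_a(rng, 4.0, 0.01) for _ in range(10000)]
effects_b = [draw_effect_bayes_b(rng, 4.0, 0.01, pi_zero=0.95) for _ in range(10000)]
zero_fraction_b = sum(1 for e in effects_b if e == 0.0) / len(effects_b)
print(zero_fraction_b)
```

Under the BayesA-style prior every draw is non-zero, while under the BayesB-style prior most markers contribute nothing and a few carry the effects.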
Other potential Genomic selection prediction models
i. Least absolute shrinkage and selection operator (LASSO)
ii. Reproducing kernel Hilbert spaces (RKHS) and support vector machine regression. Gianola et al (2006)
iii. Partial least squares regression & principal component regression.
iv. RF: random forest (R package randomForest)
v. MVN EM Algorithm
R-Package for GS http://www.r-project.org
A genome of 1000 cM was simulated with a marker spacing of 1 cM
Modeling epistasis and dominance
Accurate prediction of dominance and epistatic effects is advantageous.
Lorenza et al pointed out that including epistatic effects in prediction models will improve accuracy on the condition that epistasis is present and can be modelled accurately.
Blanc et al (2006) reported that epistasis will contribute to marker effects.
Empirical studies on real data are illuminating on this topic.
GS in relation to strong subpopulation structure
In GWAS, strong subpopulation structure (SPS) can cause spurious long-distance or unlinked associations between marker alleles and phenotype.
In GS, the concern shifts to maintaining predictive ability despite a structured training data set; spurious associations will not be an important cause of loss of predictive ability.
If LD is not consistent across subpopulations, allelic effects estimated in one subpopulation will not be predictive for another subpopulation.
Long-term selection
Improving gain in the long term necessarily requires a trade-off with short-term gain.
This trade-off is often explicit, as in quantitative genetic models that maximize immediate predicted gain subject to a constraint on the rate of inbreeding. Meuwissen (1997).
Two approaches:
1. Select individuals or groups
2. Analytical prediction, deterministic simulation using numerical approaches to optimization, and stochastic simulation
Some issues :
GS has proved its value in animal breeding, particularly dairy cattle (Hayes and Goddard, 2010)
It has still to prove its value over generations in crop plants
Simulation studies in plants suggest potential for improved gain per unit time. (Jannink et al 2010)
Genomic selection
Future directions?
GS has seldom been implemented in the field.
Where to apply GS in the breeding cycle (which generations)?
How many lines to select for genotyping?
Where and how do we place our training population in comparison to the selection candidates?
How many markers are required? This is determined by the extent of LD.
How can we implement non-additive effects into our models to allow predictions across multiple generations?
How do non-additive effects affect the accuracy of genomic selection?
How often to re-estimate the chromosome segment effects?
Outstanding questions that remain unanswered
How much gain do we expect when using GS, and how much potential loss can a breeding program absorb?
GS future perspectives
Training population design.
Epistatic modelling in GS.
Strength of different statistical methods.
Managing short & long term gain.
Does genomic selection have a future in plant breeding..?
Further interest? Visit:
Lorenz Lab, Department of Agronomy & Horticulture, University of Nebraska-Lincoln, http://www.lorenzlab.net
Rex Bernardo, Department of Agronomy and Plant Genetics, University of Minnesota
Ongoing projects on GS
Crop | Trait | Markers | Funding agency | Project duration
Tomato | Quality, shape, shelf life | SNP | USDA/AFRI | 2009-2013
Barley | FHB resistance | SNP | Univ. of Minnesota | 2013
Trifolium | Yield | SNP | Danish plant research and for Aarhus University | 2010-2015
Wheat | Winter wheat | Genotyping-by-sequencing | Wheat Breeding Presidential Chair | 2014
Maize | Drought | SNP | CIMMYT | 2014
Maize | Total biomass yield and silage quality | SNP | USDA-AFRI | 2014
Sugar beet | White sugar yield, sugar content | SNP | State Plant Breeding Institute, University of Hohenheim | 2013
Conclusion
“Nothing in science has any value to society if it is not communicated.” - Anne Roe