Whole Genome Selection

Whole Genome Selection: Theoretical Considerations

Raghavendra N.R., Ph.D. Scholar, Plant Breeding & Genetics

Presentation Overview

1. Introduction (GS)
2. Why Genomic Selection?
3. Steps involved in GS
4. Factors contributing to the success of GS
5. Future directions of GS
6. Conclusion

Selection in GS is usually based on genomic estimated breeding values (GEBVs).

Method of Selection: Where do we come from?

$$\Delta G = \frac{r \times i \times \sigma_G}{L}$$

where r is the accuracy of selection, i the intensity of selection, σ_G the genetic standard deviation and L the generation interval.
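To make the breeder's equation concrete, here is a minimal Python sketch of ΔG per year. The accuracies, intensity and cycle lengths are hypothetical numbers chosen only for illustration; they show how shortening the generation interval, as GS aims to do, can outweigh a modest loss in accuracy.

```python
def genetic_gain_per_year(accuracy: float, intensity: float,
                          sigma_g: float, generation_interval: float) -> float:
    """Breeder's equation: delta G = r * i * sigma_G / L (gain per year).

    accuracy            -- r, correlation between predicted and true breeding value
    intensity           -- i, standardized selection differential
    sigma_g             -- genetic standard deviation of the trait
    generation_interval -- L, years per breeding cycle
    """
    if generation_interval <= 0:
        raise ValueError("generation interval must be positive")
    return accuracy * intensity * sigma_g / generation_interval


# Hypothetical scenarios: phenotypic selection on a 4-year cycle versus
# GS with lower accuracy but a 2-year cycle.
ps = genetic_gain_per_year(accuracy=0.6, intensity=1.4, sigma_g=1.0, generation_interval=4)
gs = genetic_gain_per_year(accuracy=0.5, intensity=1.4, sigma_g=1.0, generation_interval=2)
print(f"PS: {ps:.3f}  GS: {gs:.3f} genetic SD per year")
```

With these assumed inputs, GS gains more per year despite its lower accuracy, because L is halved.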

Selection can take place in the laboratory.

Selection has played an important role in human-plant co-evolution.

[Figure: genetic gain (GA) over time; labels: genotype, find population, find markers, phenotype, QTL/gene, selection, genetic gain, breeder, GS, genomic information]


Limitations of Traditional Phenotypic Selection (PS)

Traits with low heritability

Traits that are expressed late in an individual's life

Traits that cannot be measured easily (e.g., disease resistance and quality traits)

Selection is time-consuming and the rate of breeding progress is slow

Limitations of MAS

"Picking the low-hanging fruit": MAS captures mainly genes with large QTL effects.

Major successes have been achieved only with qualitative traits.

The biparental mapping populations used in most QTL studies do not readily translate to breeding applications.

The term ‘GS’ was first introduced by Haley and Visscher at the 6th World Congress on Genetics Applied to Livestock Production at Armidale, Australia in 1998.

GS was first proposed by Meuwissen et al. (2001) in the seminal paper: Meuwissen et al. (2001). Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819-1829.

Dr. Theo Meuwissen


Genomic selection (GS): an emerging breeding methodology designed to exploit high-throughput, inexpensive DNA marker information to accurately predict the genetic value of breeding candidates for complex traits.

EBV (estimated breeding value): an estimate of the additive genetic merit for a particular trait that an individual will pass on to its descendants.

GEBV (genomic estimated breeding value): a prediction of the genetic merit of an individual based on its genome.

Genomic Selection

Trace all segments of the genome with markers, so that all QTL, and hence all genetic variance, are captured.

Predict genomic breeding values as the sum of marker effects over all segments.
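A minimal sketch of the sum over segments, assuming markers coded 0/1/2 as copies of the favourable allele and marker effects that have already been estimated. The effects and line names below are hypothetical.

```python
def gebv(genotypes, marker_effects):
    """GEBV_i = sum over markers j of x_ij * b_j."""
    if len(genotypes) != len(marker_effects):
        raise ValueError("one effect per marker is required")
    return sum(x * b for x, b in zip(genotypes, marker_effects))


# Hypothetical effects for 5 markers and two genotyped candidates
effects = [0.30, -0.10, 0.05, 0.20, 0.0]
candidates = {"line_A": [2, 0, 1, 1, 2], "line_B": [0, 2, 2, 0, 1]}
scores = {name: gebv(g, effects) for name, g in candidates.items()}
best = max(scores, key=scores.get)  # candidate with the highest GEBV
print(scores, best)
```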

Genomic selection exploits linkage disequilibrium (LD) between markers and QTL.
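As an illustration not taken from the source, the common r² measure of LD between two biallelic loci can be computed from phased haplotypes as below. The haplotype data are made up.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes.

    haplotypes: list of (a, b) pairs, alleles coded 0/1 at locus A and locus B.
    D = p_AB - p_A * p_B and r^2 = D^2 / (p_A(1 - p_A) p_B(1 - p_B)).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("a locus is monomorphic; r^2 is undefined")
    return d * d / denom


complete_ld = ld_r2([(1, 1), (0, 0)] * 5)                   # alleles always travel together
no_ld = ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)] * 5)         # all haplotypes equally frequent
print(complete_ld, no_ld)
```

A marker in high r² with a QTL carries most of that QTL's effect, which is why dense markers are needed.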

Genomic selection avoids bias in the estimation of effects due to multiple testing, because all effects are fitted simultaneously.

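How to Estimate a Breeding Value?

Milk yields of the cows in the herd: 12, 8, 10 and 0.5 litres.

What is the breeding value of the 12-litre cow for milk production?

Breeding value = h² × (own milk production − herd average)

Herd average = (12 + 8 + 10 + 0.5) / 4 = 7.625 litres

Breeding value = h² × (12 − 7.625) = 4.375 × h² litres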
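The same calculation in Python, as a sketch that assumes the herd average is the plain mean of the listed records. No value of h² is given here, so it is left as a parameter; h² = 1.0 only isolates the phenotypic deviation and 0.3 is a hypothetical choice.

```python
def phenotypic_breeding_value(own_record, herd_records, h2):
    """Breeding value from an animal's own record: h2 * (P - herd mean)."""
    if not 0 <= h2 <= 1:
        raise ValueError("heritability must lie in [0, 1]")
    mean = sum(herd_records) / len(herd_records)
    return h2 * (own_record - mean)


herd = [12, 8, 10, 0.5]  # litres
deviation = phenotypic_breeding_value(12, herd, h2=1.0)  # P - mean = 4.375
bv_h2_03 = phenotypic_breeding_value(12, herd, h2=0.3)   # hypothetical h2
print(deviation, bv_h2_03)
```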

Traditional Selection vs. MAS vs. GS

MAS concentrates on a small number of QTLs that are tagged by markers with well-verified associations.

GS uses a genome-wide panel of dense markers so that all QTLs are in LD with at least one marker.

[Figure: major and minor QTLs (scale labels: 120 cms, 15 cms) (Nakaya et al. 2012)]

Why Is Genomic Selection Important Now?

Relatively slow progress via phenotypic selection.

Large cost of phenotyping.

Limited throughput (plot area, time, people).

Quantitative traits (QTs) controlled by genes with small effects.

Decreasing cost of genotyping.

Promising results from simulation and cross-validation of GS.

The challenge of feeding 9.5 billion people by 2050.

Prerequisites for the introduction of GS

The need for adequate and affordable genotyping platforms.

Relatively simple breeding schemes in which selection of additive genetic effects will generate useful results.

Statistical methods.

[Figure: timeline of genomic information, 1998 to 2005 (labels: 1998, 2000, 2001, 2002, 2004, 2005; GW-MI)]


How can we do that?

Inexpensive, high-density genotypes

Accurate phenotypes

Heffner et al (2009)

Prerequisites: a training population (genotypes + phenotypes) and selection candidates (genotypes only).

Training Population: Biparental vs. Multi-Family (as far as crops are concerned)

Biparental
1. Population specific
2. Reduced epistasis
3. Fewer markers required
4. Smaller training populations required
5. Balanced allele frequencies
6. Best for introgression of exotic germplasm

Multi-Family
1. Allows prediction across a broader range of adapted germplasm
2. Allows sampling of more E
3. Cycle duration is reduced because retraining of the model is ongoing
4. Allows larger training populations
5. Greater genetic diversity

Cardinal points for the success of GS

1. Population type & size of training population

2. Genotyping platforms & marker densities.

3. Availability of high-density (HD) genome-wide markers.

4. Appropriate statistical methods for accurate GEBVs.

5. Epistasis & G x E.

6. Linkage disequilibrium

7. Long term selection

Marker types & marker density

SNP

DArT

GBS

SNP chip in Genomic selection

Single markers (genes) predict only very small differences.

SNPs are abundant in nature (roughly 2 SNPs per kb).

Predicting differences in BVs.

Which sequences can we call haplotypes?

Similar haplotypes form haplotype blocks: regions with high LD and little recombination.

Is GBS a suitable marker platform for genomic selection?

Yes.

GBS (Elshire et al. 2010)

GBS accesses regulatory regions and enables sequence-tag mapping.

Flexibility and low cost.

GBS markers led to higher genomic prediction accuracies.

Missing data can be imputed.

Highly multiplexed.

Works even for a species with a genome as challenging as wheat (absence of a reference genome).

Poland et al. (2011)

Statistical models used:

i. RF (random forest)

ii. MVN-EM (multivariate normal expectation-maximization)

GBS markers are more uniformly distributed across the genome than the DArT markers
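Imputation of missing GBS calls can be illustrated with the simplest baseline: replacing each missing call by the mean of that marker's observed calls. This is a substitute chosen for brevity, not the RF or MVN-EM methods listed above, which also exploit correlations between markers. The genotype matrix is made up.

```python
def mean_impute(genotypes):
    """Replace missing calls (None) with the mean of observed calls at that marker.

    genotypes: list of lines, each a list of calls coded 0/1/2 or None.
    A marker with no observed calls at all is filled with 0.0.
    """
    n_markers = len(genotypes[0])
    means = []
    for j in range(n_markers):
        observed = [row[j] for row in genotypes if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if row[j] is None else row[j] for j in range(n_markers)]
            for row in genotypes]


raw = [[0, 2, None], [2, None, 1], [None, 2, 1]]
filled = mean_impute(raw)
print(filled)
```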

GS Prediction Accuracies

Number and size of QTLs.

LD between markers and QTLs.

Marker density, marker type, and training population size.

Accuracy of GEBVs increases as the number of lines in the training population increases.
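One widely used deterministic approximation of how training-population size and heritability drive accuracy is r = sqrt(N h² / (N h² + Me)) from Daetwyler et al. (2008), where Me is the number of independent chromosome segments. The sketch below assumes hypothetical values of h² and Me.

```python
import math


def expected_gebv_accuracy(n_train, h2, m_e):
    """Approximate GEBV accuracy r = sqrt(N h2 / (N h2 + Me)).

    n_train -- lines that are both genotyped and phenotyped
    h2      -- heritability of the trait
    m_e     -- number of independent chromosome segments (an assumed value)
    """
    if n_train <= 0 or not 0 < h2 <= 1 or m_e <= 0:
        raise ValueError("invalid input")
    nh2 = n_train * h2
    return math.sqrt(nh2 / (nh2 + m_e))


# Hypothetical trait: h2 = 0.3, Me = 1000
for n in (250, 500, 1000, 2000):
    print(n, round(expected_gebv_accuracy(n, 0.3, 1000), 3))
```

Accuracy rises with N but with diminishing returns, and a low h² can be partly compensated by a larger training population.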

Heritability of the trait.

Genetic structure (architecture) of the trait.

Simulation study results.

Cross-validation: how closely do results from simulated data match those from real data?
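Cross-validation can be sketched as follows: phenotyped lines are split into k folds, the model is trained on k-1 folds, and accuracy is the correlation between predictions and phenotypes in the held-out fold. To keep the block short, the model is a deliberate stand-in (independent single-marker slopes), not SR, RR-BLUP or a Bayesian method, and the data are simulated under assumed parameters.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation, the usual measure of GS prediction accuracy."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def fit_marginal(X, y):
    """Stand-in model: one least-squares slope per marker, fitted marker by marker."""
    my = statistics.fmean(y)
    effects = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mx = statistics.fmean(col)
        sxx = sum((v - mx) ** 2 for v in col)
        sxy = sum((v - mx) * (t - my) for v, t in zip(col, y))
        effects.append(sxy / sxx if sxx > 0 else 0.0)
    return effects


def kfold_accuracy(X, y, k=5, seed=1):
    """Mean correlation between predictions and phenotypes over k held-out folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    accs = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        b = fit_marginal([X[i] for i in train], [y[i] for i in train])
        preds = [sum(x * e for x, e in zip(X[i], b)) for i in test]
        accs.append(pearson(preds, [y[i] for i in test]))
    return statistics.fmean(accs)


# Simulated data: 10 causal markers out of 50, residual SD = genetic SD (h2 about 0.5)
rng = random.Random(42)
n_lines, n_markers = 200, 50
true_b = [rng.gauss(0, 1) if j < 10 else 0.0 for j in range(n_markers)]
X = [[rng.choice((0, 1, 2)) for _ in range(n_markers)] for _ in range(n_lines)]
g = [sum(x * b for x, b in zip(row, true_b)) for row in X]
sd_g = statistics.pstdev(g)
y = [gi + rng.gauss(0, sd_g) for gi in g]
acc = kfold_accuracy(X, y)
print(round(acc, 3))
```

Because the correlation is with phenotypes rather than true genetic values, it is bounded by roughly the square root of h².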

Genomic selection prediction models

i. Stepwise regression (SR)

ii. Ridge regression BLUP (RR-BLUP)

iii. Bayesian regression (BR)

Meuwissen et al. (2001). Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819-1829.

Stepwise Regression (SR)

Selects the most significant markers on the basis of an arbitrary significance threshold; the effects of non-significant markers are set to zero (Lande and Thompson, 1990).

Estimates the effects of the significant markers by multiple regression; consequently, only a portion of the genetic variance is captured.

Limitations:

Detects only large effects, and the effects of markers declared significant are overestimated (Goddard and Hayes, 2007; Beavis, 1998).

SR resulted in low GEBV accuracy because of limited QTL detection (Meuwissen et al. 2001).
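A minimal sketch of the stepwise idea, assuming forward selection with an arbitrary partial-F threshold (10.0 here, a hypothetical choice) and a plain least-squares solver. Markers that never enter the model keep an effect of exactly zero, which is the behaviour criticised above. The data are simulated.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(X, y, cols):
    """OLS with an intercept on the selected marker columns; returns (coefs, RSS)."""
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(Z[0])
    ztz = [[sum(z[a] * z[c] for z in Z) for c in range(p)] for a in range(p)]
    zty = [sum(z[a] * t for z, t in zip(Z, y)) for a in range(p)]
    coef = solve(ztz, zty)
    rss = sum((t - sum(c * v for c, v in zip(coef, z))) ** 2 for z, t in zip(Z, y))
    return coef, rss


def forward_stepwise(X, y, f_threshold=10.0):
    """Add the marker that most reduces RSS while its partial F exceeds the threshold."""
    n, m = len(y), len(X[0])
    selected = []
    _, rss = least_squares(X, y, selected)
    while len(selected) < m:
        best_j, best_rss = None, None
        for j in range(m):
            if j in selected:
                continue
            try:
                _, r = least_squares(X, y, selected + [j])
            except ValueError:  # collinear with markers already in the model
                continue
            if best_rss is None or r < best_rss:
                best_j, best_rss = j, r
        df_resid = n - len(selected) - 2  # intercept + selected + candidate
        if best_j is None or df_resid <= 0:
            break
        f_stat = (rss - best_rss) / (best_rss / df_resid) if best_rss > 0 else float("inf")
        if f_stat < f_threshold:
            break
        selected.append(best_j)
        rss = best_rss
    coef, _ = least_squares(X, y, selected)
    effects = [0.0] * m  # non-selected markers keep an effect of exactly zero
    for j, c in zip(selected, coef[1:]):
        effects[j] = c
    return effects, selected


rng = random.Random(7)
X = [[rng.choice((0, 1, 2)) for _ in range(20)] for _ in range(100)]
y = [1.5 * row[3] - 1.0 * row[11] + rng.gauss(0, 0.5) for row in X]
effects, chosen = forward_stepwise(X, y)
print(sorted(chosen), round(effects[3], 2), round(effects[11], 2))
```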

Ridge Regression BLUP (RR-BLUP)

Estimates all marker effects simultaneously rather than classifying markers as significant or as having no effect.

Ridge regression shrinks all marker effects towards zero.

The method assumes that marker effects are random effects with equal variance (Meuwissen et al. 2001).

Limitations :

RR-BLUP treats all marker effects as having equal variance, which is unrealistic (Xu et al. 2003).

RR-BLUP is nevertheless superior to SR.
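A minimal ridge-regression sketch of the RR-BLUP idea: all markers are fitted at once with a common shrinkage parameter λ, which can be read as residual variance over marker-effect variance. It shows that a larger λ shrinks the effect vector toward zero, then predicts GEBVs for genotyped but unphenotyped candidates. The data and λ values are hypothetical; a real analysis would estimate λ from variance components (e.g. by REML).

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_effects(X, y, lam):
    """Marker effects b = (Z'Z + lam*I)^-1 Z'(y - mean(y)), with Z = centred X."""
    n, m = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(m)]
    Z = [[row[j] - mu[j] for j in range(m)] for row in X]
    ybar = sum(y) / n
    ztz = [[sum(z[a] * z[c] for z in Z) + (lam if a == c else 0.0) for c in range(m)]
           for a in range(m)]
    zty = [sum(z[a] * (t - ybar) for z, t in zip(Z, y)) for a in range(m)]
    return solve(ztz, zty), mu, ybar


def predict(x, effects, mu, ybar):
    """GEBV-style prediction for a candidate with genotypes only."""
    return ybar + sum((v - mj) * b for v, mj, b in zip(x, mu, effects))


rng = random.Random(11)
n_train, n_cand, n_markers = 100, 20, 60
true_b = [rng.gauss(0, 0.3) for _ in range(n_markers)]  # many small effects
X_all = [[rng.choice((0, 1, 2)) for _ in range(n_markers)] for _ in range(n_train + n_cand)]
g_all = [sum(x * b for x, b in zip(row, true_b)) for row in X_all]
X_train, X_cand = X_all[:n_train], X_all[n_train:]
y_train = [g + rng.gauss(0, 1.0) for g in g_all[:n_train]]

norms = []  # length of the effect vector for increasing lambda
for lam in (0.1, 10.0, 1000.0):
    b, mu, ybar = ridge_effects(X_train, y_train, lam)
    norms.append(sum(v * v for v in b) ** 0.5)

b, mu, ybar = ridge_effects(X_train, y_train, lam=10.0)
gebv_cand = [predict(x, b, mu, ybar) for x in X_cand]
print([round(v, 3) for v in norms])
```

Unlike SR, no marker is dropped: every effect is non-zero but pulled toward zero by the same amount of shrinkage.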

Bayesian Regression (BR)

Marker variances are treated more realistically by assigning them a specified prior distribution.

BayesA uses a scaled inverse chi-square prior to regress each marker variance towards zero.

All marker effects are non-zero (BayesA).

BayesB assumes a prior point mass at zero, thereby allowing for markers with no effect.

Some marker effects can equal zero (BayesB) (Meuwissen et al. 2001).
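The difference between the two priors can be seen by sampling from them. This sketch draws marker effects from a BayesA-style prior (a scaled inverse chi-square variance for each marker) and a BayesB-style prior (a point mass at zero with probability π). It covers only the priors, not the MCMC used to fit the models, and ν, S² and π are hypothetical values.

```python
import random


def scaled_inv_chi2(rng, nu, s2):
    """Draw sigma^2 ~ scaled-inv-chi^2(nu, s2) as nu * s2 / chi2_nu."""
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu degrees of freedom
    return nu * s2 / chi2


def draw_prior_effects(n_markers, method, nu=4.0, s2=0.01, pi=0.95, seed=0):
    """Sample marker effects from a BayesA- or BayesB-style prior."""
    if method not in ("BayesA", "BayesB"):
        raise ValueError("method must be 'BayesA' or 'BayesB'")
    rng = random.Random(seed)
    effects = []
    for _ in range(n_markers):
        if method == "BayesB" and rng.random() < pi:
            effects.append(0.0)  # point mass at zero: marker has no effect
            continue
        var_j = scaled_inv_chi2(rng, nu, s2)  # marker-specific variance
        effects.append(rng.gauss(0.0, var_j ** 0.5))
    return effects


a = draw_prior_effects(1000, "BayesA")
b = draw_prior_effects(1000, "BayesB")
print(sum(e == 0.0 for e in a), sum(e == 0.0 for e in b))
```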

Other potential Genomic selection prediction models

i. Least absolute shrinkage and selection operator (LASSO)

ii. Reproducing kernel Hilbert spaces (RKHS) regression (Gianola et al. 2006) and support vector machine regression.

iii. Partial least squares regression & principal component regression.

iv. RF (R package randomForest)

v. MVN-EM algorithm

R packages for GS: http://www.r-project.org

A genome of 1000 cM was simulated with a marker spacing of 1 cM
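As background for such a simulation, the sketch below places markers every 1 cM over 1000 cM and converts the 1 cM spacing into a recombination fraction with Haldane's map function. Treating the genome as a single linkage group and assuming no crossover interference (Haldane's assumption) are simplifications not stated in the source.

```python
import math


def haldane_r(d_cM):
    """Haldane map function: recombination fraction for a map distance in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cM / 100.0))


genome_cM, spacing_cM = 1000, 1
positions = list(range(0, genome_cM + 1, spacing_cM))  # marker positions in cM
r_adjacent = haldane_r(spacing_cM)
print(len(positions), "markers; adjacent-marker recombination fraction", round(r_adjacent, 4))
```

At 1 cM spacing, adjacent markers recombine in about 1% of meioses, so every QTL lies close to at least one marker.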

Modeling epistasis and dominance

Accurate prediction of dominance and epistatic effects can be advantageous.

Lorenz et al. pointed out that including epistatic effects in prediction models will improve accuracy provided that:

Epistasis is present & can be modelled accurately.

Blanc et al. (2006) reported that epistasis contributes to marker effects.

Empirical studies using real data will be illuminating on this topic.
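One common way to model additive-by-additive epistasis in a genomic prediction model (not necessarily the approach of the studies cited above) is the Hadamard, or element-wise, product of the additive genomic relationship matrix with itself. The sketch builds a VanRaden-style G from a toy 0/1/2 marker matrix and forms G∘G; the marker data are made up.

```python
def vanraden_g(M):
    """Additive genomic relationship matrix, VanRaden-style.

    M: lines x markers, coded 0/1/2. G = Z Z' / (2 * sum p(1 - p)), Z = M - 2p.
    """
    n, m = len(M), len(M[0])
    p = [sum(row[j] for row in M) / (2.0 * n) for j in range(m)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    Z = [[row[j] - 2 * p[j] for j in range(m)] for row in M]
    return [[sum(a * b for a, b in zip(Z[i], Z[k])) / scale for k in range(n)]
            for i in range(n)]


def hadamard(A, B):
    """Element-wise product of two equally sized matrices."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


M = [[0, 1, 2, 1], [2, 1, 0, 1], [1, 1, 1, 2], [0, 2, 2, 0]]
G = vanraden_g(M)
G_aa = hadamard(G, G)  # additive-by-additive (epistatic) relationships
```

G_aa can then serve as a second covariance structure next to G in a mixed model, which is one way of giving epistasis its own variance component.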

GS in relation to strong subpopulation structure

In GWAS, strong subpopulation structure (SPS) can cause spurious long-distance or unlinked associations between marker alleles and the phenotype.

In GS, the aim shifts to maintaining predictive ability despite a structured training data set; spurious associations are not an important cause of lost predictive ability.

If LD is not consistent across subpopulations, allelic effects estimated in one subpopulation will not be predictive in another.
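Subpopulation structure is usually detected before training, for example with principal component analysis of the marker matrix. This sketch computes PC1 scores by power iteration for two simulated subpopulations whose allele frequencies differ; the frequencies, the shift of 0.4 and the sample sizes are hypothetical.

```python
import random


def first_pc_scores(M, n_iter=200):
    """Scores of lines on PC1 of the centred marker matrix (power iteration on Z Z')."""
    n, m = len(M), len(M[0])
    means = [sum(row[j] for row in M) / n for j in range(m)]
    Z = [[row[j] - means[j] for j in range(m)] for row in M]
    K = [[sum(a * b for a, b in zip(Z[i], Z[k])) for k in range(n)] for i in range(n)]
    v = [1.0 + 0.01 * i for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(n_iter):
        w = [sum(K[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


rng = random.Random(3)
n_markers = 200
freq_1 = [rng.uniform(0.05, 0.95) for _ in range(n_markers)]
# Hypothetical second subpopulation: allele frequencies shifted by +/-0.4
freq_2 = [min(0.95, max(0.05, f + rng.choice((-0.4, 0.4)))) for f in freq_1]


def draw_line(freqs):
    """Diploid genotype (0/1/2) drawn under Hardy-Weinberg proportions."""
    return [int(rng.random() < f) + int(rng.random() < f) for f in freqs]


M = [draw_line(freq_1) for _ in range(30)] + [draw_line(freq_2) for _ in range(30)]
pc1 = first_pc_scores(M)
```

Lines from the two subpopulations fall on opposite sides of zero on PC1; such scores can be used to stratify training and validation sets or be fitted as covariates.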

Long-term selection

Improving gain in the long-term necessarily requires a trade-off with short-term gain.

Long-term gain is often considered explicitly, as in quantitative genetic models that maximize immediate predicted gain subject to a constraint on the rate of inbreeding (Meuwissen 1997).

Two approaches:
1. Select individuals or groups
2. Analytical prediction, deterministic simulation using numerical approaches to optimization, and stochastic simulation

Some issues:

GS has proved its value in animal breeding, particularly in dairy cattle (Hayes and Goddard, 2010).

It has yet to prove its value over generations in crop plants.

Simulation studies in plants suggest potential for improved gain per unit time (Jannink et al. 2010).

Future Directions

GS has seldom been implemented in the field.

Where in the breeding cycle (in which generations) should GS be applied?

How many lines should be selected for genotyping?

Where and how should the training population be placed relative to the selection candidates?

How many markers are required? This is determined by the extent of LD.

How can non-additive effects be implemented in our models to allow predictions across multiple generations?

How do non-additive effects affect the accuracy of genomic selection?

How often should the chromosome segment effects be re-estimated?


Outstanding questions that remain unanswered:

How much gain do we expect when using GS? How much potential loss can a breeding program absorb?

GS future perspectives

Training population design.

Epistatic modelling in GS.

Strengths of different statistical methods.

Managing short- and long-term gain.

Does genomic selection have a future in plant breeding..?

Lorenz Lab, Department of Agronomy & Horticulture

University of Nebraska-Lincoln, http://www.lorenzlab.net

Rex Bernardo, Department of Agronomy and Plant Genetics

University of Minnesota

Further interest? Visit http://www.lorenzlab.net/

Ongoing projects on GS

Crop | Trait | Markers | Funding agency | Project duration
Tomato | Quality, shape, shelf life | SNP | USDA/AFRI | 2009-2013
Barley | FHB resistance | SNP | Univ. of Minnesota | 2013
Trifolium | Yield | SNP | Danish plant research and for Aarhus University | 2010-2015
Wheat | Winter wheat | Genotyping-by-sequencing | Wheat Breeding Presidential Chair | 2014
Maize | Drought | SNP | CIMMYT | 2014
Maize | Total biomass yield and silage quality | SNP | USDA-AFRI | 2014
Sugar beet | White sugar yield, sugar content | SNP | State Plant Breeding Institute, University of Hohenheim | 2013

Conclusion

“Nothing in science has any value to society if it is not communicated.” (Anne Roe)

Education
