Sequencing 128 Ashkenazi Genomes: Implications for Medical Genetics and History

Shai Carmi
Department of Computer Science, Columbia University
Itsik Pe’er’s lab

UCLA, October 2014

Outline

• Ashkenazi Jewish Genetics: Background

• The Ashkenazi Genome Sequencing Project

• Segment Sharing and Population History

• Opportunities and Future Directions

Ashkenazi Jewish (AJ) Genetics: Significance

Medical genetics
• Large founder population
• Mendelian disorders
• Complex diseases
  o Breast cancer, Parkinson’s, Crohn’s

Population genetics
• Debated origins
• Genetics of a founder event

mtDNA: Behar et al., 2004; Behar et al., 2006
Y chr: Behar et al., 2003; Behar et al., 2004
Disease genes: Risch et al., 2003; Slatkin, 2004
SNP arrays: Gusev et al., 2012; Palamara et al., 2012
Review: Ostrer and Skorecki, 2013

Founder Populations: Opportunities

Recent successes
• Greece
  o Tachmazidou et al., 2013; HDL
• Finland
  o Kurki et al., 2014; aneurysm
• Iceland
  o Many papers; most recently Steinthorsdottir et al., 2014; T2D
• Ashkenazi Jews
  o Hui et al., in preparation; Crohn’s

See also:
• Hatzikotoulas et al., 2014
• Zuk et al., 2014

[Figure: population size over time, up to the present, in a founder population (with a bottleneck) and in a non-founder population, showing disease alleles]

Problem: Common genotyping platforms do not include alleles rare outside the founder population

Opportunities: Reduced Haplotype Diversity

Chromosomes in the sample

Full sequence

Partial sequence (SNP array, low-coverage sequence)

Observed data

Imputation

Inferred sequence

Nearly-complete inferred sequence

Problem: The Ashkenazi population is missing a reference panel of complete sequences

Opportunities: Personal Genomics in AJ

Personal clinical genomics is here
But genomes are hard to interpret

Problem: The Ashkenazi population is missing a reference panel of complete sequences

The Documented Ashkenazi History

• Ca. 1000: Small communities in Northern France, Rhineland

• Migration east

• Expansion

• Migration to US and Israel

• Origin?

• Founder event?

• European gene flow:
  o Where?
  o When?
  o How much?

• Relation to other Jews?

Whole-genomes?

Outline

• Ashkenazi Jewish Genetics: Background

• The Ashkenazi Genome Sequencing Project

• Segment Sharing and Population History

• Opportunities and Future Directions

The Ashkenazi Genome Consortium

NY area labs interested in specific diseases

Quantify utility in medical genetics

Learn about population history

Phase I: 128 whole genomes (Completed*)
Phase II: ≈500 whole genomes (NYGC; under way)

Large cohorts of AJ cases

Impute

* Carmi et al., Nat Commun, 2014

Technical Details

Property                   Genome (exome)
Coverage                   ≈56x
Fraction called            96.7±0.3% (98.1%)
Concordance with arrays    99.67±0.25%
Ti/Tv ratio                2.14±0.004 (3.05)
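For readers unfamiliar with the Ti/Tv row, the ratio is the number of transitions (A<->G, C<->T) divided by the number of transversions (all other single-base changes). The sketch below is only an illustration of that definition; the function name and the toy variant list are hypothetical and are not taken from the project's pipeline.

```python
# Transitions swap a purine for a purine (A<->G) or a pyrimidine
# for a pyrimidine (C<->T); every other SNV is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snvs):
    """Return transitions / transversions for (ref, alt) SNV pairs.

    Assumes every pair is a biallelic SNV with ref != alt.
    """
    ti = sum(1 for ref, alt in snvs if (ref, alt) in TRANSITIONS)
    tv = len(snvs) - ti
    if tv == 0:
        raise ValueError("no transversions: ratio is undefined")
    return ti / tv


# Toy input (hypothetical): four transitions and two transversions.
example_snvs = [("A", "G"), ("C", "T"), ("G", "A"),
                ("A", "C"), ("T", "G"), ("T", "C")]
print(ti_tv_ratio(example_snvs))
```

A ratio that drifts far from the values expected for a platform is commonly read as a sign of excess false-positive calls, which is why it appears among the QC properties.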

• Ashkenazi ancestry verified
• Some phenotypes exist
• Sequencing by Complete Genomics in three batches
  o Uniform QC measures
• Error rate estimates
  o Using runs-of-homozygosity and a duplicate
  o SNVs: ≈10-40k errors per genome (FDR: 0.3-1.3%)
  o Indels: ≈10-30k errors per genome (FDR: 2-6%)
• QC: Remove indels, poly-allelic variants, Hardy-Weinberg violations, low call rate
• Errors after QC: ≈5k per genome

[Figure labels: hets, roh]

Comparison to Europeans

Comparison panels:
• 26 Flemish from Belgium (platform-matched)
• 87 North-West Europeans [CEU (1000 Genomes)]

[Figure panels: Fraction novel (%) (dbSNP135); Population-specific variants (25x25 genomes)]

An Ashkenazi reference panel filters more benign variants than a European panel.

AJ Clinical Genomics

AJ Medical Genetics: Imputation

An Ashkenazi reference panel improves imputation accuracy of AJ SNP arrays compared to the standard European panel.

[Figure: correlation between imputed and real data]

Rare variants (≤1%) accuracy: 87% vs 65%

Using Impute2
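A minimal sketch of how a correlation-based imputation accuracy can be computed for one variant, assuming true genotypes coded 0/1/2 and imputed dosages between 0 and 2. The slide does not say whether it reports r or r²; this sketch uses r², and the function name and sample values are hypothetical.

```python
import math


def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and dosages.

    Returns NaN when either input has no variance (e.g. a variant
    that is monomorphic in the evaluated samples).
    """
    if len(true_genotypes) != len(imputed_dosages):
        raise ValueError("inputs must have the same length")
    n = len(true_genotypes)
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        return math.nan
    return cov * cov / (var_t * var_d)


# Hypothetical data for six individuals at one variant.
truth = [0, 0, 1, 0, 2, 1]
dosages = [0.1, 0.0, 0.9, 0.2, 1.8, 1.2]
print(imputation_r2(truth, dosages))
```

In practice such a per-variant value would be averaged within allele-frequency bins, which is one way a figure like "rare variants (≤1%)" could be summarized.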

AJ Medical Genetics: Applications

• Our consortium:
  o An expanded carrier screening panel
  o Pharmacogenetically-important alleles
  o Low-frequency deletions in tumors
  o Association studies: schizophrenia, Parkinson’s, Crohn’s, longevity, cancer
• Others:
  o Frequency lookups (clinical/pedigrees)
  o Association studies: Epilepsy, Autism, …

Principal Component Analysis (PCA)

Price et al., 2008; Olshen et al., 2008; Need et al., 2009; Kopelman et al., 2009; Atzmon et al., 2010; Behar et al., 2010; Bray et al., 2010; Guha et al., 2012; Behar et al., 2014

[Figure: PCA plot with labeled groups: Ashkenazi Jews; Sephardi Jews (Italy, Turkey); Middle-East: Druze, Palestinians, Bedouins; Europe: Sardinians, Tuscans, Italians, Basque, French, Flemish]

The Documented Ashkenazi History

• Origin?

• Founder event?

• European gene flow:
  o Where?
  o When?
  o How much?

• Relation to other Jews?

Variant Discovery Rate

Heterozygosity paradox?

[Figure: number of variants; predicted number of new variants]

A Model for Ancient History

Out-of-Africa

Middle-East

European gene flow into AJ

25x25 genomes

The Documented Ashkenazi History

• Origin?

• Founder event?

• European gene flow:
  o Where?
  o When?
  o How much?

• Relation to other Jews?

Outline

• Ashkenazi Jewish Genetics: Background

• The Ashkenazi Genome Sequencing Project

• Segment Sharing and Population History

• Opportunities and Future Directions

Identical-by-Descent (IBD) Shared Segment

Formal definition: A contiguous segment inherited from a single, recent common ancestor.

[Figure: an IBD segment traced back to a common ancestor (label: g). After Browning & Browning, 2012]

What’s “recent”?

Identical-by-Descent (IBD) Shared Segment

Practical definition: A contiguous segment nearly identical over a sequence length longer than a cutoff.

• Requires strong genetic drift

• Segments are rare but long
  o Probability of a site to be shared
  o Segment length

• Current methods can detect segments ≥1cM
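The practical definition can be illustrated with a toy detector. This is a simplified sketch and not any published IBD-detection method: it assumes phased haplotypes, requires exact identity rather than "nearly identical", and the function name, inputs, and the 1 cM default are hypothetical.

```python
def shared_segments(hap_a, hap_b, positions_cm, min_length_cm=1.0):
    """Find maximal runs where two haplotypes carry identical alleles.

    hap_a, hap_b  : allele sequences at the same ordered markers
    positions_cm  : genetic position of each marker, in centimorgans
    Returns (start_cm, end_cm) for runs spanning >= min_length_cm.
    """
    if not (len(hap_a) == len(hap_b) == len(positions_cm)):
        raise ValueError("inputs must have the same length")
    segments = []
    run_start = None
    n = len(hap_a)
    # Iterate one step past the end so a run reaching the last marker closes.
    for i in range(n + 1):
        match = i < n and hap_a[i] == hap_b[i]
        if match and run_start is None:
            run_start = i
        elif not match and run_start is not None:
            start, end = positions_cm[run_start], positions_cm[i - 1]
            if end - start >= min_length_cm:
                segments.append((start, end))
            run_start = None
    return segments


# Hypothetical markers spaced 0.5 cM apart.
positions = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
hap_a = [0, 1, 1, 0, 1, 0, 0, 1]
hap_b = [0, 1, 1, 0, 0, 0, 0, 0]
print(shared_segments(hap_a, hap_b, positions))
```

A realistic detector would also tolerate occasional mismatches from genotyping and phasing errors, which is what "nearly identical" in the definition allows for.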

Formal definition: A contiguous segment inherited from a single, recent common ancestor.

Applications

• A segment indicates recent co-ancestry:
  o Disease mapping
  o Pedigree reconstruction
  o Detecting natural selection
  o Demographic (historical) inference
  o Estimating mutation rates
• Identical sequence across individuals:
  o Resolving haplotypes (phasing)
  o Imputation
  o Estimating heritability
  o Estimating genotyping error rate

[Figure: IBD segment; label g]

Eskin’s lab

IBD Sharing Theory

• Model:
  o A population with a constant effective size N
  o Two chromosomes of length L (Morgans)
  o A minimal segment length m (Morgans)
• The number of shared segments n_m?
• The fraction of the chromosome in shared segments f_m?

[Figure: a chromosome of length L with segments ℓ1, ℓ2, ℓ3 and the length cutoff m]

Results overview

• Under the Sequentially Markov Coalescent (SMC):

• The number of shared segments: [equation not preserved]

• The fraction of the chromosome in shared segments: [equation not preserved]

• Results for a more realistic coalescent model (SMC’)

• Implicit expressions for the distributions

• All results generalizable to variable population size

Palamara et al., 2012; Carmi et al., Genetics, 2013; Carmi et al., Theor Popul Biol, 2014
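As a hedged illustration of how a closed form for the mean shared fraction can arise (this is a sketch under stated assumptions, not necessarily the expression shown on the slide): assume N counts haploid chromosomes, so that a pair coalesces at rate 1/N per generation, and assume that the segment spanning a site whose common ancestor lived t generations ago has an Erlang-2 length distribution with rate 2t per Morgan. The slide's own convention for N may differ by a factor of 2.

```latex
\begin{align}
P(\ell > m \mid t)
  &= \int_m^\infty (2t)^2\, \ell\, e^{-2t\ell}\, d\ell
   = (1 + 2mt)\, e^{-2mt}, \\
\langle f_m \rangle
  &= \int_0^\infty \frac{1}{N}\, e^{-t/N}\, (1 + 2mt)\, e^{-2mt}\, dt
   = \frac{1}{1 + 2mN} + \frac{2mN}{(1 + 2mN)^2}
   = \frac{1 + 4mN}{(1 + 2mN)^2}.
\end{align}
```

For example, under these assumptions N = 1000 and m = 0.01 Morgans give 2mN = 20 and a mean shared fraction of 41/441 ≈ 0.093.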

Demographic Inference: Maximum Likelihood

Carmi et al., Theor Popul Biol, 2014

Use the distribution of the number of shared segments

Demographic Inference: A Practical Approach

Palamara et al., 2012

• Historical size N(t)=N0 ν(t).

• Mean fraction of the genome in segments of length ℓ1<ℓ<ℓ2:

(1)

Method:
• Record IBD segments in each length bin
• Using Eq. (1), find the history N(t) that fits best
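The two steps of the method can be sketched as follows. Eq. (1) itself is not preserved in this transcript, so the forward model below is an assumption and not the slide's formula: it uses discrete generations, haploid sizes N(g), and an Erlang-2 segment length with rate 2g per Morgan. All function names are hypothetical, and the fit is a plain grid search over constant sizes.

```python
import math


def coalescence_probs(sizes):
    """sizes[g-1] is the haploid population size g generations ago.

    Returns P(a pair's common ancestor at a site lived g generations ago).
    """
    probs, not_yet = [], 1.0
    for n in sizes:
        probs.append(not_yet / n)
        not_yet *= 1.0 - 1.0 / n
    return probs


def tail(length_m, g):
    """P(segment around a site is longer than length_m Morgans | ancestor at g)."""
    x = 2.0 * g * length_m
    return (1.0 + x) * math.exp(-x)


def expected_fraction(sizes, lo_m, hi_m):
    """Expected genome fraction in segments with lo_m < length < hi_m."""
    return sum(p * (tail(lo_m, g) - tail(hi_m, g))
               for g, p in enumerate(coalescence_probs(sizes), start=1))


def fit_constant_size(observed, bins, candidates, horizon=2000):
    """Pick the candidate constant size whose predictions best match the bins."""
    def sse(n):
        sizes = [n] * horizon
        return sum((expected_fraction(sizes, lo, hi) - obs) ** 2
                   for (lo, hi), obs in zip(bins, observed))
    return min(candidates, key=sse)


# Hypothetical check: simulate "observations" from N = 500 and recover it.
bins = [(0.01, 0.02), (0.02, 0.05), (0.05, 0.10)]   # Morgans
observed = [expected_fraction([500] * 2000, lo, hi) for lo, hi in bins]
print(fit_constant_size(observed, bins, [250, 500, 1000, 2000]))
```

Replacing the constant-size candidates with a parameterized history (for example ancestral size, bottleneck size, bottleneck time, and growth rate) would turn the same loop into a search over N(t), at the cost of a larger grid or a proper optimizer.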

Hypothetical example

IBD Sharing in Ashkenazi Jews

Gusev et al., 2012

A pair of AJ individuals shares ≈50cM in ≈15 long segments (>3cM)

Atzmon et al., 2010

Bray et al., 2010

AJ

EU

Inferring the Bottleneck Size and Time

Carmi et al., Nat. Commun., 2014; Palamara et al., 2012

[Figure: x-axis: Time (years)]

Caveats

• Phasing and sequencing errors; IBD detection errors

• Reasonable power only for 10-50 generations ago

• Model specification (e.g. prolonged bottleneck, admixture)

Parameter                       95% confidence interval
Ancestral size                  3654-5856
Bottleneck size                 249-419
Growth rate (per generation)    16-53%
Bottleneck time (years)         625-800

• A bottleneck 700 years ago confirmed by an independent method: lengths of haplotypes around rare variants
  o Mathieson and McVean, 2014

The Documented Ashkenazi History

• Origin?

• Founder event?

• European gene flow:
  o Where?
  o When?
  o How much?

• Relation to other Jews?

Outline

• Ashkenazi Jewish Genetics: Background

• The Ashkenazi Genome Sequencing Project

• Segment Sharing and Population History

• Opportunities and Future Directions

Coverage by Shared Segments

A sequenced reference panel

Partly sequenced genome

Impute

What fraction of the genome can we cover with shared segments?

Full sequence

Partial sequence

Nearly-complete inferred sequence
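The question of what fraction of a genome is covered by shared segments can be made concrete with an interval-union computation. This is a minimal sketch: it assumes the segments shared between the target genome and any member of the reference panel are already available as (start, end) intervals, and the function name and example numbers are hypothetical.

```python
def covered_fraction(segments, genome_length):
    """Fraction of the genome covered by the union of shared segments.

    segments      : iterable of (start, end) intervals shared between the
                    target genome and any reference-panel member
    genome_length : total length, in the same units as the intervals
    """
    covered = 0.0
    current_start = current_end = None
    for start, end in sorted(segments):
        if current_end is None or start > current_end:
            # Close the previous merged interval and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching interval: extend the current one.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Hypothetical segments on a genome of length 100.
example_segments = [(0, 10), (5, 20), (30, 40), (35, 38)]
print(covered_fraction(example_segments, 100))
```

Overlapping segments from different panel members are merged before summing, so adding more reference genomes can only keep or raise the covered fraction.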

The Era of Near-Complete Coverage

Now
Phase II

Mine public data?
Other studies?

Opportunities:
• Interpret personal genomes
  o Time-stamp rare mutations
• Cost-effective large-scale association studies
  o Resolve haplotypes
  o Impute SNP arrays or low-coverage sequences
  o Mapping rare variants/haplotypes

See Carmi et al., Genetics, 2013 for a theoretical analysis

The Era of Near-Complete Coverage

New algorithms needed!

[Figure: IBD segment; label g]

Time-stamp rare mutations

Now
Phase II

Mine public data?
Other studies?

Ashkenazi History

• Origin?

• Founder event?

• European gene flow:
  o Where?
  o When?
  o How much?

• Relation to other Jews?

The Place of European Gene Flow

“Most of these theories … are myths or speculation … based on some vague or misunderstood references. … It will probably be impossible to say definitely where the hundreds or thousands of Jews in Poland in the 13th to 14th centuries came from.”

B. Weinryb, The Jews of Poland, 1972

Approach

Johnson et al., 2011; Moreno-Estrada et al., 2013

[Figure: schematic with panels labeled EU, ME, and AJ ("An Ashkenazi genome"), alongside PCA plots with axes PC1 and PC2]

Preliminary Results

• Origin in the Levant

• Gene flow mostly from West-Europe, about 30 generations ago

• Sex-imbalanced history?

Summary

• It is important to study Ashkenazi genetics
• We sequenced 128 whole-genomes
• Useful for personal clinical genomics and imputation
• Segment sharing reveals a founder event and suggests opportunities

My research statement

Acknowledgements

Funding: Human Frontier Science Program

Itsik Pe’er’s lab: James Xue, Ethan Kochav, Shuo Yang, Pier Palamara, Vladimir Vacic

TAGC consortium members:
Todd Lencz, Semanti Mukherjee (LIJMC)
Lorraine Clark, Xinmin Liu (CUMC)
Gil Atzmon, Harry Ostrer, Danny Ben-Avraham (AECOM)
Inga Peter, Judy Cho (ISMMS)
Ariel Darvasi (HUJI)
Joseph Vijai (MSKCC)
Ken Hui (Yale)
VIB Ghent, Belgium

Harvard University: Peter Wilton, John Wakeley

Sheba Medical Center: Eitan Friedman

Thank you for your attention!