Lessons learnt from the 1000 Genomes Project about sequencing in populations

Gil McVean
Wellcome Trust Centre for Human Genetics and Department of Statistics, University of Oxford

Some questions

• What has the 1000 Genomes Project told us about how to sequence (in) populations?

• What has the 1000 Genomes Project told us about populations?

Samples for the 1000 Genomes Project

Major population groups, each made up of subpopulations of c. 100 individuals

[Figure: sampling locations, labelled GBR, FIN, TSI, IBS, CEU, JPT, CHB, CHS, CDX, KHV, GWB, GHN, YRI, MAB, LWK, MXL, CLM, ASW, AJM, ACB, PEL and PUR, plus samples from S. Asia]

The role of the 1000G Project in medical genetics

• A catalogue of variants
  – aiming to capture 95% of variants at 1% frequency or above in the populations of interest

• A representation of ‘normal’ variation

• A set of haplotypes for imputation into GWAS

• A training ground for sequencing/statistical/computational technologies
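The 1% target can be set against the c. 100 individuals per subpopulation noted earlier. A rough calculation, under a deliberately simple model (assumed here, not taken from the slides) in which individuals are sampled independently, genotypes follow Hardy-Weinberg proportions and every copy of the allele in the sample is detected, asks how likely an allele at frequency p is to appear at least once among 2n sampled chromosomes: 1 - (1 - p)^(2n). The helper names below are hypothetical.

```python
import math


def prob_variant_sampled(freq: float, n_individuals: int) -> float:
    """Probability that an allele at population frequency `freq` appears at
    least once among 2 * n_individuals independently sampled chromosomes.
    Assumes perfect detection: coverage and sequencing error are ignored."""
    n_chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - freq) ** n_chromosomes


def min_individuals(freq: float, target: float) -> int:
    """Smallest number of diploid individuals for which the allele is
    sampled at least once with probability >= target (same assumptions)."""
    # Solve 1 - (1 - freq)^(2n) >= target for the number of chromosomes,
    # round up, then convert chromosomes to whole diploid individuals.
    n_chromosomes = math.log(1.0 - target) / math.log(1.0 - freq)
    return math.ceil(math.ceil(n_chromosomes) / 2)


if __name__ == "__main__":
    # A 1%-frequency allele in one subpopulation of c. 100 individuals.
    print(f"{prob_variant_sampled(0.01, 100):.3f}")  # prints 0.866
    # Individuals needed for a 95% chance of sampling that allele.
    print(min_individuals(0.01, 0.95))  # prints 150
```

Under these assumptions a single subpopulation of about 100 individuals contains a given 1%-frequency allele roughly 87% of the time, and about 150 individuals are needed to reach 95%. Real discovery power is lower, because at low coverage not every sampled copy is actually observed in the reads.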

Samples for the 1000 Genomes Project: Pilot

[Figure: pilot populations CEU, JPT, CHB, YRI, TSI*, CHS* and LWK*]

*Exon pilot only

Population-scale genome sequencing

[Figure: sequencing design schematic, labelled "Haplotypes", "2x" and "10x"]
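The 2x and 10x figures are average sequencing depths per individual. A standard idealisation, assumed here rather than taken from the slides, treats the depth at a site as Poisson-distributed with mean equal to the average coverage c, and each read at a heterozygous site as carrying either allele with probability 1/2. The number of reads showing the non-reference allele is then Poisson(c/2), so the chance of seeing it at all is 1 - exp(-c/2). The function names are illustrative.

```python
import math


def prob_site_covered(coverage: float) -> float:
    """P(at least one read covers a site) when depth ~ Poisson(coverage)."""
    return 1.0 - math.exp(-coverage)


def prob_alt_seen_at_het(coverage: float) -> float:
    """P(at least one read carries the non-reference allele at a
    heterozygous site). Thinning Poisson(coverage) depth by 1/2 gives
    Poisson(coverage / 2) alt-allele reads; sequencing errors are ignored."""
    return 1.0 - math.exp(-coverage / 2.0)


if __name__ == "__main__":
    for depth in (2.0, 10.0):
        print(f"{depth:>4}x  covered={prob_site_covered(depth):.3f}  "
              f"het alt seen={prob_alt_seen_at_het(depth):.3f}")
```

Under this model, more than a third of heterozygous sites at 2x show no read carrying the non-reference allele, whereas at 10x almost all do, which is why low-coverage designs rely on sharing information across samples and haplotypes rather than genotyping each individual in isolation.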

What has the project generated?

>15 million SNPs, >50% of them novel

dbSNP entries increased by 70%

A huge increase in the catalogue of structural variants

A robust and modular pipeline for analysis of population-scale sequence data

An efficient format for storing aligned reads and a set of tools to manipulate and view the files

• SAM/BAM format for storing (aligned) reads

Bioinformatics (2009) http://samtools.sourceforge.net
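Each alignment line in a SAM file is tab-separated, with 11 mandatory fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) followed by optional TAG:TYPE:VALUE fields; BAM is the compressed binary form of the same records. The standard-library sketch below parses one text line and decodes a few FLAG bits (0x4 unmapped, 0x10 reverse strand, 0x400 duplicate). It is for illustration only; in practice samtools or an htslib-based library would be used, and the read records are made up.

```python
from dataclasses import dataclass


@dataclass
class SamRecord:
    qname: str
    flag: int
    rname: str
    pos: int        # 1-based leftmost mapping position (0 if none)
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: list[str]  # optional TAG:TYPE:VALUE fields, kept as raw strings

    @property
    def is_unmapped(self) -> bool:
        return bool(self.flag & 0x4)

    @property
    def is_reverse(self) -> bool:
        return bool(self.flag & 0x10)

    @property
    def is_duplicate(self) -> bool:
        return bool(self.flag & 0x400)


def parse_sam_line(line: str) -> SamRecord:
    """Parse one SAM alignment line (not an '@' header line)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("SAM alignment lines need 11 mandatory fields")
    return SamRecord(
        qname=cols[0], flag=int(cols[1]), rname=cols[2], pos=int(cols[3]),
        mapq=int(cols[4]), cigar=cols[5], rnext=cols[6], pnext=int(cols[7]),
        tlen=int(cols[8]), seq=cols[9], qual=cols[10], tags=cols[11:],
    )


if __name__ == "__main__":
    # Hypothetical read: FLAG 16 means mapped to the reverse strand.
    example = "read1\t16\t20\t60000\t60\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:0"
    rec = parse_sam_line(example)
    print(rec.rname, rec.pos, rec.is_reverse, rec.is_unmapped)
```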

An information-rich format (VCF) for storing generic haplotype/genotype data, and tools for manipulating the files

http://vcftools.sourceforge.net
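In a VCF file each data line has eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), followed by a FORMAT column and one column per sample; when GT is present it is the first FORMAT key. The sketch below, using only the standard library and made-up records, counts alternative alleles in the GT calls to estimate the allele frequency at one site. The function name is hypothetical; vcftools or bcftools would normally do this.

```python
def alt_allele_frequency(vcf_line: str) -> float:
    """Fraction of called alleles that are non-reference at one VCF site.

    Handles phased ('|') and unphased ('/') GT values and skips missing
    alleles ('.'). At multi-allelic sites any allele index > 0 is counted
    as alternative.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    fmt_keys = cols[8].split(":")
    gt_index = fmt_keys.index("GT")
    alt = called = 0
    for sample in cols[9:]:
        gt = sample.split(":")[gt_index]
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing call contributes nothing
            called += 1
            if int(allele) > 0:
                alt += 1
    return alt / called if called else float("nan")


if __name__ == "__main__":
    # Hypothetical site with three phased samples: 0|0, 0|1 and 1|1.
    line = "20\t14370\t.\tG\tA\t50\tPASS\t.\tGT:DP\t0|0:8\t0|1:5\t1|1:3"
    print(alt_allele_frequency(line))
```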

An understanding of the ‘rare functional variant load’ carried by individuals

c. 250 loss-of-function (LOF) variants per person

c. 75 HGMD disease-causing mutations (DM) per person

USH2A

• Mutations cause Usher syndrome

• 66 missense variants in dbSNP

• 2/3 detected in the 1000 Genomes Pilot

• One HGMD ‘disease-causing’ variant homozygous in 3 YRI individuals

– Other reports indicate this is not a real disease-causing variant

Samples for the 1000 Genomes Project: Phase 1

[Figure: Phase 1 populations GBR, FIN, TSI, CEU, JPT, CHB, CHS, YRI, LWK, MXL, CLM, ASW and PUR]

Lessons learnt about sequencing in populations

Lesson 1.

The low-coverage model works for variant discovery
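Discovery works at low coverage because evidence is pooled across samples: a variant carried by several individuals appears in several sets of reads, even if each individual contributes only a few. As a sketch under the idealised Poisson model (assumed, not from the slides) and a hypothetical calling rule requiring at least two non-reference reads across all samples, a variant carried as a heterozygote by k individuals sequenced at coverage c yields Poisson(k·c/2) alternative reads, so it is discovered with probability P(Poisson(k·c/2) ≥ 2).

```python
import math


def prob_discovered(n_carriers: int, coverage: float,
                    min_alt_reads: int = 2) -> float:
    """P(total alt-allele reads >= min_alt_reads) when each of n_carriers
    heterozygotes contributes Poisson(coverage / 2) alt reads.

    `min_alt_reads` is a hypothetical calling threshold; sequencing errors,
    mapping problems and false positives are ignored.
    """
    lam = n_carriers * coverage / 2.0  # sum of independent Poissons
    # P(X < m) = sum over j < m of exp(-lam) * lam^j / j!
    below = sum(math.exp(-lam) * lam ** j / math.factorial(j)
                for j in range(min_alt_reads))
    return 1.0 - below


if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        print(k, f"2x: {prob_discovered(k, 2.0):.3f}",
              f"10x: {prob_discovered(k, 10.0):.3f}")
```

Under these assumptions a singleton at 2x is found only about a quarter of the time, while a variant carried by five individuals at 2x is found exactly as often as a singleton at 10x (about 96%).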

A near-complete record of common variants

[Figure: panel labelled CEU]

Lesson 2.

The low-coverage model works for SNP genotyping
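Genotyping at low coverage starts from genotype likelihoods rather than hard calls. A common simple model, assumed here for illustration, treats reads at a biallelic site as independent: a read shows the non-reference allele with probability e for a homozygous-reference genotype, 1/2 for a heterozygote and 1 - e for a homozygous-alternative genotype, where e is a per-base error rate (0.01 is a hypothetical default). The function names are illustrative.

```python
import math


def genotype_log_likelihoods(n_ref: int, n_alt: int,
                             error: float = 0.01) -> list[float]:
    """Log-likelihoods for genotypes with 0, 1 and 2 alt alleles given read
    counts, assuming independent reads and a symmetric error rate. The
    binomial coefficient is omitted because it is shared by all genotypes."""
    p_alt_read = [error, 0.5, 1.0 - error]  # P(read shows alt | genotype)
    return [n_alt * math.log(p) + n_ref * math.log(1.0 - p)
            for p in p_alt_read]


def normalised_likelihoods(n_ref: int, n_alt: int,
                           error: float = 0.01) -> list[float]:
    """Likelihoods rescaled to sum to 1 (not posteriors: no prior applied)."""
    logs = genotype_log_likelihoods(n_ref, n_alt, error)
    top = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(x - top) for x in logs]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # One alt read (typical at ~2x) versus 4 ref + 5 alt reads (~10x).
    print([round(x, 3) for x in normalised_likelihoods(0, 1)])
    print([round(x, 3) for x in normalised_likelihoods(4, 5)])
```

With a single non-reference read, the heterozygous and homozygous-alternative genotypes differ in likelihood by only about 2:1 under this model, so an individual call cannot be made confidently from that sample's reads alone; with nine reads the heterozygote dominates.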

A set of accurate genotypes/haplotypes

[Figure: panel labelled CEU]

Lesson 3.

The genome has a large grey area where variant calling is hard
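One practical response to this grey area is to flag the parts of the reference where reads map unreliably and to treat calls there with caution. The sketch below builds such a mask from per-position summaries using two hypothetical rules: total depth pooled across samples must lie within a window around the median depth, and the fraction of reads with mapping quality zero must be small. The function name, thresholds and data are illustrative, not the project's published values.

```python
from statistics import median


def accessible_mask(depths: list[int], mq0_fractions: list[float],
                    low: float = 0.5, high: float = 2.0,
                    max_mq0: float = 0.1) -> list[bool]:
    """Return True for positions judged accessible to short-read calling.

    depths[i]         total read depth at position i, pooled over samples
    mq0_fractions[i]  fraction of those reads with mapping quality 0
    A position passes if low * median <= depth <= high * median and its
    MQ0 fraction is at most max_mq0 (all thresholds are hypothetical).
    """
    if len(depths) != len(mq0_fractions):
        raise ValueError("inputs must have the same length")
    mid = median(depths)
    return [low * mid <= d <= high * mid and f <= max_mq0
            for d, f in zip(depths, mq0_fractions)]


if __name__ == "__main__":
    # Made-up profile: a collapsed repeat (excess depth, many MQ0 reads)
    # and a poorly covered position stand out against typical depth ~100.
    depths = [98, 102, 100, 350, 400, 20, 101, 99]
    mq0 = [0.0, 0.01, 0.0, 0.6, 0.7, 0.0, 0.02, 0.0]
    mask = accessible_mask(depths, mq0)
    print(mask, f"{sum(mask) / len(mask):.0%} accessible")
```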

Lesson 4.

Joint calling of different variant types substantially improves the quality of calls

Lesson 5.

Managing uncertainty is important
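One concrete way of managing uncertainty is to carry genotype probabilities, rather than hard calls, into downstream analyses. The sketch combines genotype likelihoods with a Hardy-Weinberg prior at an assumed allele frequency to obtain posterior probabilities, then summarises them as an expected allele dosage, a quantity commonly used when uncertain (for example, imputed) genotypes enter an association test. Function names and input values are hypothetical.

```python
def genotype_posteriors(likelihoods: list[float],
                        alt_freq: float) -> list[float]:
    """Posterior P(genotype | reads) for 0, 1 and 2 alt alleles, combining
    per-genotype likelihoods with a Hardy-Weinberg prior at alt_freq."""
    q = alt_freq
    prior = [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]
    joint = [lik * p for lik, p in zip(likelihoods, prior)]
    total = sum(joint)
    if total == 0:
        raise ValueError("likelihoods and prior have no overlap")
    return [j / total for j in joint]


def expected_dosage(posteriors: list[float]) -> float:
    """Expected number of alt alleles: P(het) + 2 * P(hom alt)."""
    return posteriors[1] + 2 * posteriors[2]


if __name__ == "__main__":
    # Likelihoods proportional to (0.01, 0.5, 0.99): one alt read, 1% error,
    # evaluated at a rare (5%) and a common (50%) site.
    lik = [0.01, 0.5, 0.99]
    for freq in (0.05, 0.5):
        post = genotype_posteriors(lik, freq)
        print(freq, [round(p, 3) for p in post],
              round(expected_dosage(post), 3))
```

The same read evidence points to a likely heterozygote at the rare site but an almost even split between heterozygote and homozygous-alternative at the common one; passing the full posterior or the dosage forward keeps that difference visible instead of hiding it behind a single call.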

Lesson 6.

Data visualisation is key

Lessons learnt about populations

Closely related populations can have substantially different rare variants

Spatial heterogeneity in non-genetic risk can differentially confound association studies for rare and common variants

Iain Mathieson

Thanks to the many...

• Steering Committee
  – Co-chairs: Richard Durbin and David Altshuler

• Samples and ELSI Committee
  – Co-chairs: Aravinda Chakravarti and Leena Peltonen

• Data Production Group
  – Co-chairs: Elaine Mardis and Stacey Gabriel

• Analysis Group
  – Co-chairs: Gil McVean and Goncalo Abecasis
  – Subgroups in gene-targeted sequencing (Richard Gibbs) and population genetics (Molly Przeworski)

• Structural Variation Group
  – Co-chairs: Matt Hurles, Charles Lee and Evan Eichler

• DCC
  – Co-chairs: Paul Flicek and Steve Sherry