Transcript
Page 1: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

Targeted Sequencing of Human Genomes, Transcriptomes, and Methylomes

Jin Billy Li

George Church Lab

Harvard Medical School

Page 2: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

Genetic Loci X Sample Size = Information

[Figure: # samples (y-axis) vs. # genetic loci (x-axis), positioning PCR seq, Mass-spec, Shotgun seq, RNA-seq, ChIP-seq, and SNP array]

Page 3: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

Target Capturing with Padlock Probes (aka MIPs)

[Figure: padlock probes targeting feature 1 through feature n; labeled steps: pol, lig, then PCR (or RCA)]

Porreca et al., Nat Methods 2007

Page 4: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

Mass Production of Padlock Oligos

[Figure: size markers at 50 nt, 100 nt, and 150 nt]

55k features of up to 200 nt

Page 5: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

~10,000-fold Improvement Since Nov 2007

[Figure: three bar-chart panels; y-axes: Capturing efficiency (%) and Fold improvement]

Panel 1: variable hyb time (15 mins, 1 hour, 1 day, 1 day + cycling, 2 days, 5 days), at probe:gDNA = 10:1 and 100x dNTP

Panel 2: variable probe:gDNA (10:1, 50:1, 100:1, 250:1), at 2 day hyb time and 100x dNTP

Panel 3: variable dNTP amount (1x, 10x, 100x, 1,000x, 10,000x), at 1 day hyb time and probe:gDNA = 100:1

1. longer hybridization time; 2. more probes; 3. right [dNTP]

* 20-fold improvement already by better probe design and synthesis

Li et al., in preparation

Page 8: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

Improved Technology -> Better Performance

95% captured
85% within 100-fold range
55% within 10-fold range

[Figure: two panels, "Sensitivity + Uniformity" and "Correlation", each comparing Nov 2007 with Current]

Li et al., in preparation

Page 9: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

Summary of Improvements

                                    Nov 2007    Current
Specificity                         ~100%       ~100%
Sensitivity/Multiplexity (of 55k)   18%         95%
Uniformity (in 100-fold range)      16%         85%
Correlation of replicates (r)       0.35        0.98
Accuracy (heterozygous calls)       31%         99%

Page 10: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

Targeted Capturing of

• Genomes
– Exome: PGP etc.
– Contiguous regions or gene panels
– SNPs
– Hypermutable CpG dinucleotides

• Transcriptomes
– Allelotyping
– RNA editing sites

• Methylomes
– CpG methylation

Page 12: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

A -> I (G) RNA Editing

• Post-transcriptional A -> I
• I is read as G during translation
• Only 10 targets are known in human coding regions

Predicting Putative Editing Sites

• A in the genome
• G in some mRNAs or ESTs
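The prediction rule on this slide (A in the genome, G in some mRNAs or ESTs) can be sketched in a few lines. This is only an illustration under simplifying assumptions that are not from the slides: the transcripts are already aligned to the genome without gaps, on the same strand, and start at position 0. The function name and the toy sequences are hypothetical.

```python
def putative_editing_sites(genome, transcripts):
    """Return 0-based positions where the genome has A and at least one
    aligned transcript (mRNA or EST) shows G.

    Assumption for illustration: each transcript is an ungapped,
    same-strand alignment starting at genome position 0.
    """
    sites = []
    for pos, base in enumerate(genome):
        if base != "A":
            continue
        # A candidate site needs a G in at least one transcript at this position.
        if any(pos < len(t) and t[pos] == "G" for t in transcripts):
            sites.append(pos)
    return sites


# Toy data, invented for the example.
genome = "ACGTAAGT"
ests = ["ACGTGAGT", "ACGTAAGT", "GCGTAAGT"]
print(putative_editing_sites(genome, ests))
```

A real pipeline would also have to separate such A/G mismatches from genomic SNPs and sequencing errors, which is why the slides treat these positions only as predicted sites to be tested by capture and sequencing.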

Page 13: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

Discovery of 100’s of Novel Editing Sites

36,000 predicted editing sites
gDNA + 7 tissue cDNAs from an individual

Padlock + Solexa: 239 sites found to be edited

Validation (PCR + Sanger): 18 of 20 random sites are obviously edited

with Erez Levanon, in preparation

Page 14: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

Example: VEZF1

[Figure: data for genomic DNA and for RNA from intestine, kidney, diencephalon, frontal lobe, corpus callosum, cerebellum, and adrenal]

Page 15: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

Bisulfite Padlock Probes (BSP): CpG Methylation

Bisulfite-treated genome

“3-base” genome

High specificity of padlock
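The “3-base” genome label refers to the effect of bisulfite treatment: unmethylated cytosines are converted and read as T after amplification, while methylated cytosines stay C. The sketch below simulates that conversion on one strand. The function name, the toy sequence, and the choice of methylated positions are hypothetical and only serve the illustration.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulfite conversion of one DNA strand.

    Unmethylated C is read as T after conversion and amplification;
    C at a methylated position (0-based index) is protected and stays C.
    """
    converted = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


# Toy sequence, invented for the example; position 1 is a methylated CpG cytosine.
seq = "ACGTCCGA"
print(bisulfite_convert(seq, methylated_positions={1}))
print(bisulfite_convert(seq))  # no methylation: only A, G, and T remain
```

Because most cytosines are converted, the treated genome has lower sequence complexity, which is where the slide's point about the high specificity of padlock probes comes in.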

Page 16: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

Methylation Level Accurately Measured

[Figure: two scatter plots with axes from 0 to 1. BSP-BSP correlation: methylation level, replicate 1 vs. methylation level, replicate 2. BSP-Sanger correlation: methylation level measured by BSP sequencing vs. methylation level estimated by Sanger sequencing. Reported correlations: r = 0.979 and r = 0.966]
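As a rough illustration of how the r values on this slide could be obtained, the sketch below computes a methylation level per CpG site from read counts and then the Pearson correlation between two replicates. Two things here are assumptions rather than content of the slides: the methylation level is taken as C reads divided by C plus T reads, and the read counts are invented.

```python
import math


def methylation_level(c_reads, t_reads):
    """Fraction of reads showing C (methylated) at a CpG after bisulfite
    treatment. Assumed definition: C / (C + T)."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("site has no coverage")
    return c_reads / total


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Invented (C reads, T reads) per CpG site for two replicates.
replicate_1 = [(8, 2), (1, 9), (5, 5)]
replicate_2 = [(9, 1), (2, 8), (6, 4)]
levels_1 = [methylation_level(c, t) for c, t in replicate_1]
levels_2 = [methylation_level(c, t) for c, t in replicate_2]
print(round(pearson_r(levels_1, levels_2), 3))
```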

Page 17: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

Methylation Pattern around Genes

Gene-Body Methylation

with Madeleine Price Ball, in preparation (poster)

Page 18: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

Acknowledgements

Church Lab
George Church
Padlock technology: Kun Zhang, John Aach, Abraham Rosenbaum, Jay Shendure, Greg Porreca, Annika Ahlford
RNA editing: Erez Levanon, Jung-Ki Yoon
CpG methylation: Madeleine Price Ball

Agilent: Emily Leproust, Wilson Woo

Sequencing: Yuan Gao, Bin Xie, Bob Steen

Page 19: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

Superior Quality of Padlock Oligos

55k features of up to 200 nt

[Figure: size markers at 50 nt, 100 nt, and 150 nt]

PCR (2x)

Solexa sequencing

[Figure: distribution of reads per site; x-axis: Number of reads (0 to 50); y-axis: Percentage of sites (%) (0 to 12), also labeled Fraction of probes; series: before amplification (data), after amplification (data), before amplification (Poisson), after amplification (Poisson)]
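The figure legend compares the observed read-count distributions with Poisson curves. A minimal sketch of how such an expected curve can be computed is given below: if reads were spread evenly over all sites, the number of reads per site would follow a Poisson distribution with the mean coverage as its parameter. The mean of 10 reads per site is a hypothetical value, not a number taken from the slides.

```python
import math


def poisson_pmf(k, mean):
    """Probability of observing exactly k reads at a site when reads
    are distributed at random with the given mean per site."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def expected_percent_of_sites(mean_reads, max_reads):
    """Expected percentage of sites with 0..max_reads reads under Poisson."""
    return [100.0 * poisson_pmf(k, mean_reads) for k in range(max_reads + 1)]


mean_reads = 10.0  # hypothetical mean coverage, for illustration only
percents = expected_percent_of_sites(mean_reads, 50)
for k in (0, 5, 10, 20):
    print(f"{k:2d} reads: {percents[k]:.2f}% of sites")
```

The closer the measured distribution lies to this curve, the more uniformly the oligo pool is represented, which is the point the slide title makes about oligo quality.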

Page 20: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

From Agilent Oligos to Padlock Probes

[Figure: workflow diagram. Labels as extracted: Agilent oligo, 136 bp, flanked by 18 bp on each side; PCR (amplification and selection); exonuclease; annealed with DpnII guide oligo; USER + DpnII; padlock probe]

Page 21: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

Heterozygous Genotypes Correctly Called

Homozygous wild type
Heterozygous variation
Homozygous variation

[Figure: two panels, before and after]

Page 22: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

Methods in Comparison

                          Padlock                     Array-based hyb
Upfront probe cost        $12,000 per 55k 100mers     $600 per 385k 70mers
(10-20% of exome)
Probes amplifiable?       Yes                         No
Reaction phase            Solution, 10-20 μl          Surface, 200 μl
Enzymatic hyb?            Yes                         No
gDNA required             ~0.5-1 μg                   20 μg (WGA)
Efficiency (->accuracy)   1%                          N/A (<0.1%?)
Uniformity                100-fold range              10-fold range
Specificity               ~100% on target             30-80% on or near target

Page 23: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

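Differential Clamping at Ligation Junction

[Figure: bar chart of average coverage (y-axis, 0 to 300) by base (A, C, G, T) at the proximal and distal positions of the extension arm and the ligation arm. Bar labels as extracted, with the bar assignment not recoverable: 125, 160, 159, 162, 155, 146, 139, 142, 181, 166, 293, 166, 165, 153, 38, 156]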

Page 24: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

% GC VS Capturing Efficiency

[Figure: average coverage (y-axis, 0 to 200) vs. % GC (x-axis, 5-point bins from (10,15] to (85,90]); series: gap + arms, gap, extension arm, ligation arm]
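The binning on the x-axis of this figure, half-open % GC intervals such as (45,50], can be reproduced with a short sketch. The code computes the GC percentage of a sequence (for example a gap or an arm), assigns it to a 5-point bin, and averages coverage per bin. The sequences and coverage values are invented, and placing a sequence with 0% GC into the first bin is a convention chosen here, not something stated on the slide.

```python
import math
from collections import defaultdict


def gc_percent(seq):
    """Percentage of G and C bases in a sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc_bin(pct, width=5):
    """Return (low, high) of the half-open bin (low, high] containing pct.
    Convention assumed here: pct == 0 is placed in the first bin."""
    high = max(width, math.ceil(pct / width) * width)
    return (high - width, high)


def mean_coverage_by_bin(probes):
    """probes: iterable of (sequence, coverage). Returns {bin: mean coverage}."""
    totals = defaultdict(lambda: [0.0, 0])
    for seq, coverage in probes:
        entry = totals[gc_bin(gc_percent(seq))]
        entry[0] += coverage
        entry[1] += 1
    return {b: total / n for b, (total, n) in sorted(totals.items())}


# Invented sequences and coverage values.
probes = [("GGCCAATT", 100), ("GCGCATAT", 200), ("GGGAAAAA", 50)]
print(mean_coverage_by_bin(probes))
```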

Page 25: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

99% Concordance Between Padlock and HapMap

Page 26: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

The Editing “Calls” Are Well Correlated

[Figure: log-log scatter plot of G/(A+G) in frontal lobe replicate 1 vs. frontal lobe replicate 2, axes from 0.01 to 1; r = 0.964]
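The quantity on both axes, G/(A+G), is the fraction of reads at a site that show G instead of the genomic A. A minimal sketch of that calculation follows. The read counts are invented, and returning None for sites without A or G reads is a choice made for this example.

```python
def editing_level(a_reads, g_reads):
    """Editing level at one site: G / (A + G).

    Returns None when the site has no A or G reads, since the ratio
    is undefined there (a convention chosen for this sketch).
    """
    total = a_reads + g_reads
    if total == 0:
        return None
    return g_reads / total


# Invented (A reads, G reads) for a few sites in one replicate.
counts = [(90, 10), (50, 50), (0, 0)]
print([editing_level(a, g) for a, g in counts])
```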

Page 27: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

Bisulfite Padlock Probes (BSP): CpG Methylation

Bisulfite-treated genome

• 10k CpG sites tiling the ENCODE regions
– 1 CpG site every 3 kb region on average

• High specificity
– 79 of 80 Sanger reads match correct locations

Page 28: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

[Figure: workflow diagram. Labels as extracted: B, P, strep; collected in a tube; PCR; λ exonuclease; shearing, end polishing; adapter ligation; hybridization in closed-tube solution; denaturing, PCR]

Li et al., unpublished

Page 29: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

Methods in Comparison

                          Padlock                     Array-based hyb             Biotin-coupled hyb
Upfront probe cost        $12,000 per 55k 100mers     $600 per 385k 70mers        $500 per 244k 60mers
(10-20% of exome)
Probes amplifiable?       Yes                         No                          Yes
Reaction phase            Solution, 10-20 μl          Surface, 200 μl             Solution, 10-20 μl
Enzymes in hyb?           Yes                         No                          No
gDNA required             ~0.5-1 μg                   20 μg (WGA)                 ~0.5-1 μg
Efficiency (->accuracy)   1%                          N/A (<0.1%?)                ~10%?
Uniformity                100-fold range              10-fold range               10-fold range?
Specificity               ~100% on target             30-80% on or near target    ~55% on or near target

Page 30: Targeted Sequencing of Human  Genomes, Transcriptomes, and Methylomes

Two Tech Replicates Are Well Correlated

[Figure: two panels. Uniformity: number of reads per site vs. ranked target sites. Correlation of counts: counts, replicate 1 vs. counts, replicate 2]

