Targeted Sequencing of Human Genomes, Transcriptomes, and Methylomes

Jin Billy Li
George Church Lab
Harvard Medical School


Genetic Loci × Sample Size = Information

[Figure: methods placed on a plane with # samples on the y-axis and # genetic loci on the x-axis: PCR seq, Mass-spec, Shotgun seq, RNA-seq, ChIP-seq, SNP array]

Target Capturing with Padlock Probes (aka MIPs)

[Figure: padlock probes for feature 1 through feature n; polymerase (pol) and ligase (lig) steps, followed by PCR (or RCA)]

Porreca et al., Nat Methods 2007

Mass Production of Padlock Oligos

55k features of up to 200 nt

[Figure: size markers at 50 nt, 100 nt, and 150 nt]

~10,000-fold Improvement Since Nov 2007

1. longer hybridization time; 2. more probes; 3. right [dNTP]

[Figure: capturing efficiency (%) (axis 0.0 to 1.2) and fold improvement (axis 0 to 500) in three panels:
  1. variable hyb time: 15 mins, 1 hour, 1 day, 1 day + cycling, 2 days, 5 days (probe:gDNA = 10:1, 100x dNTP)
  2. variable probe:gDNA: 10:1, 50:1, 100:1, 250:1 (2 day hyb time, 100x dNTP)
  3. variable dNTP amount: 1x, 10x, 100x, 1,000x, 10,000x (1 day hyb time, probe:gDNA = 100:1)]

* 20-fold improvement already by better probe design and synthesis

Li et al., in preparation


Improved Technology -> Better Performance

• 95% captured
• 85% within 100-fold range
• 55% within 10-fold range

[Figure: Nov 2007 vs. Current, shown for two panels: Sensitivity + Uniformity; Correlation]

Li et al., in preparation

Summary of Improvements

                                    Nov 2007    Current
Specificity                         ~100%       ~100%
Sensitivity/Multiplexity (of 55k)   18%         95%
Uniformity (in 100-fold range)      16%         85%
Correlation of replicates (r)       0.35        0.98
Accuracy (heterozygous calls)       31%         99%
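The sensitivity and uniformity figures above can be computed from per-target read counts. The sketch below is one plausible way to do it, not the method used for the slides. It assumes that "sensitivity" is the fraction of targets with at least one read, and that "within an N-fold range" means falling inside a window of total width N-fold, centered in log space on the median count of captured targets. The function name `capture_metrics` and both definitions are assumptions made for illustration.

```python
import math
from statistics import median


def capture_metrics(read_counts, fold_range=100.0):
    """Summarize sensitivity and uniformity from per-target read counts.

    Assumed definitions (not taken from the slides):
      sensitivity = fraction of targets with at least one read
      uniformity  = fraction of all targets whose count lies in a window
                    spanning `fold_range`-fold, centered on the median
                    count of the captured targets
    """
    if not read_counts:
        raise ValueError("read_counts must not be empty")
    total = len(read_counts)
    captured = [c for c in read_counts if c > 0]
    if not captured:
        return {"sensitivity": 0.0, "uniformity": 0.0}

    center = median(captured)
    half_width = math.sqrt(fold_range)  # e.g. 10x below and above for 100-fold
    low, high = center / half_width, center * half_width
    in_window = sum(1 for c in captured if low <= c <= high)
    return {
        "sensitivity": len(captured) / total,
        "uniformity": in_window / total,
    }


counts = [0, 1, 10, 10, 10, 100, 1000, 10]
metrics_100 = capture_metrics(counts, fold_range=100.0)
metrics_10 = capture_metrics(counts, fold_range=10.0)
print(metrics_100, metrics_10)
```

Tightening `fold_range` from 100 to 10 can only keep the uniformity figure the same or lower it, which matches the ordering of the 85% and 55% figures reported for the current protocol.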

Targeted Capturing of

• Genomes
  – Exome: PGP etc.
  – Contiguous regions or gene panels
  – SNPs
  – Hypermutable CpG dinucleotides
• Transcriptomes
  – Allelotyping
  – RNA editing sites
• Methylomes
  – CpG methylation


A -> I (G) RNA Editing

• Post-transcriptional A -> I
• I is read as G during translation
• Only 10 targets are known in human coding regions

Predicting Putative Editing Sites

• A in the genome
• G in some mRNAs or ESTs
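The prediction criterion on this slide (A in the genome, G in some mRNAs or ESTs) can be illustrated with a small sketch. It assumes gap-free EST alignments on the same strand as the reference, given as hypothetical `(start, sequence)` pairs. A real pipeline would also have to handle gapped alignments, strand, sequencing errors, and known SNPs, none of which are modeled here.

```python
def candidate_editing_sites(genome_seq, est_alignments):
    """Return sorted 0-based positions where the genome has A and at
    least one aligned EST/mRNA read has G.

    est_alignments: iterable of (start, sequence) pairs; each sequence is
    assumed to align to the genome without gaps, beginning at `start`.
    """
    candidates = set()
    for start, est_seq in est_alignments:
        for offset, est_base in enumerate(est_seq):
            pos = start + offset
            if pos >= len(genome_seq):
                break  # ignore EST bases running past the reference
            if genome_seq[pos] == "A" and est_base == "G":
                candidates.add(pos)
    return sorted(candidates)


genome = "ACGATAGA"
ests = [(0, "ACGG"), (3, "ATAGG"), (5, "AGA")]
sites = candidate_editing_sites(genome, ests)
print(sites)
```

Only A-to-G mismatches are reported; any other mismatch type is ignored because it does not fit the A -> I (read as G) signature.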

Discovery of 100’s of Novel Editing Sites

36,000 predicted editing sites

gDNA + 7 tissue cDNAs from an individual

Padlock + Solexa: 239 sites found to be edited

Validation (PCR + Sanger): 18 of 20 random sites are obviously edited

with Erez Levanon, in preparation

Example: VEZF1

[Figure: panels for Genomic DNA and for RNA from intestine, kidney, diencephalon, frontal lobe, corpus callosum, cerebellum, and adrenal]

Bisulfite Padlock Probes (BSP): CpG Methylation

Bisulfite-treated genome

“3-base” genome

High specificity of padlock
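The "3-base" genome refers to the reduced complexity of bisulfite-treated DNA: unmethylated cytosines are converted and read as T, so a fully unmethylated strand contains essentially only A, G, and T. A minimal in silico sketch of that conversion for one strand follows. The function name and the `methylated_positions` parameter are hypothetical, and the sketch does not represent the probe-design procedure used in the talk.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulfite conversion of one DNA strand.

    Every C becomes T unless its 0-based index is listed in
    `methylated_positions` (methylated cytosines are protected).
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq.upper())
    )


original = "ACGTCCGA"
fully_converted = bisulfite_convert(original)
partly_methylated = bisulfite_convert(original, methylated_positions={5})
print(fully_converted, partly_methylated)
```

The two strands of the genome are no longer complementary after treatment, so the opposite strand has to be converted separately from its own sequence.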

Methylation Level Accurately Measured

[Figure: two scatter plots with axes running from 0 to 1.
  BSP-BSP correlation: methylation level, replicate 1 vs. methylation level, replicate 2.
  BSP-Sanger correlation: methylation level measured by BSP sequencing vs. methylation level estimated by Sanger sequencing.
  Correlations shown on the plots: r = 0.979 and r = 0.966.]
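For reference, the methylation level at a single CpG site is commonly estimated from bisulfite reads as the fraction of reads that still show C at the cytosine position. The sketch below assumes that convention and forward-strand base calls; the function name is hypothetical, and the slides do not state how the plotted levels were computed.

```python
def methylation_level(base_calls):
    """Estimate methylation at one CpG cytosine from bisulfite reads.

    base_calls: string of bases observed at the cytosine position, one
    per read. C means the cytosine was protected (methylated); T means
    it was converted (unmethylated). Other bases are ignored.
    Returns None when no informative read covers the site.
    """
    c_reads = base_calls.count("C")
    t_reads = base_calls.count("T")
    informative = c_reads + t_reads
    if informative == 0:
        return None
    return c_reads / informative


print(methylation_level("CCCT"))
```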

Methylation Pattern around Genes: Gene-Body Methylation

with Madeleine Price Ball, in preparation (poster)

Acknowledgements

Church Lab

George Church

Padlock technology: Kun Zhang, John Aach, Abraham Rosenbaum, Jay Shendure, Greg Porreca, Annika Ahlford

RNA editing: Erez Levanon, Jung-Ki Yoon

CpG methylation: Madeleine Price Ball

Agilent: Emily Leproust, Wilson Woo

Sequencing: Yuan Gao, Bin Xie, Bob Steen

Superior Quality of Padlock Oligos

55k features of up to 200nt

PCR (2x); Solexa sequencing

[Figure: size markers at 50 nt, 100 nt, and 150 nt]

[Figure: percentage of sites (%) (axis 0 to 12) vs. number of reads (axis 0 to 50), with four series: before amplification (data), after amplification (data), before amplification (poisson), after amplification (poisson). A further axis label reads "Fraction of probes".]
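The plot on this slide compares observed read-count distributions with Poisson expectations. A sketch of how such a comparison can be set up is shown below, assuming the Poisson mean is set to the mean number of reads per site. Both function names are hypothetical, and this is not necessarily how the curves on the slide were produced.

```python
import math


def poisson_percent_of_sites(mean_reads, max_reads):
    """Expected percentage of sites with k reads (k = 0..max_reads)
    if reads were distributed over sites as a Poisson process."""
    return [
        100.0 * math.exp(-mean_reads) * mean_reads**k / math.factorial(k)
        for k in range(max_reads + 1)
    ]


def observed_percent_of_sites(read_counts, max_reads):
    """Observed percentage of sites with k reads (k = 0..max_reads)."""
    total = len(read_counts)
    return [
        100.0 * sum(1 for c in read_counts if c == k) / total
        for k in range(max_reads + 1)
    ]


expected = poisson_percent_of_sites(mean_reads=10.5, max_reads=50)
observed = observed_percent_of_sites([0, 1, 1, 2], max_reads=2)
print(expected[:3], observed)
```

An observed histogram that is wider than the Poisson curve at the same mean indicates variation beyond sampling noise, for example uneven oligo abundance or amplification bias.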

From Agilent Oligos to Padlock Probes

[Figure: workflow diagram. Labels as extracted: Agilent oligo, 136 bp, flanked by 18bp segments; PCR (amplification and selection); exonuclease; USER + DpnII; annealed with DpnII guide oligo; padlock probe]

Heterozygous Genotypes Correctly Called

[Figure: before vs. after panels, with three genotype classes: homozygous wild type, heterozygous variation, homozygous variation]

Methods in Comparison

                                       Padlock                   Array-based hyb
Upfront probe cost (10-20% of exome)   $12,000 per 55k 100mers   $600 per 385k 70mers
Probes amplifiable?                    Yes                       No
Reaction phase                         Solution, 10-20 μl        Surface, 200 μl
Enzymatic hyb?                         Yes                       No
gDNA required                          ~0.5-1 μg                 20 μg (WGA)
Efficiency (->accuracy)                1%                        N/A (<0.1%?)
Uniformity                             100-fold range            10-fold range
Specificity                            ~100% on target           30-80% on or near target

Differential Clamping at Ligation Junction

[Figure: average coverage (axis 0 to 300) by base (A, C, G, T) at the proximal and distal positions of the extension arm and the ligation arm. Bar labels as extracted: 125, 160, 159, 162, 155, 146, 139, 142, 181, 166, 293, 166, 165, 153, 38, 156.]

% GC VS Capturing Efficiency

[Figure: average coverage (axis 0 to 200) vs. % GC in 5-point bins from (10, 15] to (85, 90], with four series: gap + arms, gap, extension arm, ligation arm]
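The binning used on the % GC axis can be sketched as follows. The code assigns each sequence to a half-open 5-point bin such as (45, 50] and averages coverage per bin. The function names and the `(sequence, coverage)` input format are assumptions for illustration; the same function could be applied to the gap, either arm, or gap + arms by passing the corresponding sequence.

```python
import math
from collections import defaultdict


def gc_percent(seq):
    """Percentage of G and C bases in a sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_bin(percent, width=5):
    """Return the (low, high] bin containing `percent`.

    A value of exactly 0 is placed in the first bin, (0, width].
    """
    high = max(width, math.ceil(percent / width) * width)
    return (high - width, high)


def average_coverage_by_gc(probes, width=5):
    """probes: iterable of (sequence, coverage) pairs.
    Returns {(low, high): mean coverage}, ordered by bin."""
    sums = defaultdict(lambda: [0.0, 0])
    for seq, coverage in probes:
        entry = sums[gc_bin(gc_percent(seq), width)]
        entry[0] += coverage
        entry[1] += 1
    return {b: total / n for b, (total, n) in sorted(sums.items())}


probes = [("ATGC", 100), ("AAGC", 200), ("GGGC", 30)]
by_gc = average_coverage_by_gc(probes)
print(by_gc)
```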

99% Concordance Between Padlock and HapMap

The Editing “Calls” Are Well Correlated

[Figure: scatter plot of G/(A+G), frontal lobe replicate 1 vs. G/(A+G), frontal lobe replicate 2, both axes from 0.01 to 1; r = 0.964]
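The quantity on both axes, G/(A+G), is the fraction of reads at a site that show G rather than A. A minimal sketch of that calculation, together with a plain Pearson correlation between two replicates, is given below. The function names are hypothetical, and the slide does not state whether r was computed on linear or log-transformed values, so the linear form here is an assumption.

```python
import math


def editing_level(a_reads, g_reads):
    """Fraction of reads showing G at a candidate A -> I site.
    Returns None when the site has no A or G reads."""
    total = a_reads + g_reads
    if total == 0:
        return None
    return g_reads / total


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (A reads, G reads) per site in two replicates
rep1 = [editing_level(a, g) for a, g in [(90, 10), (50, 50), (20, 80)]]
rep2 = [editing_level(a, g) for a, g in [(85, 15), (55, 45), (25, 75)]]
print(pearson_r(rep1, rep2))
```

Sites with no coverage in either replicate return None and would need to be dropped before computing the correlation.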

Bisulfite Padlock Probes (BSP): CpG Methylation

Bisulfite-treated genome

• 10k CpG sites tiling the ENCODE regions
  – 1 CpG site every 3kb region on average
• High specificity
  – 79 of 80 Sanger reads match correct locations

[Figure: workflow diagram. Labels as extracted: B, strep, P; collected in a tube; PCR; λ exonuclease; shearing, end polishing; adapter ligation; hybridization in closed-tube solution; denaturing, PCR]

Li et al., unpublished

Methods in Comparison

                                       Padlock                   Array-based hyb            Biotin-coupled hyb
Upfront probe cost (10-20% of exome)   $12,000 per 55k 100mers   $600 per 385k 70mers       $500 per 244k 60mers
Probes amplifiable?                    Yes                       No                         Yes
Reaction phase                         Solution, 10-20 μl        Surface, 200 μl            Solution, 10-20 μl
Enzymes in hyb?                        Yes                       No                         No
gDNA required                          ~0.5-1 μg                 20 μg (WGA)                ~0.5-1 μg
Efficiency (->accuracy)                1%                        N/A (<0.1%?)               ~10%?
Uniformity                             100-fold range            10-fold range              10-fold range?
Specificity                            ~100% on target           30-80% on or near target   ~55% on or near target

Two Tech Replicates Are Well Correlated

[Figure: two panels.
  Uniformity: number of reads per site vs. ranked target sites.
  Correlation of counts: counts, replicate 1 vs. counts, replicate 2.]