Targeted Sequencing of Human Genomes, Transcriptomes, and Methylomes
Jin Billy Li
George Church Lab
Harvard Medical School
Genetic Loci X Sample Size = Information
[Figure: methods placed by # samples (y-axis) and # genetic loci (x-axis): PCR seq, Mass-spec, Shotgun seq, RNA-seq, ChIP-seq, SNP array.]
Target Capturing with Padlock Probes (aka MIPs)
[Figure: padlock probes for feature 1 … feature n; pol and lig steps, followed by PCR (or RCA).]
Porreca et al., Nat Methods 2007
~10,000-fold Improvement Since Nov 2007
1. longer hybridization time; 2. more probes; 3. right [dNTP]
[Figure: capturing efficiency (%, axis 0.0 to 1.2) and fold improvement (axis 0 to 500) in three panels.
Panel 1, variable hyb time (15 mins, 1 hour, 1 day, 1 day + cycling, 2 days, 5 days); probe:gDNA = 10:1, 100x dNTP.
Panel 2, variable probe:gDNA (10:1, 50:1, 100:1, 250:1); 2 day hyb time, 100x dNTP.
Panel 3, variable dNTP amount (1x, 10x, 100x, 1,000x, 10,000x); 1 day hyb time, probe:gDNA = 100:1.]
* 20-fold improvement already by better probe design and synthesis
Li et al., in preparation
Improved Technology -> Better Performance
• 95% captured
• 85% within 100-fold range
• 55% within 10-fold range
[Figure: two panels, "Sensitivity + Uniformity" and "Correlation", each comparing Nov 2007 with Current.]
Li et al., in preparation
Summary of Improvements
                                     Nov 2007   Current
Specificity                          ~100%      ~100%
Sensitivity/Multiplexity (of 55k)    18%        95%
Uniformity (in 100-fold range)       16%        85%
Correlation of replicates (r)        0.35       0.98
Accuracy (heterozygous calls)        31%        99%
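The sensitivity and uniformity rows can be illustrated with a small sketch. The slides do not define either metric exactly, so the definitions here are assumptions: sensitivity is taken as the fraction of targets with at least one read, and uniformity as the largest fraction of all targets whose read counts fit inside a single N-fold window. The function names and the read counts are made up for illustration.

```python
from bisect import bisect_right

def sensitivity(counts):
    """Fraction of targets with at least one read (assumed definition)."""
    return sum(1 for c in counts if c > 0) / len(counts)

def uniformity(counts, fold=100):
    """Largest fraction of all targets whose read counts fit in one
    window [low, low * fold] (assumed definition).
    Targets with zero reads never fall inside a window."""
    captured = sorted(c for c in counts if c > 0)
    best = 0
    for i, low in enumerate(captured):
        # Index just past the last count that is <= low * fold.
        j = bisect_right(captured, low * fold)
        best = max(best, j - i)
    return best / len(counts)

# Hypothetical per-target read counts, not data from the talk.
counts = [0, 1, 3, 40, 55, 60, 90, 120, 900, 5000]
print(sensitivity(counts))            # 0.9
print(uniformity(counts, fold=100))   # 0.6
print(uniformity(counts, fold=10))    # 0.5
```

Other window definitions (for example, a window centered on the median) would give different numbers, so the figures in the table should not be assumed to follow this exact formula.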
Targeted Capturing of
• Genomes
  – Exome: PGP etc.
  – Contiguous regions or gene panels
  – SNPs
  – Hypermutable CpG dinucleotides
• Transcriptomes
  – Allelotyping
  – RNA editing sites
• Methylomes
  – CpG methylation
Predicting Putative Editing Sites
• A in the genome
• G in some mRNAs or ESTs
A -> I (G) RNA Editing
• Post-transcriptional A -> I
• I is read as G during translation
• Only 10 targets are known in human coding regions
36,000 predicted editing sites
gDNA + 7 tissue cDNAs from an individual
Padlock + Solexa: 239 sites found to be edited
Validation (PCR + Sanger): 18 of 20 random sites are obviously edited
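The comparison of gDNA and cDNA reads at a predicted site can be sketched as follows. This is a minimal illustration, not the pipeline used in the talk: the function names and the coverage and level thresholds are hypothetical. The idea is that a site looks edited when the genome reads are essentially all A while a fraction G/(A+G) of the cDNA reads show G.

```python
def editing_level(a_reads, g_reads):
    """G/(A+G) at one site; None when there is no coverage."""
    total = a_reads + g_reads
    return g_reads / total if total else None

def is_edited(gdna, cdna, min_cov=20, min_level=0.05, max_gdna_g=0.01):
    """Call a site edited when gDNA is (almost) pure A and cDNA shows G.
    All three thresholds are illustrative assumptions."""
    gdna_level = editing_level(gdna["A"], gdna["G"])
    cdna_level = editing_level(cdna["A"], cdna["G"])
    if gdna_level is None or cdna_level is None:
        return False
    if gdna["A"] + gdna["G"] < min_cov or cdna["A"] + cdna["G"] < min_cov:
        return False
    return gdna_level <= max_gdna_g and cdna_level >= min_level

# Hypothetical read counts at two sites.
edited_site = is_edited({"A": 200, "G": 0}, {"A": 150, "G": 50})
snp_site = is_edited({"A": 100, "G": 100}, {"A": 90, "G": 110})
print(editing_level(150, 50))   # 0.25
print(edited_site, snp_site)    # True False
```

Requiring a clean A signal in gDNA from the same individual is what separates an editing event from a genomic A/G polymorphism in this sketch.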
Discovery of 100’s of Novel Editing Sites
with Erez Levanon, in preparation
[Figure: example site in VEZF1, with tracks for genomic DNA and for RNA from intestine, kidney, diencephalon, frontal lobe, corpus callosum, cerebellum, and adrenal.]
Bisulfite Padlock Probes (BSP): CpG Methylation
Bisulfite-treated genome: “3-base” genome
High specificity of padlock
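A short sketch of why a bisulfite-treated genome is close to a "3-base" genome: bisulfite treatment converts unmethylated C to U, which is read as T after amplification, while methylated C is protected. The helper names below are hypothetical, and the methylation level is taken as C/(C+T) among reads covering a CpG cytosine, which is a common convention rather than something stated on the slide.

```python
def bisulfite_convert(seq, methylated_positions=()):
    """Convert every unmethylated C to T on the given strand.
    methylated_positions holds 0-based indices of protected cytosines."""
    protected = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq.upper())
    )

def methylation_level(c_reads, t_reads):
    """C/(C+T) among reads at one cytosine; None without coverage."""
    total = c_reads + t_reads
    return c_reads / total if total else None

seq = "ACGTCCGA"                         # toy sequence with CpGs at 1 and 5
print(bisulfite_convert(seq))            # ATGTTTGA (no methylation)
print(bisulfite_convert(seq, [1, 5]))    # ACGTTCGA (both CpGs methylated)
print(methylation_level(30, 10))         # 0.75
```

Because the converted sequence depends on methylation state, a probe designed against it has to tolerate or avoid positions that may remain C.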
Methylation Level Accurately Measured
[Figure: two scatter plots on 0 to 1 axes.
BSP-BSP correlation: methylation level, replicate 1 vs. methylation level, replicate 2.
BSP-Sanger correlation: methylation level measured by BSP sequencing vs. methylation level estimated by Sanger sequencing.
Correlation labels on the plots: r = 0.979 and r = 0.966.]
Methylation Pattern around Genes
Gene-Body Methylation
with Madeleine Price Ball, in preparation (poster)
Acknowledgements
George Church
Church Lab
Padlock technology: Kun Zhang, John Aach, Abraham Rosenbaum, Jay Shendure, Greg Porreca, Annika Ahlford
RNA editing: Erez Levanon, Jung-Ki Yoon
CpG methylation: Madeleine Price Ball
Agilent: Emily Leproust, Wilson Woo
Sequencing: Yuan Gao, Bin Xie, Bob Steen
Superior Quality of Padlock Oligos
[Figure: gel with 50 nt, 100 nt, and 150 nt size markers.]
[Figure: percentage of sites (%, axis 0 to 12) vs. number of reads (0 to 50), with four series: before amplification (data), after amplification (data), before amplification (Poisson), after amplification (Poisson).]
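The Poisson series in the plot can be reproduced in outline as below. This is a sketch under the assumption that the Poisson mean is set to the observed mean number of reads per site; the slide does not say how the curve was fitted. The read counts are invented for illustration.

```python
import math
from collections import Counter

def poisson_percent(mean, k):
    """Expected percentage of sites with exactly k reads under Poisson."""
    return 100.0 * math.exp(-mean) * mean ** k / math.factorial(k)

def observed_percent(reads_per_site):
    """Observed percentage of sites at each read count."""
    tally = Counter(reads_per_site)
    n = len(reads_per_site)
    return {k: 100.0 * v / n for k, v in sorted(tally.items())}

# Hypothetical reads per site, not data from the talk.
reads = [8, 9, 10, 10, 11, 12, 10, 9, 11, 10]
mean = sum(reads) / len(reads)
for k, pct in observed_percent(reads).items():
    # read count, observed %, Poisson-expected %
    print(k, round(pct, 1), round(poisson_percent(mean, k), 1))
```

An observed histogram that stays close to the Poisson curve before and after amplification suggests the oligo pool is near-uniform and that amplification adds little bias.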
PCR (2x)
Solexa sequencing
55k features of up to 200nt
[Figure axis label: Fraction of probes]
From Agilent Oligos to Padlock Probes
[Figure: amplification and selection. Labels: Agilent oligo, 136 bp, with 18 bp at each end; PCR; exonuclease; USER + DpnII; annealed with DpnII guide oligo; padlock probe.]
Heterozygous Genotypes Correctly Called
[Figure: "before" and "after" panels; legend: homozygous wild type, heterozygous variation, homozygous variation.]
Methods in Comparison
                                        Padlock                   Array-based hyb
Upfront probe cost (10-20% of exome)    $12,000 per 55k 100mers   $600 per 385k 70mers
Probes amplifiable?                     Yes                       No
Reaction phase                          Solution, 10-20 μl        Surface, 200 μl
Enzymatic hyb?                          Yes                       No
gDNA required                           ~0.5-1 μg                 20 μg (WGA)
Efficiency (->accuracy)                 1%                        N/A (<0.1%?)
Uniformity                              100-fold range            10-fold range
Specificity                             ~100% on target           30-80% on or near target
Differential Clamping at Ligation Junction
[Figure: average coverage (axis 0 to 300) by base (A, C, G, T) at the proximal and distal positions of the extension arm and the ligation arm. Bar labels as extracted, order not recoverable: 125, 160, 159, 162, 155, 146, 139, 142, 181, 166, 293, 166, 165, 153, 38, 156.]
% GC vs. Capturing Efficiency
[Figure: average coverage (axis 0 to 200) vs. % GC in 5% bins from (10, 15] to (85, 90]; series: gap + arms, gap, extension arm, ligation arm.]
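The binning behind a plot like this can be sketched as follows: compute % GC for each probe region, assign it to a 5% bin of the form (low, high], and average the coverage per bin. The function names, sequences, and coverage values are hypothetical; the same routine would be run separately on the gap, the extension arm, the ligation arm, or gap + arms.

```python
import math
from collections import defaultdict

def gc_percent(seq):
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def gc_bin(pct, width=5):
    """Bin (low, high] containing pct; 0% is placed in the first bin."""
    high = max(width, math.ceil(pct / width) * width)
    return (high - width, high)

def mean_coverage_by_bin(probes):
    """probes: iterable of (sequence, coverage) pairs."""
    groups = defaultdict(list)
    for seq, cov in probes:
        groups[gc_bin(gc_percent(seq))].append(cov)
    return {b: sum(v) / len(v) for b, v in sorted(groups.items())}

# Made-up sequences and coverage values for illustration only.
probes = [
    ("ACGTACGTAC", 120),   # 50% GC
    ("GGCCGGCCAT", 60),    # 80% GC
    ("ATATATGCAT", 150),   # 20% GC
    ("CAGTCAGTCA", 100),   # 50% GC
]
print(mean_coverage_by_bin(probes))
```

With these toy inputs the two 50% GC probes share the (45, 50] bin, so their coverage values are averaged.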
The Editing “Calls” Are Well Correlated
[Figure: log-log scatter (0.01 to 1 on both axes) of G/(A+G), frontal lobe replicate 1 vs. frontal lobe replicate 2; r = 0.964.]
Bisulfite Padlock Probes (BSP): CpG Methylation
Bisulfite-treated genome
• 10k CpG sites tiling the ENCODE regions
  – 1 CpG site every 3 kb on average
• High specificity
  – 79 of 80 Sanger reads match correct locations
[Figure: capture workflow with B, strep, and P labels. Steps as labeled: collected in a tube; PCR; λ exonuclease; shearing, end polishing; adapter ligation; hybridization in closed-tube solution; denaturing, PCR.]
Li et al., unpublished
Methods in Comparison
                                        Padlock                   Array-based hyb            Biotin-coupled hyb
Upfront probe cost (10-20% of exome)    $12,000 per 55k 100mers   $600 per 385k 70mers       $500 per 244k 60mers
Probes amplifiable?                     Yes                       No                         Yes
Reaction phase                          Solution, 10-20 μl        Surface, 200 μl            Solution, 10-20 μl
Enzymes in hyb?                         Yes                       No                         No
gDNA required                           ~0.5-1 μg                 20 μg (WGA)                ~0.5-1 μg
Efficiency (->accuracy)                 1%                        N/A (<0.1%?)               ~10%?
Uniformity                              100-fold range            10-fold range              10-fold range?
Specificity                             ~100% on target           30-80% on or near target   ~55% on or near target