Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Detection and inference of differentially methylated regions from bisulfite
sequencing
Keegan Korthauer
@keegankorthauer
Bioconductor Meeting 2017
Differential methylation
Differential methylation commonly studied in – Cancer – Developmental stages / Aging – Tissue types
Control
Disease
Unmethylated CpG
Methylated CpG
* * * *
Differentially Methylated Region (DMR)
Whole Genome Bisulfite Sequencing
Treat DNA with bisulfite before sequencing – Methylated C -> unaffected – Unmethylated C -> appear as T
GAGCGATGGATAGCG CGAGTGATGGATAGC GTTACGAGCGACGG TGGTTACGAGCGATG
...TCTCGGTTACGAGCGACGGATAGCG...
Bisulfite sequencing reads Reference genome
Whole Genome Bisulfite Sequencing
Treat DNA with bisulfite before sequencing – Methylated C -> unaffected – Unmethylated C -> appear as T
GAGCGATGGATAGCG CGAGTGATGGATAGC GTTACGAGCGACGG TGGTTACGAGCGATG
...TCTCGGTTACGAGCGACGGATAGCG...
Bisulfite sequencing reads Reference genome
3/3 1
3/4 0.75
1/4 0.25
0/1 0
Methylation Counts and Proportions
Methods for DMR detection
Challenges – Have to handle small sample sizes ! – Accommodate known sources of variability – Detect region boundaries
Two main strategies – Detect significant CpGs, then group together neighbors to
form regions (DSS, BSmooth) – Targeted regions / sliding windows specified in advance
(BiSeq, MOABS)
Genomicloca+on
Methyla+o
nDiffe
rence
0
Danger of grouping significant CpGs
Genomicloca+on
Methyla+o
nDiffe
rence
0
TruePosi+veFalsePosi+veTrueNega+ve
Danger of grouping significant CpGs False Discovery Rate = # False Discoveries (FDR) Total # Discoveries = 2/8 = 0.25
Genomicloca+on
Methyla+o
nDiffe
rence
0
Danger of grouping significant CpGs
TruePosi+veFalsePosi+veTrueNega+ve
False Discovery Rate = # False Discoveries (FDR) Total # Discoveries = 1/2 = 0.5
Met
hyla
tion
0.2
0.5
0.8
●
●
Left VentricleSmall Intestine
chr5: 75,654,594 − 75,655,232 (width = 639), FDR: 0.0054, Stat: 23.4
Island Shore Shelf Open SeaCpGs
Exons
DMRs come in all shapes and sizes M
ethy
latio
n
0.2
0.5
0.8
●
●
Left VentricleSmall Intestine
chr16: 54,288,213 − 54,292,817 (width = 4,605), FDR: 0.0054, Stat: −23.678
Island Shore Shelf Open SeaCpGs
IRX3Exons
-> Detect region boundaries from the data
Our approach: dmrseq 1. Detect regions by scanning the genome for candidate
regions
2. Region-level summary scores compared to a null to evaluate significance
Diffe
rence
0Diffe
rence
0
? ?
Our approach: dmrseq 1. Detect regions by scanning the genome for candidate
regions
2. Region-level summary scores compared to a null to evaluate significance
Diffe
rence
0Diffe
rence
0
? ?
Our approach: dmrseq 1. Detect regions by scanning the genome for candidate
regions
2. Region-level summary scores compared to a null to evaluate significance
Diffe
rence
0Diffe
rence
0
? ?
Region level Summary Scores
– Biological variability
among samples
– Correlation among nearby CpGs
– Higher variability in CpGs with lower coverage
Generalized Least Squares (GLS) regression
– Nested autoregressive correlated error structure
– Coverage-weighted variance
arcsin(2π ijr −1) = αlr1[i=1]l∑ +βrX j +εijr
Var(εijr ) =
Summary statistic that is approximately exchangeable across the genome so we can
generate a pooled null
dmrseq accurately controls FDR
0.0
0.1
0.2
0.3
0.4
0.0 0.1 0.2 0.3 0.4Specified FDR level
Obse
rved
FDR
leve
l
(A) dmrseq
0.0
0.1
0.2
0.3
0.4
0.0 0.1 0.2 0.3 0.4Specified FDR level
(B) metilene
SimulationD2
D3
dmrseq has high sensitivity
0.00
0.25
0.50
0.75
1.00
0.00 0.25 0.50 0.75 1.00FDR
Powe
r
(A) Simulation D2
0.00
0.25
0.50
0.75
1.00
0.00 0.25 0.50 0.75 1.00FDR
(B) Simulation D3
MethodBSmooth
dmrseq
DSS
metilene
DMRs enriched for associations with differential expression
(A) Roadmap Tissue Comparisons
1
2
3
4
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
log2
odd
s
Right Ventricle vs Small Intestine
1
2
3
4
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
Right Ventricle vs Sigmoid Colon
1
2
3
4
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
Left Ventricle vs Small Intestine
1
2
3
4
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
log2
odd
s
Left Ventricle vs Sigmoid Colon
1
2
3
4
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
Sigmoid Colon vs Small Intestine
(B) Murine Leukemia Models
0
1
2
3
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
log2
odd
s
AML vs Control
0
1
2
3
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
ALL vs Control
0
1
2
3
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
AML vs ALL
Method BSmooth dmrseq DSS metilene
(A) Roadmap Tissue Comparisons
1
2
3
4
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
log2
odd
s
Right Ventricle vs Small Intestine
1
2
3
4
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
Right Ventricle vs Sigmoid Colon
1
2
3
4
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
Left Ventricle vs Small Intestine
1
2
3
4
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
log2
odd
s
Left Ventricle vs Sigmoid Colon
1
2
3
4
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
Sigmoid Colon vs Small Intestine
(B) Murine Leukemia Models
0
1
2
3
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
log2
odd
s
AML vs Control
0
1
2
3
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
ALL vs Control
0
1
2
3
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
AML vs ALL
Method BSmooth dmrseq DSS metilene
(A) Roadmap Tissue Comparisons
1
2
3
4
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
log2
odd
s
Right Ventricle vs Small Intestine
1
2
3
4
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
Right Ventricle vs Sigmoid Colon
1
2
3
4
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
Left Ventricle vs Small Intestine
1
2
3
4
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
log2
odd
s
Left Ventricle vs Sigmoid Colon
1
2
3
4
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
Sigmoid Colon vs Small Intestine
(B) Murine Leukemia Models
0
1
2
3
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
log2
odd
s
AML vs Control
0
1
2
3
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
ALL vs Control
0
1
2
3
0.01 0.10 0.25 0.50 0.75 1.00FDR threshold (square root scaled)
AML vs ALL
Method BSmooth dmrseq DSS metilene
dmrseq Summary
- dmrseq identifies and prioritizes DMRs from bisulfite sequencing experiments
- Computes region summary statistics that account for known sources of variability across the genome
- Achieves accurate False Discovery Rate control by generating a null distribution that pools information across the genome
Rafa Lab Rafael Irizarry Chinmay Shukla
Collaborators Sutirtha Chakraborty Yuval Benjamini
Learn More Slides goo.gl/MwQz5f dmrseq available on GitHub https://github.com/kdkorthauer/dmrseq
Acknowledgements
Contact [email protected] @keegsdur