
Introduction to epigenetics: chromatin modifications, DNA methylation and the CpG Island landscape (part 2)

Héctor Corrada Bravo, CMSC858P, Spring 2012

(many slides courtesy of Rafael Irizarry)

How do we measure DNA methylation?

Microarray Data

One question…

• Where do we measure?

• At least 7 arrays are needed to measure the entire genome

• CpGs are depleted

• The remaining CpGs cluster
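CpG depletion and clustering are usually quantified with the observed/expected CpG ratio. The sketch below computes it and applies island thresholds in the style of Gardiner-Garden and Frommer (length at least 200 bp, GC content at least 50%, observed/expected ratio at least 0.6). These cutoffs are an assumption and do not come from the slides, and `cpg_stats` and `looks_like_island` are hypothetical helper names.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    # "CG" cannot overlap itself, so a simple scan counts CpG dinucleotides.
    cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Under independence of neighbouring bases, the expected CpG count is c * g / n.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """ASSUMED Gardiner-Garden & Frommer style thresholds, not taken from the slides."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc >= min_gc and obs_exp >= min_obs_exp
```

A perfectly repeated "CG" sequence scores a ratio of 2.0, while bulk human sequence scores well below 1, which is the depletion noted above.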

CpG Islands

But variation is also seen outside CpG islands

McrBC: no methylation

McrBC cuts at AmCG or GmCG (mC = methylated cytosine)

Input

McrBC: methylation

McrBC after gel: methylation (now unmethylated)

McrBC after gel: no methylation
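The recognition rule on the slide (AmCG or GmCG) can be sketched as a scan for methylated half-sites. The spacing rule in `mcrbc_can_cut` (two half-sites 40 to 3000 bp apart) is the enzyme's commonly cited requirement rather than something stated on the slide, so treat it as an assumption; the cut location itself is not modeled, and both function names are hypothetical.

```python
def mcrbc_half_sites(seq, methylated):
    """0-based positions of methylated Cs in the slide's AmCG / GmCG context.

    `methylated` is a set of 0-based positions of methylated cytosines.
    """
    seq = seq.upper()
    return [
        i for i in range(1, len(seq) - 1)
        if i in methylated and seq[i] == "C" and seq[i + 1] == "G" and seq[i - 1] in "AG"
    ]


def mcrbc_can_cut(seq, methylated, min_gap=40, max_gap=3000):
    """ASSUMPTION: a cut needs two half-sites between min_gap and max_gap bp apart."""
    sites = mcrbc_half_sites(seq, methylated)
    return any(
        min_gap <= b - a <= max_gap
        for idx, a in enumerate(sites)
        for b in sites[idx + 1:]
    )
```

Unmethylated DNA yields no half-sites, so it stays intact; this is what lets the untreated input serve as the reference channel.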

Gene Expression Normalization does not work well here

We use control probes
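A minimal sketch of control-probe normalization, assuming that control probes lie in CpG-free regions that McrBC cannot cut, so their log-ratio should be zero and any shift is technical. Median-centering on those probes is a hypothetical simplification, not the published CHARM procedure, and `normalize_with_controls` is an illustrative name.

```python
import math
from statistics import median


def normalize_with_controls(untreated, treated, is_control):
    """Median-center two-channel log-ratios using control probes only.

    ASSUMPTION: control probes sit in CpG-free regions, so McrBC cannot cut
    them and their true log-ratio is 0; any offset is treated as technical.
    """
    m = [math.log2(u / t) for u, t in zip(untreated, treated)]
    control_values = [v for v, c in zip(m, is_control) if c]
    offset = median(control_values)
    return [v - offset for v in m]
```

Unlike gene-expression normalization, this does not assume most probes are unchanged between channels, only the control probes.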

There are also waves

Smoothing

McrBC on a two-channel tiling array

We smooth

Proportion of neighboring CpGs that are also methylated/not methylated

True signal (simulated)

Observed data

Observed data and true signal

What is methylated (above 50%)?

Naïve approach

Many false positives (FP)

Smooth

No FP, but one false negative (FN)

Smooth less? No FN, lots of FP

We prefer this!
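The naïve-versus-smoothed comparison can be reproduced in miniature. This sketch is an assumption throughout: a deterministic alternating noise pattern stands in for the simulated data, a simple moving average (`moving_average`, a hypothetical helper) stands in for the smoother used in practice, and "methylated" means a value above 0.5, as on the slide.

```python
def moving_average(values, radius):
    """Average each point with up to `radius` neighbours on either side."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def count_errors(truth, estimate, cutoff=0.5):
    """Return (false positives, false negatives) for calls of 'above cutoff'."""
    fp = sum(1 for t, e in zip(truth, estimate) if t <= cutoff < e)
    fn = sum(1 for t, e in zip(truth, estimate) if e <= cutoff < t)
    return fp, fn


# ASSUMPTION: toy truth (unmethylated / methylated / unmethylated) plus
# deterministic +/-0.35 noise instead of random simulation.
truth = [0.2] * 10 + [0.8] * 10 + [0.2] * 10
noise = [0.35 if i % 2 == 0 else -0.35 for i in range(30)]
observed = [t + n for t, n in zip(truth, noise)]

naive = count_errors(truth, observed)
smoothed = count_errors(truth, moving_average(observed, radius=2))
print("naive (FP, FN):", naive)
print("smoothed (FP, FN):", smoothed)
```

Here the naïve calls give 10 false positives and 5 false negatives, while the smoothed calls give one of each, both at the boundary where the methylated region ends; widening the window would blur that boundary further.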

CHARM

Differentially methylated regions (DMRs) for three tissues (five replicates)

Irizarry et al., Nature Genetics, 2009
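One way to turn smoothed methylation differences into DMR calls is to report runs of consecutive probes whose difference stays beyond a cutoff in the same direction, keeping hyper- and hypomethylated runs separate. This is a simplified sketch under that assumption, not the published CHARM procedure; `find_dmrs`, `cutoff`, and `min_len` are hypothetical.

```python
def find_dmrs(diff, cutoff, min_len=3):
    """Return (start, end, direction) for runs of |diff| > cutoff with one sign.

    ASSUMPTION: `diff` is an already smoothed difference between two groups,
    ordered by genomic position; `end` is inclusive and cutoff is positive.
    """
    regions = []
    start, sign = None, 0
    for i, d in enumerate(list(diff) + [0.0]):  # trailing 0.0 closes a final run
        s = (d > cutoff) - (d < -cutoff)  # +1 hyper, -1 hypo, 0 neither
        if s != sign:
            if sign != 0 and i - start >= min_len:
                regions.append((start, i - 1, "hyper" if sign > 0 else "hypo"))
            start, sign = (i, s) if s != 0 else (None, 0)
    return regions
```

Requiring a minimum run length is what suppresses isolated probes that cross the cutoff by chance.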

Some findings

[Irizarry et al., 2009, Nat. Genetics]

Tissues are easily distinguished

Cancer DMR

Many regions like this. Note: both hypo- and hypermethylation

Both hyper- and hypomethylated

Cancer and tissue DMRs coincide

DMRs are enriched in CpG island shores
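Irizarry et al. (2009) define CpG island shores as sequences up to 2 kb from an island. A minimal sketch of labeling a position as island, shore, or other, assuming islands are given as inclusive (start, end) coordinates on one chromosome; `classify_position` and the label "other" are hypothetical.

```python
def classify_position(pos, islands, shore_width=2000):
    """Label `pos` as 'island', 'shore' (within shore_width bp) or 'other'.

    ASSUMPTION: `islands` holds inclusive (start, end) pairs on the same chromosome.
    """
    best = None
    for start, end in islands:
        if start <= pos <= end:
            return "island"
        dist = start - pos if pos < start else pos - end
        best = dist if best is None else min(best, dist)
    if best is not None and best <= shore_width:
        return "shore"
    return "other"
```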

Shore methylation still affects expression: T-DMRs (tissue-specific DMRs)

Shore methylation still affects expression: C-DMRs (cancer-specific DMRs)

Using sequencing (BS-seq)

[Figure: the same double-stranded sequence (TTCGATTACGA / AAGCTAATGCT) drawn for many cells from liver and brain, with CH3 marks on the methylated CpGs. The methylation level of a CpG is the percentage of cells in which it is methylated; example: 85% methylation at chr3:44,031,616-44,031,626.]

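Bisulfite Treatment

Bisulfite converts unmethylated cytosines to uracil, which is read as T after PCR; methylated cytosines remain C.

Before: GGGGAGCAGCATGGAGGAGCCTTCGGCTGACT

After: GGGGAGCAGTATGGAGGAGTTTTCGGTTGATT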
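In silico bisulfite conversion makes the before/after sequences concrete. The sketch assumes, for illustration only, that every CpG cytosine is methylated and every other cytosine is unmethylated, and it models a single strand. Under that assumption the seventh base also becomes T, whereas the converted sequence above keeps it as C.

```python
def cpg_cytosines(seq):
    """0-based positions of the C in every CpG dinucleotide."""
    return {i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"}


def bisulfite_convert(seq, methylated):
    """Simulate bisulfite + PCR on one strand: unmethylated C reads as T."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


original = "GGGGAGCAGCATGGAGGAGCCTTCGGCTGACT"
# ASSUMPTION: all CpG cytosines methylated, all other cytosines unmethylated.
converted = bisulfite_convert(original, cpg_cytosines(original))
print(converted)  # GGGGAGTAGTATGGAGGAGTTTTCGGTTGATT
```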

BS-seq

[Figure: pileup of 13 bisulfite-converted reads aligned to the reference; at the CpG of interest every read keeps a C, i.e., counts as methylation evidence]

Coverage: 13, methylation evidence: 13, methylation percentage: 100%

BS-seq

[Figure: the same pileup; 4 of the 13 reads now show T at the CpG]

Coverage: 13, methylation evidence: 9, methylation percentage: 69%

BS-seq

[Figure: the same pileup; 9 of the 13 reads now show T at the CpG]

Coverage: 13, methylation evidence: 4, methylation percentage: 31%
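The coverage, evidence, and percentage on these slides come from counting, at the CpG cytosine, how many aligned reads show C (methylated) versus T (unmethylated). A minimal sketch, assuming forward-strand reads given as hypothetical (start, sequence) pairs and that bases other than C or T (such as N) do not count toward coverage:

```python
def methylation_call(reads, cpg_pos):
    """Return (coverage, methylation evidence, rounded percentage) at one CpG.

    ASSUMPTION: `reads` are forward-strand (start, sequence) pairs in reference
    coordinates; only C (methylated) and T (unmethylated) are counted.
    """
    methylated = unmethylated = 0
    for start, seq in reads:
        offset = cpg_pos - start
        if 0 <= offset < len(seq):
            base = seq[offset]
            if base == "C":
                methylated += 1
            elif base == "T":
                unmethylated += 1
            # Any other base (e.g., N) is ignored.
    coverage = methylated + unmethylated
    percent = round(100 * methylated / coverage) if coverage else None
    return coverage, methylated, percent
```

With 9 C and 4 T reads this gives coverage 13, evidence 9, and 69%, matching the second pileup.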

BS-seq

• Alignment is much trickier:
  – Naïve strategy: do nothing and hope there are not many CpGs in a single read
  – Smarter strategy: “bisulfite convert” the reference by turning all Cs into Ts
    • This also needs to be done on the reverse-complement reference and on the reads
  – Smartest strategy: be unbiased and try all combinations of methylated/unmethylated CpGs in each read
    • Computationally expensive (see Hansen et al., 2011, for a strategy)
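The "smarter strategy" can be sketched as exact matching in a reduced three-letter alphabet. This is an illustrative assumption, not a production aligner: top-strand reads are compared with the reference after turning every C into T, and bottom-strand reads are reverse-complemented and compared after turning every G into A, which is how the bottom strand's C-to-T conversion appears in top-strand coordinates. Exact matching only; `align_bisulfite` and its helpers are hypothetical names.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(text, pattern):
    """All start positions of `pattern` in `text` (overlaps allowed)."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def align_bisulfite(read, reference):
    """Return (strand, position) exact matches in forward-reference coordinates.

    ASSUMPTION: no mismatches or gaps; methylation is called afterwards by
    comparing the read with the unconverted reference.
    """
    hits = []
    # Top strand: unmethylated C became T, so compare in C->T space.
    ref_ct = reference.replace("C", "T")
    for pos in find_all(ref_ct, read.replace("C", "T")):
        hits.append(("+", pos))
    # Bottom strand: its C->T conversion shows up as G->A on the top strand.
    ref_ga = reference.replace("G", "A")
    for pos in find_all(ref_ga, revcomp(read).replace("G", "A")):
        hits.append(("-", pos))
    return hits
```

For example, a converted read copied from the bottom strand of the test reference is reported as ("-", 7), the leftmost top-strand coordinate of its match.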

BS-seq

• There are similarities to SNP calling (we’ll see this in a couple of weeks)
• EXCEPT: we want to measure percentages
  – Use a binomial model to estimate p, the percentage of methylation
  – Allow for sequencing errors, coverage differences, etc.
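Under the binomial model, the maximum-likelihood estimate of p at one CpG is simply k/n (reads showing C over reads showing C or T). The error handling below is an assumption, not the method from the slides: it treats sequencing errors as symmetric C/T flips at a known rate e, so the chance that a read shows C is p(1 - e) + (1 - p)e, and it adds a Wilson score interval to reflect coverage. `estimate_methylation` is a hypothetical name.

```python
import math


def _correct(q, e):
    """Invert q = p(1 - e) + (1 - p)e for p, clipped to [0, 1]."""
    return min(1.0, max(0.0, (q - e) / (1 - 2 * e)))


def estimate_methylation(k, n, error_rate=0.0, z=1.96):
    """Return (p_hat, (lower, upper)) for k methylated reads out of n.

    ASSUMPTION: symmetric C<->T read errors at `error_rate`; the Wilson
    interval is computed for q = P(read shows C) and mapped through the
    same correction.
    """
    if n == 0:
        raise ValueError("no coverage at this CpG")
    if not 0 <= error_rate < 0.5:
        raise ValueError("error_rate must be in [0, 0.5)")
    q = k / n  # binomial MLE of P(read shows C)
    denom = 1 + z * z / n
    center = (q + z * z / (2 * n)) / denom
    half = z / denom * math.sqrt(q * (1 - q) / n + z * z / (4 * n * n))
    e = error_rate
    return _correct(q, e), (_correct(center - half, e), _correct(center + half, e))
```

With 9 of 13 reads showing C, p̂ = 9/13 ≈ 0.69 and the interval is wide, which is why coverage differences matter when comparing CpGs.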

Measuring DNA Methylation

• Estimating percentages
  • Use a “local-likelihood” method
    – Based on loess

(Plot courtesy of Kasper Hansen)
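The local-likelihood idea can be sketched by weighting each CpG's binomial log-likelihood by its distance to the target position. This is a simplified, local-constant stand-in assumed for illustration, not the published smoother: with tricube weights w_i, maximizing Σ w_i [k_i log p + (n_i - k_i) log(1 - p)] gives p̂ = Σ w_i k_i / Σ w_i n_i. `local_likelihood_smooth` and the bandwidth value are hypothetical.

```python
def local_likelihood_smooth(positions, meth, cov, bandwidth):
    """Locally weighted binomial estimate of methylation at each CpG.

    ASSUMPTION: local-constant fit with tricube weights
    w_i = (1 - (|x_i - x0| / h)^3)^3 for |x_i - x0| < h; the weighted
    binomial likelihood is then maximized by sum(w * k) / sum(w * n).
    """
    out = []
    for x0 in positions:
        num = den = 0.0
        for x, k, n in zip(positions, meth, cov):
            u = abs(x - x0) / bandwidth
            if u < 1:
                w = (1 - u ** 3) ** 3
                num += w * k
                den += w * n
        out.append(num / den if den > 0 else float("nan"))
    return out
```

Because coverage enters through Σ w_i n_i, well-covered CpGs pull the estimate harder than poorly covered ones, which is the point of smoothing on the likelihood rather than on raw percentages.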

BS-seq

Lister et al., Nature, 2009

Gene Expression Regulation: DNA methylation in promoter regions

Lister et al., Nature, 2009

DNA methylation patterns within genomic regions

Lister et al., Nature, 2009

Putting it together

What were we after?

• The epigenetic progenitor origin of human cancer [Feinberg et al., Nature Reviews Genetics, 2006]
• Stochastic epigenetic variation as a driving force of disease [Feinberg & Irizarry, PNAS, 2009]
• Phenotypic variation, perhaps epigenetically mediated, increases disease susceptibility
• Increased epigenetic and gene expression variability of specific genes/regions is a defining characteristic of cancer


What did we do?
• Custom Illumina methylation microarray
  – Confirmed increased epigenetic variability in specific regions across five cancer types
  – Confirmed the same sites are involved in tissue differentiation
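The slides do not say which statistic was used to call increased variability. As a hypothetical illustration only, the sketch below scores each region by the log2 ratio of its across-sample standard deviation in cancer versus normal samples and flags regions above a cutoff; the region names, values, and cutoff are made up.

```python
import math
from statistics import stdev


def variability_log_ratio(cancer, normal):
    """log2 of (SD across cancer samples / SD across normal samples)."""
    return math.log2(stdev(cancer) / stdev(normal))


def hypervariable_regions(regions, min_log_ratio=1.0):
    """Names of regions whose cancer SD is at least 2**min_log_ratio times the normal SD.

    ASSUMPTION: `regions` maps a hypothetical region name to
    (cancer_values, normal_values), each with at least two samples.
    """
    return [
        name for name, (cancer, normal) in regions.items()
        if variability_log_ratio(cancer, normal) >= min_log_ratio
    ]
```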


What did we do?
• Custom Illumina methylation microarray
• Whole-genome sequencing of bisulfite-treated DNA
  – Found large blocks of hypomethylation (sometimes megabases long) in colon cancer
  – These regions coincide with hypervariable regions across cancer types

What did we do?
• Custom Illumina methylation microarray
• Whole-genome sequencing of bisulfite-treated DNA
• Gene expression analysis

Gene Expression Data


When combining multiple microarray experiments, proper normalization is key [McCall et al., Biostatistics, 2010]

Normalization is key

• fRMA: a single-chip normalization procedure
• GNUSE: a single-chip quality metric
• Barcode: a single-chip common-scale measurement
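The "single-chip" idea behind fRMA is that parameters estimated once from a large reference database are frozen, so a new array can be preprocessed on its own. A minimal sketch of one such step, quantile normalization to a frozen reference distribution, is below. It is a simplification that ignores background correction, probe-level summarization, and ties, and `quantile_normalize_to_reference` is a hypothetical name, not the fRMA package's API.

```python
def quantile_normalize_to_reference(values, reference):
    """Replace each value by the frozen reference value of the same rank.

    ASSUMPTION: `reference` is a precomputed (frozen) distribution with one
    entry per probe; ties in `values` are broken by their original order.
    """
    if len(values) != len(reference):
        raise ValueError("values and reference must have the same length")
    ref_sorted = sorted(reference)
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for rank, i in enumerate(order):
        out[i] = ref_sorted[rank]
    return out
```

Because the reference never changes, an array normalized today is on the same scale as one normalized later, which is what makes pooling thousands of public arrays feasible.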

What did we do?
• Custom Illumina methylation microarray
• Whole-genome sequencing of bisulfite-treated DNA
• Gene expression analysis
  – Genes with hypervariable gene expression in colon cancer are enriched in hypomethylation blocks

[Corrada Bravo, et al., under review]
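An enrichment claim like this can be checked with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). Which test the study used is not stated here, so treat this as an assumption; the counts in the example are made up, and `hypergeom_enrichment_pvalue` is a hypothetical name.

```python
from math import comb


def hypergeom_enrichment_pvalue(total, in_blocks, hypervariable, overlap):
    """P(X >= overlap) when `hypervariable` genes are drawn from `total`,
    of which `in_blocks` lie in hypomethylation blocks (one-sided test).

    ASSUMPTION: genes are treated as exchangeable draws without replacement.
    """
    upper = min(in_blocks, hypervariable)
    denom = comb(total, hypervariable)
    return sum(
        comb(in_blocks, x) * comb(total - in_blocks, hypervariable - x)
        for x in range(overlap, upper + 1)
    ) / denom
```

Python's integer `comb` keeps the counts exact; only the final division is done in floating point.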

What are we doing next?
• Custom Illumina methylation microarray
• Whole-genome sequencing of bisulfite-treated DNA
• Gene expression analysis
  – Genes with hypervariable gene expression in colon cancer are enriched in hypomethylation blocks

Bigger gene expression study

• 7,741 HGU133plus2 samples
• 598 normal tissue samples, 4,886 tumor samples
• 176 different tissue types
• 175 different GEO studies

Bigger gene expression study

[Corrada Bravo, et al., under review]

What are we doing next?
• Custom Illumina methylation microarray
• Whole-genome sequencing of bisulfite-treated DNA
• Gene expression analysis
  – Genes with hypervariable gene expression in colon cancer are enriched in hypomethylation blocks
  – Tissue-specific genes have hypervariable gene expression across cancer types

[Corrada Bravo, et al., under review]

