Tiling Array and ChIP-chip


Page 1: Tiling Array and ChIP-chip

Tiling Array and ChIP-chip

Page 2: Tiling Array and ChIP-chip

Gene Regulation

[Figure: three panels. "Expression / No Expression", "Spatially", and "Temporally". Each panel shows factors A, B, C and genes X, Y, Z.]

Page 3: Tiling Array and ChIP-chip

Transcription Factors and Their Binding Sites

Transcription factors (TF): TF1, TF2

Transcription factor binding sites (TFBS): CCACCCAC, TAATAAAAT

TTATGTAACCTGCACTTACTACCACCCACAACATAATAAAATCTAAACCACTGAATGAAATACAAAATCTATGTATGA...

[Figure: TF1 and TF2 bound to their sites on the sequence above.]
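As a small illustration of locating the two binding sites in the example sequence, the sketch below scans for exact matches. The function name `find_sites` is hypothetical, and exact string matching is a simplifying assumption; real binding sites tolerate mismatches.

```python
def find_sites(sequence, motif):
    """Return 0-based start positions of every exact occurrence of motif."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Continue one base further so overlapping matches are also found.
        start = sequence.find(motif, start + 1)
    return positions

# Sequence and sites taken from the slide.
SEQ = "TTATGTAACCTGCACTTACTACCACCCACAACATAATAAAATCTAAACCACTGAATGAAATACAAAATCTATGTATGA"
SITES = ["CCACCCAC", "TAATAAAAT"]

hits = {motif: find_sites(SEQ, motif) for motif in SITES}
for motif, positions in hits.items():
    print(motif, positions)
```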

Page 4: Tiling Array and ChIP-chip

Transcription factor binding motif

Sequences bound by the TF, with the bound site in each:

GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA    TGGGTGGTC
TAACATGTGACTCCTATAACCTCTTTGGGTGGTACATGAA    TGGGTGGTA
CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCAGATAATCA    TGGGAGGTC
TTAGAGGCACAATTGCTTGGGTGGTGCACAAAAAAACAAG    TGGGTGGTG
AACAGCCTTGGATTAGCTGCTGGGGGGGTGAGTGGTCCAC    TGAGTGGTC
ATCAGAATGGGTGGTCCATATATCCCAAAGAAGAGGGTAG    TGGGTGGTC

Count matrix:

     1  2  3  4  5  6  7  8  9
A    0  0  1  0  1  0  0  0  1
C    0  0  0  0  0  0  0  0  4
G    0  6  5  6  0  6  6  0  1
T    6  0  0  0  5  0  0  6  0

Transcription Factor Binding Sites (TFBS), frequency matrix:

     1     2     3     4     5     6     7     8     9
A    0.00  0.00  0.17  0.00  0.17  0.00  0.00  0.00  0.17
C    0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.66
G    0.00  1.00  0.83  1.00  0.00  1.00  1.00  0.00  0.17
T    1.00  0.00  0.00  0.00  0.83  0.00  0.00  1.00  0.00

Motif
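The count and frequency matrices on this slide can be recomputed from the six aligned sites. The following is a minimal sketch; the function names `count_matrix` and `frequency_matrix` are illustrative and not from the slides.

```python
# The six aligned 9-mer sites from the slide.
SITES = [
    "TGGGTGGTC",
    "TGGGTGGTA",
    "TGGGAGGTC",
    "TGGGTGGTG",
    "TGAGTGGTC",
    "TGGGTGGTC",
]

def count_matrix(sites):
    """Count each base at each aligned position."""
    width = len(sites[0])
    counts = {base: [0] * width for base in "ACGT"}
    for site in sites:
        for pos, base in enumerate(site):
            counts[base][pos] += 1
    return counts

def frequency_matrix(counts):
    """Divide every count by the number of sites."""
    n_sites = sum(counts[base][0] for base in "ACGT")
    return {base: [c / n_sites for c in row] for base, row in counts.items()}

counts = count_matrix(SITES)
freqs = frequency_matrix(counts)
for base in "ACGT":
    print(base, counts[base], [round(f, 2) for f in freqs[base]])
```

Note that 4/6 rounds to 0.67 here, whereas the slide shows 0.66 so that column 9 sums to 1.00.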

Page 5: Tiling Array and ChIP-chip

Finding motifs from co-regulated genes

(Roth et al., 1998; Hughes et al., 2000; etc.)

Gene1    GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA
Gene2    CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCAGATAATCA
Gene3    TAACATGTGACTCCTATAACCTCTTTGGGTGGTACATGAA

[Figure: expression of Gene 1, Gene 2, Gene 3, ..., Gene N under Condition1 and Condition2.]

Page 6: Tiling Array and ChIP-chip

Motif discovery is difficult in mammalian genomes due to a low signal-to-noise ratio

[Figure: sequence searched per gene. Yeast: 100~1000 bp for each of Gene1, Gene2, Gene3. Human: 10k~1000k bp for each of Gene1, Gene2, Gene3.]

Page 7: Tiling Array and ChIP-chip

ChIP-chip

Page 8: Tiling Array and ChIP-chip

Genome Tiling Arrays

• Affymetrix genome tiling microarrays
  – Tile the genome non-repeat regions
  – Chr21/22 tiling (earlier version): 1 million probe pairs (PM & MM) at 35 bp resolution on 3 arrays
  – Whole genome: 42 million PM probes on 7 arrays

PM CGACATTGATTCAAGACTACATACA
MM CGACATTGATTCTAGACTACATACA

[Figure: probes tiled along the chromosome.]

By Xiaole Shirley Liu at Harvard
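The PM/MM pair on this slide differs only at the middle (13th) base of the 25mer, where the mismatch probe carries the complementary base. A minimal sketch of that relationship follows; `mismatch_probe` is a hypothetical helper name.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def mismatch_probe(pm):
    """Return the MM probe: the PM probe with its middle base complemented."""
    mid = len(pm) // 2  # index 12 for a 25mer, i.e. the 13th base
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]

PM = "CGACATTGATTCAAGACTACATACA"
MM = mismatch_probe(PM)
print(PM)
print(MM)
```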

Page 9: Tiling Array and ChIP-chip

Genome Tiling Arrays (human genome)

Platform    # Arrays  # Probes / Array  # Total Probes  Probe Length  Probe Resolution                       Price
Affymetrix  7         6M                42.0M           25mer         35 bp                                  $2,000
Nimblegen   38        390K              14.8M           50mer         110 bp                                 $30,000
Agilent     21        244K              5.1M            60mer         300 bp in genes; 500 bp in intergenic  $11,000

By Xiaole Shirley Liu at Harvard

Page 10: Tiling Array and ChIP-chip

ChIP-chip Array Hybridization

• Map high intensity probes back to the genome
• Locate TF binding location

[Figure: ChIP-DNA and noise signal over probes along the chromosome.]

By Xiaole Shirley Liu at Harvard

Page 11: Tiling Array and ChIP-chip

Identify ChIP-enriched Region

• Controls: sonicated genomic Input DNA
• Often 3 ChIP, 3 Ctrl replicates are needed

[Figure: ChIP and Ctrl signal tracks.]

By Xiaole Shirley Liu at Harvard

Page 12: Tiling Array and ChIP-chip

Other Applications

• Transcription factor binding (ChIP-chip)

• Chromatin modifications

• DNA methylation

• Transcriptome

• Nucleosome positioning

• Copy number variations

Page 13: Tiling Array and ChIP-chip

Back to ChIP-chip

Page 14: Tiling Array and ChIP-chip

Data Analysis

Preprocessing & Normalization

Peak Detection

Downstream Analyses

Page 15: Tiling Array and ChIP-chip

Raw data

[Figure: raw probe signal for ChIP and Control.]

Page 16: Tiling Array and ChIP-chip

Mann-Whitney U-test for ChIP-region Detection

• Affy TAS, Cawley et al. (Cell 2004):
  – Each probe: rank probes (either PM-MM or PM) within [-500bp, +500bp] window
  – Check whether sum of ChIP ranks is much smaller

By Xiaole Shirley Liu at Harvard
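The windowed rank test can be sketched as follows. This is not the TAS implementation. It is a minimal illustration that assumes one ChIP and one control value per probe, ranks the pooled values in descending order (so rank 1 is the brightest probe and an enriched ChIP sample gets a small rank sum), and uses a normal approximation without tie correction. All function names are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values in descending order (1 = largest), averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def window_rank_sum(positions, chip, ctrl, center, half_width=500):
    """Rank sum of ChIP probes within [center - half_width, center + half_width].

    Returns (rank_sum, z); a strongly negative z means ChIP ranks are
    smaller than expected by chance, i.e. the window looks enriched.
    """
    idx = [i for i, p in enumerate(positions) if abs(p - center) <= half_width]
    n = len(idx)
    pooled = [chip[i] for i in idx] + [ctrl[i] for i in idx]
    ranks = average_ranks(pooled)
    rank_sum = sum(ranks[:n])
    expected = n * (2 * n + 1) / 2
    sd = sqrt(n * n * (2 * n + 1) / 12)
    return rank_sum, (rank_sum - expected) / sd

positions = [0, 35, 70, 105]
w, z = window_rank_sum(positions, [10, 11, 12, 13], [1, 2, 3, 4], center=50)
print(w, round(z, 3))
```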

Page 17: Tiling Array and ChIP-chip

TileMap (Ji and Wong, Bioinformatics 2005)

STEP 1: Compute a test statistic for each probe to summarize probe level information

STEP 2: Combine probe level test statistics of neighboring probes to help infer binding regions

Page 18: Tiling Array and ChIP-chip

Probe level test statistic: empirical Bayes approach

Probe:                 $1, 2, 3, \ldots, I$
Sample variance (df):  $s_1^2, s_2^2, s_3^2, \ldots, s_I^2$

Mean: $\bar{s}^2$

Sum of squares:

$$S = \sum_i \left(s_i^2 - \bar{s}^2\right)^2$$

Shrinkage factor:

$$\hat{B} = \frac{2}{df+2}\cdot\frac{I-1}{I} + \frac{2}{df+2}\cdot\frac{(\bar{s}^2)^2\,(I-1)}{S}$$

Variance shrinkage estimator:

$$\hat{\sigma}_i^2 = (1-\hat{B})\,s_i^2 + \hat{B}\,\bar{s}^2$$

Variance estimates: $\hat{\sigma}_1^2, \hat{\sigma}_2^2, \hat{\sigma}_3^2, \ldots, \hat{\sigma}_I^2$

A modified t-statistic:

$$\tilde{t}_i = \frac{\bar{x}_{i1} - \bar{x}_{i2}}{\hat{\sigma}_i\sqrt{\dfrac{1}{K_1}+\dfrac{1}{K_2}}}$$

Probe level test statistics: $\tilde{t}_1, \tilde{t}_2, \tilde{t}_3, \ldots, \tilde{t}_I$
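The variance shrinkage and modified t-statistic on this slide can be sketched in a few lines. This is an illustration, not the TileMap code. The function names are hypothetical, and capping the shrinkage factor at 1 and returning 1 when all sample variances are equal are assumptions added to keep the example well behaved.

```python
from math import sqrt
from statistics import mean

def shrink_variances(sample_vars, df):
    """Shrink per-probe sample variances toward their mean.

    sample_vars: s_i^2 for probes 1..I; df: degrees of freedom of each s_i^2.
    Returns (shrunk variances, shrinkage factor B).
    """
    n_probes = len(sample_vars)
    s2_bar = mean(sample_vars)
    spread = sum((s2 - s2_bar) ** 2 for s2 in sample_vars)  # S on the slide
    if spread == 0:
        b_hat = 1.0  # assumption: identical variances, nothing to distinguish
    else:
        b_hat = (2 / (df + 2)) * ((n_probes - 1) / n_probes) \
            + (2 / (df + 2)) * (s2_bar ** 2) * (n_probes - 1) / spread
        b_hat = min(1.0, b_hat)  # assumption: cap B at 1
    shrunk = [(1 - b_hat) * s2 + b_hat * s2_bar for s2 in sample_vars]
    return shrunk, b_hat

def modified_t(group1, group2, shrunk_var):
    """Two-sample t-statistic using the shrunk variance for this probe."""
    k1, k2 = len(group1), len(group2)
    return (mean(group1) - mean(group2)) / sqrt(shrunk_var * (1 / k1 + 1 / k2))

shrunk, b = shrink_variances([0.1, 0.5, 1.0, 2.0], df=4)
print(round(b, 3), [round(v, 3) for v in shrunk])
```

Probes with unusually small sample variance are pulled up toward the mean, which keeps the t-statistic from exploding on probes that happen to have near-zero variance with few replicates.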

Page 19: Tiling Array and ChIP-chip

Combining neighboring probes

TileMap (MA)

1. Compute the probe level test statistic t for each probe;

2. Compute a moving average statistic to measure enrichment;

3. Estimate FDR.

TileMap (HMM)

1. Compute the probe level test statistic t for each probe;

2. Estimate the distribution of t under H0 and H1;

3. Model t by a Hidden Markov Model, and decode the HMM.
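Step 2 of the moving average (MA) variant can be sketched as below. The window is counted in probes on each side and is truncated at the ends of the array; both choices are assumptions for illustration, and `moving_average` is a hypothetical name. The FDR estimation and HMM decoding steps are not shown.

```python
def moving_average(t_stats, half_window):
    """Average each probe's statistic with up to half_window neighbors per side."""
    smoothed = []
    for i in range(len(t_stats)):
        lo = max(0, i - half_window)
        hi = min(len(t_stats), i + half_window + 1)
        smoothed.append(sum(t_stats[lo:hi]) / (hi - lo))
    return smoothed

# A single noisy probe is damped, while a run of high probes stays high.
t = [0.1, 4.0, 0.2, 0.0, 3.5, 3.8, 4.1, 0.3]
print([round(v, 2) for v in moving_average(t, half_window=1)])
```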

Page 20: Tiling Array and ChIP-chip

Shrinking variance increases statistical power

[Figure: three signal tracks. "Moving Average t-statistic, variance shrinking"; "Moving Average t-statistic, canonical"; "Mean(X1)-Mean(X2)".]

Page 21: Tiling Array and ChIP-chip

Peak 2 (180bp) transgenics

[Figure: "Neural tube expression" and "Transgenics" panels.]

Page 22: Tiling Array and ChIP-chip

Comparisons between TileMap and previous methods

cMyc ChIP-chip Data: 6 IP + 6 CT1 + 6 CT2

Gold Standard: Using GTRANS and Keles' method to analyze all 18 arrays

Test data: 4 arrays, 2 IP vs 2 CT1 (s2r2)

TileMap-HMM (Ji & Wong, 2005)

GTRANS or TAS (Kampa et al., 2004)

1. Set a window;

2. Perform a Wilcoxon signed rank test for each window.

Keles et al. (2004)

1. Compute a t-statistic t for each probe (no shrinking, two sample only);

2. Rank probes by a moving average.

Page 23: Tiling Array and ChIP-chip

Shrinking variance saves money

Using non-shrinking method (Keles' method) to analyze all probes

Using shrinking method to analyze half of the probes, i.e., reduce information by half

Page 24: Tiling Array and ChIP-chip

MAT (Johnson W.E. et al. PNAS, 2006)

• Model-based Analysis of Tiling arrays for ChIP-chip

• Goal:
  – Find ChIP-regions without replicates
  – Find ChIP-regions without controls
  – Find ChIP-regions without MM probes
  – Can analyze data array by array

By Xiaole Shirley Liu at Harvard

Page 25: Tiling Array and ChIP-chip

MAT

• Estimate probe behavior by checking other probes with similar sequence on the same array

• Probe sequence plays a big role in signal value

• Most of the probes in ChIP-chip measure non-specific hybridization

By Xiaole Shirley Liu at Harvard

Page 26: Tiling Array and ChIP-chip

Probe Behavior Model

[Figure: probe behavior model equation, with its terms annotated as follows.]

• Baseline on number of Ts
• A,C,G at each position of the 25mer
• A,C,G,T Count Square
• 25mer Copy Number along the Genome

By Xiaole Shirley Liu at Harvard
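The four annotated terms suggest one regression covariate group each. The sketch below builds such a feature vector for a single 25mer. It is an illustration under stated assumptions, not the MAT source: the function name `probe_features` is hypothetical, T is taken as the reference base for the positional indicators (since only A, C, G are listed), and the copy number is entered on a log scale.

```python
from math import log

def probe_features(probe, copy_number):
    """Covariates for one 25mer, grouped as on the slide.

    1 value  : number of Ts (baseline)
    75 values: indicators for A, C, G at each of the 25 positions
    4 values : squared counts of A, C, G, T
    1 value  : log of the 25mer's copy number in the genome (assumed log scale)
    """
    if len(probe) != 25:
        raise ValueError("expected a 25mer")
    features = [probe.count("T")]
    for base in probe:
        features.extend(1 if base == k else 0 for k in "ACG")
    features.extend(probe.count(k) ** 2 for k in "ACGT")
    features.append(log(copy_number))
    return features

f = probe_features("CGACATTGATTCAAGACTACATACA", copy_number=1)
print(len(f), f[0], f[1:4], f[-5:])
```

Fitting these covariates against log(PM) by least squares over all probes on an array would give a predicted intensity for each probe.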

Page 27: Tiling Array and ChIP-chip

Probe Standardization

• Fit the probe model array by array
• Divide array probes to bins (3k probes/bin)
• Background-subtraction and standardization (normalization) on a single array:

$$t_i = \frac{\log(PM_i) - \hat{m}_i}{s_{i,\ \text{affinity bin}}}$$

  – $PM_i$: observed probe intensity
  – $\hat{m}_i$: model predicted probe intensity
  – $s_{i,\ \text{affinity bin}}$: observed probe variance within each bin

By Xiaole Shirley Liu at Harvard
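A minimal sketch of the binning and standardization step follows. It assumes probes are sorted by model-predicted intensity, cut into consecutive bins, and scaled by the sample standard deviation of observed log(PM) within the bin; the name `standardize_probes` and the handling of single-probe or zero-spread bins are illustrative choices.

```python
from math import log
from statistics import stdev

def standardize_probes(pm, predicted, bin_size=3000):
    """Return t_i = (log(PM_i) - predicted_i) / s_bin for every probe."""
    # Group probes with similar predicted intensity (affinity) into bins.
    order = sorted(range(len(pm)), key=lambda i: predicted[i])
    t_values = [0.0] * len(pm)
    for start in range(0, len(order), bin_size):
        members = order[start:start + bin_size]
        observed = [log(pm[i]) for i in members]
        # Assumption: fall back to 1.0 when a bin has a single probe.
        spread = stdev(observed) if len(observed) > 1 else 1.0
        for i, obs in zip(members, observed):
            t_values[i] = (obs - predicted[i]) / spread if spread > 0 else 0.0
    return t_values

pm = [7.38905609893065, 54.598150033144236, 148.4131591025766, 8103.083927575384]
predicted = [2.5, 3.5, 6.0, 8.0]
t_values = standardize_probes(pm, predicted, bin_size=2)
print([round(t, 3) for t in t_values])
```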

Page 28: Tiling Array and ChIP-chip

Eliminate Normalization

• Probe log(PM) values before and after standardization

• If normalize before model fitting
  – Predicted same ChIP-regions, although less confident

By Xiaole Shirley Liu at Harvard

Page 29: Tiling Array and ChIP-chip

ChIP-region Detection

• Window-based MATscore
  – ChIP without Ctrl

$$\mathrm{MAT}(\text{region}) = \sqrt{n_{\text{ChIP}}}\;\mathrm{TM}(t\text{'s in region})$$

    TM: trimmed mean
  – Multiple ChIP with multiple Ctrl

$$\mathrm{MAT}(\text{region}) = \sqrt{n_{\text{ChIP}}}\;\frac{\mathrm{TM}(t\text{'s in ChIP}) - \mathrm{TM}(t\text{'s in Input})}{\sigma_{\text{Input}}}$$

  – More probes, higher t values in ChIP, less variance (fluctuation): more confident

By Xiaole Shirley Liu at Harvard
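The single-sample window score can be sketched as a trimmed mean scaled by the square root of the probe count, so that windows with more probes and consistently high t-values score higher. The 10% trimming fraction and the function names are assumptions for illustration; the control-adjusted form is not shown.

```python
from math import sqrt

def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    core = ordered[k:len(ordered) - k]
    return sum(core) / len(core)

def mat_score(t_in_window, trim=0.1):
    """Window score for ChIP without control: sqrt(n) * trimmed mean of t-values."""
    return sqrt(len(t_in_window)) * trimmed_mean(t_in_window, trim)

# One outlier probe barely moves the trimmed mean.
print(trimmed_mean([1, 2, 3, 4, 100], trim=0.2))
print(mat_score([2.0, 2.0, 2.0, 2.0]))
```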

Page 30: Tiling Array and ChIP-chip

Raw probe values at two spike-in regions with concentration 2X

[Figure: signal tracks over the two regions, grouped by processing stage.]

• Raw probe values: ChIP_1 Log(PM); Input_1 Log(PM)
• Sequence-based probe behavior standardization: ChIP_1 t-value; Input_1 t-value
• Window-based neighboring probe combination for ChIP-region detection: ChIP_1 MATscore; ChIP_1/Input_1 MATscore; 3 Reps ChIP/Input MATscore

By Xiaole Shirley Liu at Harvard

Page 31: Tiling Array and ChIP-chip

Statistical Significance of Hits

[Figure: MATscore distribution with "Background", "<1% enriched", and "Enriched DNA" labeled.]

• P-value and FDR cutoff:
  – P-value from MATscore distribution
  – Estimate negative peaks under the same P value cutoff
  – Regional FDR = #negative_peaks / #positive_peaks

By Xiaole Shirley Liu at Harvard
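The regional FDR ratio can be illustrated with a short sketch. It assumes the background score distribution is symmetric around zero, so that regions scoring at or below the negated cutoff stand in for false positives; `regional_fdr` is a hypothetical name, and returning `None` when nothing passes the cutoff is an arbitrary choice.

```python
def regional_fdr(region_scores, cutoff):
    """Estimate FDR as #negative_peaks / #positive_peaks at a score cutoff."""
    positive = sum(1 for s in region_scores if s >= cutoff)
    negative = sum(1 for s in region_scores if s <= -cutoff)
    if positive == 0:
        return None  # no peaks called, FDR undefined
    return negative / positive

scores = [5.0, 6.0, 7.0, 8.0, -5.5, 0.1, -0.4, 1.2]
print(regional_fdr(scores, cutoff=5.0))
```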

Page 32: Tiling Array and ChIP-chip

MAT summary

• Open source python: http://chip.dfci.harvard.edu/~wli/MAT/
• Runs faster than array scanner
• Can work with single ChIP, multiple ChIP, and multiple ChIP with controls with increasing accuracy
  – Use single ChIP on promoter arrays to test antibody and protocol before going whole genome
• Can identify individual failed samples

By Xiaole Shirley Liu at Harvard

Page 33: Tiling Array and ChIP-chip

Benchmark for ChIP-chip Target Detection (Johnson D.S. et al. Genome Research, 2008)

• ENCODE Spike-in experiment: both amplified and un-amplified
  – ChIP: 96 ENCODE clones, 2,4,8,...,256X enrichment + total chromatin DNA
  – Input: total genomic DNA

• Blind test: Samples hybridized to different tiling arrays, predictions made before the key was released

Page 34: Tiling Array and ChIP-chip

Comparison of platforms

Page 35: Tiling Array and ChIP-chip

Comparison of algorithms

Combined Johnson D.S. et al. Genome Research 2008 with Ji H. et al. Nature Biotechnology 2008

Page 36: Tiling Array and ChIP-chip

Residual Probe Effects after MAT

Page 37: Tiling Array and ChIP-chip

TileProbe (Judy & Ji, Bioinformatics, 2009)

Page 38: Tiling Array and ChIP-chip

TileProbe vs. MAT (GLI3)

1IP 0CT 3IP 0CT

Page 39: Tiling Array and ChIP-chip

TileProbe vs. MAT (Oct4)

1IP 0CT 3IP 0CT

Page 40: Tiling Array and ChIP-chip

TileProbe vs. MAT (NRSF)

1IP 0CT 2IP 0CT

Page 41: Tiling Array and ChIP-chip

Motif enrichment

Page 42: Tiling Array and ChIP-chip

MBR: Microarray Blob Remover

By Xiaole Shirley Liu at Harvard

Page 43: Tiling Array and ChIP-chip

xMAN: eXtreme MApping of oligoNucleotides

• http://chip.dfci.harvard.edu/~wli/xMAN
• xMAN maps ~42 M Affymetrix tiling probes to the newest human genome assembly in less than 6 CPU hours
  – BLAST needs 20 CPU years; BLAT needs 55 CPU days
  – Probe TCCCAGCACTTTGGGAGGCTGAGGC maps 50,660 times in the genome
• Can map long oligos, and paired tag high throughput sequencing fragments
• Stores the copy number information of every probe
• xMAN filters tiling array probes to ensure one unique probe measurement per 1 kb, which improves peak detection

By Xiaole Shirley Liu at Harvard

Page 44: Tiling Array and ChIP-chip

CisGenome (Ji H. et al. Nature Biotechnol., 2008)

Graphic User Interface

CisGenome Browser

Core Data Analysis Programs

Page 45: Tiling Array and ChIP-chip

CEAS: Cis-regulatory Element Annotation System

• Data Analysis Button for Biologists

http://ceas.cbi.pku.edu.cn

By Xiaole Shirley Liu at Harvard