Applications of scan statistics in molecular biology and neuroscience

by Chan Hock Peng, Dept of Statistics and Applied Probability


Page 1: Applications of scan statistics in molecular biology and neuroscience

Applications of scan statistics in molecular biology and neuroscience

by Chan Hock Peng

Dept of Statistics and Applied Probability

Page 2: Applications of scan statistics in molecular biology and neuroscience

Outline
• 1. General introduction
• 2. Applications in molecular biology (weighted scan statistics)
• 3. Tail probability computations
• 4. Applications in neuroscience (template matching problem)
• 5. Tail probability computations
• 6. Extensions and other applications

Page 3: Applications of scan statistics in molecular biology and neuroscience

Notation
• $M_u$: The maximum score in any window of length u.
• $\lambda$: The underlying rate of events occurring under normal circumstances.
• n: The length of the interval under consideration.
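A minimal sketch of how the maximum score in a window of length u could be computed when every event scores 1, so the score is a plain count. The function name `scan_max_count`, the half-open window convention [t, t + u), and the event times are assumptions made for illustration; they are not taken from the slides.

```python
from bisect import bisect_left

def scan_max_count(event_times, u):
    """Largest number of events in any window [t, t + u) (assumed half-open)."""
    times = sorted(event_times)
    best = 0
    # With unit scores the maximum is attained by a window that starts at an
    # event, so it is enough to try each event time as the left end.
    for i, start in enumerate(times):
        # Index of the first event at or beyond start + u.
        j = bisect_left(times, start + u, lo=i)
        best = max(best, j - i)
    return best

# Made-up event days, for illustration only.
events = [3, 10, 12, 31, 40, 41, 44, 90]
print(scan_max_count(events, u=30))
```

Sorting dominates the cost, so the sketch runs in O(m log m) for m events.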

Page 4: Applications of scan statistics in molecular biology and neuroscience

Example 1
• (USA Today, 1996) On Feb 22, the US Navy suspended all operations of the F-14 jet after the third crash in one month.
• Three crashes in a month was seven times the expected rate based on a 5-year period.
• $M_{30} = 3$, n = 5*365, $\lambda = 1/70$.
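The numbers in this example can be checked with a short simulation. The sketch below assumes crashes follow a homogeneous Poisson process with rate 1/70 per day over 5*365 days, and estimates by Monte Carlo how often some 30-day window contains at least 3 events. The function names, the half-open window convention, and the number of replications are illustrative assumptions, not part of the talk.

```python
import random

def max_count_in_window(times, u):
    """Largest number of events in any half-open window of length u."""
    times = sorted(times)
    best, left = 0, 0
    for right, t in enumerate(times):
        # Drop events that are u or more time units before the current one.
        while t - times[left] >= u:
            left += 1
        best = max(best, right - left + 1)
    return best

def simulate_p_value(k, u, n, rate, reps=20000, seed=0):
    """Monte Carlo estimate of P{M_u >= k} for a Poisson process on [0, n)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Build one realisation from exponential inter-arrival times.
        times, t = [], rng.expovariate(rate)
        while t < n:
            times.append(t)
            t += rng.expovariate(rate)
        if max_count_in_window(times, u) >= k:
            hits += 1
    return hits / reps

# Expected crashes in a 30-day window, and the observed-to-expected ratio
# (about 7, matching "seven times the expected rate").
expected_per_window = 30 * (1 / 70)
print(3 / expected_per_window)

# Estimated chance of seeing such a cluster somewhere in the 5 years.
print(simulate_p_value(k=3, u=30, n=5 * 365, rate=1 / 70))
```

The ratio compares one fixed window with its expectation, whereas the simulated probability accounts for scanning all windows in the 5-year period, which is the quantity a scan statistic p-value measures.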

Page 5: Applications of scan statistics in molecular biology and neuroscience

Example 2
• (Home News, 1995) In a 10-month period, 11 residents died at a Tennessee State Institution. The number was twice what was expected.
• Judge was angry and ordered the mental health commissioner to spend one in four weekends at the institution.
• $M_{10} = 11$, n = ?, $\lambda = 11/20$.

Page 6: Applications of scan statistics in molecular biology and neuroscience

Clusters of DAM sites in E. coli DNA
• Karlin and Brendel (1992).
• DAM site: occurrence of the pattern GATC.
• Important in repair and replication of DNA.
• $M_{245} = 8$, n = 4.7 million, $\lambda = 1.1/250$.
• P-value approximation of Naus (1982): $P\{M_{245} \ge 8\} \approx 0.87$, $P\{M_{245} \ge 10\} \approx 0.03$.

Page 7: Applications of scan statistics in molecular biology and neuroscience

Palindromes in DNA
• A-T and C-G are complementary bases.
• Complement of CCACGTGG is GGTGCACC.
• CCACGTGG is a palindromic pattern because its complement reads the same as itself backwards.
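The definition above translates directly into a short check. This is a sketch only; the names `complement` and `is_palindromic` are illustrative, and it assumes sequences use only the upper-case letters A, C, G, T.

```python
# Base pairing: A-T and C-G are complementary.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(seq):
    """Base-by-base complement of a DNA string (upper-case A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in seq)

def is_palindromic(seq):
    """True if the complement equals the sequence read backwards."""
    return complement(seq) == seq[::-1]

print(complement("CCACGTGG"))      # GGTGCACC
print(is_palindromic("CCACGTGG"))  # True
print(is_palindromic("CCACGTGA"))  # False
```

Under the same definition the DAM-site pattern GATC is also palindromic, since its complement CTAG is GATC read backwards.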

Page 8: Applications of scan statistics in molecular biology and neuroscience

Palindromic sequences in viruses
• Masse et al. (1992) & Leung et al. (1994).
• Palindromic sequences cluster around the origin of replication.
• Event occurs if there is a palindromic pattern of length at least 10 base pairs.
• HCMV sequence. $M_{1000} = 10$, n = 229354, $\lambda = 0.001$. p-value = 0.00195.

Page 9: Applications of scan statistics in molecular biology and neuroscience

Extensions to general scoring functions (weighted scan)
• In Chew, Choi and Leung (2005), longer palindromic patterns are given larger weights.
• For example, a pattern of length k can be given a score of k/10.
• p-value computations?
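A minimal sketch of a weighted scan, using the k/10 scoring rule mentioned above. The function name `weighted_scan`, the window convention (pos - u, pos], and the palindrome positions and lengths are assumptions for illustration only. The sliding-window shortcut relies on all scores being non-negative, which holds for k/10.

```python
def weighted_scan(positions, scores, u):
    """Maximum total score of events inside any window of length u.

    Assumes non-negative scores, so it suffices to consider windows that
    end at an event and reach back less than u.
    """
    events = sorted(zip(positions, scores))
    best = total = 0.0
    left = 0
    for pos, score in events:
        total += score
        # Remove events that lie u or more before the current position.
        while pos - events[left][0] >= u:
            total -= events[left][1]
            left += 1
        best = max(best, total)
    return best

# Hypothetical palindrome start positions and lengths (made-up data).
positions = [100, 450, 900, 1300, 5000]
lengths = [10, 14, 12, 20, 11]
scores = [k / 10 for k in lengths]  # a pattern of length k scores k/10
print(weighted_scan(positions, scores, u=1000))
```

With unit scores in place of k/10 the same function reduces to the unweighted count in a window.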

Page 10: Applications of scan statistics in molecular biology and neuroscience

Other applications of weighted scan
• Rajewsky et al. (2002) & Lifanov et al. (2003).
• Scanning for clusters of transcription factor binding sites.
• Position weight matrices to score words for similarity to a given motif.
• Siepel et al. (2005). Searching for segments of high evolutionary conservation.

Page 11: Applications of scan statistics in molecular biology and neuroscience

P-value computations for weighted scan
• Chan and Zhang (2006).

$$P\{M_u \ge k\} \approx 1 - \exp\left[ -(n-u)\, \frac{\nu(k/u)\, e^{-u I(k/u)}}{\sqrt{2\pi u K''}} \right]$$

where
• I is a large deviation rate function.
• $\nu$ is an overshoot function.
• K is the moment generating function of the scores.

Page 12: Applications of scan statistics in molecular biology and neuroscience
Page 13: Applications of scan statistics in molecular biology and neuroscience
Page 14: Applications of scan statistics in molecular biology and neuroscience

Template matching in neuroscience
• Neurons are the basic units of information processing in the brain.
• They generate small and highly peaked electric potentials known as spikes.
• Pattern of spikes is modeled as a point or counting process, e.g. a Poisson process.

Page 15: Applications of scan statistics in molecular biology and neuroscience

Template pattern
• Dave and Margoliash (2000) and Mooney (2000): the spike patterns $w = (w^{(1)}, \ldots, w^{(d)})$ of a zebra finch when it is listening to a bird song.
• Each $w^{(i)}$ contains the times at which spikes were generated by the ith neuron in an interval of time [0,T).

Page 16: Applications of scan statistics in molecular biology and neuroscience

Longer spike train patterns
• Let $y = (y^{(1)}, \ldots, y^{(d)})$ be the corresponding spike train patterns when the finch is sleeping, observed over a longer period of time [0,a).
• If w matches well with a segment of y, then there is evidence of bird song replay and hence song learning during sleep.

Page 17: Applications of scan statistics in molecular biology and neuroscience

Scoring function
• Consider kernel function f, e.g. let f(x) = 1 if x < 0.025 ms, f(x) = -0.3 if x > 0.025 ms.
• For the illustration below, consider d=1 and T=0.2ms.
• Let w = {.01, .05, .09, .12}.
• Let y = {.32, .75, 1.03, 1.15, 1.25}.

Page 18: Applications of scan statistics in molecular biology and neuroscience

• To check if there is a match between w and the segment of y starting at time t=1, compare w = {.01,.05,.09,.12} against y-1 = {.03,.15}.
• The point .03 provides a score of 1 because there is a point in w less than 0.025ms away.
• The point .15 provides a score of -0.3 because the nearest point in w is more than 0.025ms away.
• Overall score at time t=1 is 1-0.3=0.7.
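The worked example above can be reproduced with a few lines of code. This is a sketch under stated assumptions: the names `kernel` and `match_score` are illustrative, the segment of y is taken as the half-open interval [t, t + T), and a distance of exactly 0.025 (which the slide leaves undefined) is scored as -0.3.

```python
def kernel(x, tol=0.025, hit=1.0, miss=-0.3):
    """Score for a point of y at distance x from the nearest point of w.

    The slide does not say what happens at x == tol; this sketch assumes
    it counts as a miss.
    """
    return hit if x < tol else miss

def match_score(w, y, t, T):
    """Score of template w against the segment of y starting at time t."""
    # Keep the points of y in [t, t + T) and shift them back to [0, T).
    shifted = [s - t for s in y if t <= s < t + T]
    # Each shifted point is scored by its distance to the nearest point of w.
    return sum(kernel(min(abs(s - p) for p in w)) for s in shifted)

w = [.01, .05, .09, .12]
y = [.32, .75, 1.03, 1.15, 1.25]
print(match_score(w, y, t=1.0, T=0.2))  # about 0.7, as in the example
```

The point 1.25 does not enter the score at t=1 because it falls outside the segment [1, 1.2).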

Page 19: Applications of scan statistics in molecular biology and neuroscience

Scan statistics
• For d>1, add up scores over all neurons starting at the same time t.
• Scan statistic $M_T$ is the maximum possible score over all t in the interval [0,a-T).
• Chi (2004) obtained an approximation of $\log(P\{M_T \ge c\})$.
• Chan & Loh (2005): a more precise approximation of $P\{M_T \ge c\}$ was obtained.
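A minimal sketch of the scan itself, assuming the maximum over t is approximated on a regular grid rather than computed exactly. The function names, the grid step, the treatment of a distance of exactly 0.025 as a miss, and the observation length a = 1.3 are illustrative assumptions; the template and spike data are the d=1 illustration from the scoring-function slide.

```python
def kernel(x, tol=0.025, hit=1.0, miss=-0.3):
    # Assumption: a distance of exactly tol is scored as a miss.
    return hit if x < tol else miss

def match_score(w, y, t, T):
    """Score of one neuron's template w against y on [t, t + T)."""
    shifted = [s - t for s in y if t <= s < t + T]
    return sum(kernel(min(abs(s - p) for p in w)) for s in shifted)

def scan_statistic(w_list, y_list, T, a, step=0.001):
    """Grid approximation to the maximum total score over t in [0, a - T).

    w_list and y_list hold one spike-time list per neuron; scores are
    summed over neurons at a common start time t.
    """
    best_score, best_t = float("-inf"), None
    n_steps = int(round((a - T) / step))
    for i in range(n_steps):
        t = i * step
        score = sum(match_score(w, y, t, T) for w, y in zip(w_list, y_list))
        if score > best_score:
            best_score, best_t = score, t
    return best_score, best_t

w = [.01, .05, .09, .12]
y = [.32, .75, 1.03, 1.15, 1.25]
score, t_star = scan_statistic([w], [y], T=0.2, a=1.3)
print(score, t_star)
```

A finer grid gets closer to the true maximum at a proportional increase in cost. An exact method would only need to examine the values of t at which a point of y enters or leaves the window or crosses the 0.025 threshold.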

Page 20: Applications of scan statistics in molecular biology and neuroscience

Assumptions and related information
• Each $w^{(i)}$ is stationary while $y^{(1)}, \ldots, y^{(d)}$ are independent Poisson processes.
• Separate formulas when kernel f is continuous and when it is not continuous.
• Number of times a large score c is exceeded is a Poisson random variable.

Page 21: Applications of scan statistics in molecular biology and neuroscience

Table of approximations

c       MC (s.e.)         C & L
0.017   0.0387 (0.0019)   0.0383
0.018   0.0237 (0.0012)   0.0241
0.019   0.0158 (0.0008)   0.0149
0.020   0.0095 (0.0005)   0.0091
0.021   0.0054 (0.0003)   0.0055
0.022   0.0033 (0.0002)   0.0033
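One way to read the table is to express each gap between the two columns in units of the reported standard error. The sketch below does that arithmetic on the table values. It assumes "MC" denotes a Monte Carlo estimate with its standard error and "C & L" the Chan & Loh approximation; the variable names are illustrative.

```python
# Rows of the table: (c, MC estimate, MC standard error, C & L approximation).
rows = [
    (0.017, 0.0387, 0.0019, 0.0383),
    (0.018, 0.0237, 0.0012, 0.0241),
    (0.019, 0.0158, 0.0008, 0.0149),
    (0.020, 0.0095, 0.0005, 0.0091),
    (0.021, 0.0054, 0.0003, 0.0055),
    (0.022, 0.0033, 0.0002, 0.0033),
]

# Difference between the approximation and the simulation, in s.e. units.
z_scores = [(approx - mc) / se for _, mc, se, approx in rows]

for (c, _, _, _), z in zip(rows, z_scores):
    print(f"c = {c:.3f}: (C & L - MC) / s.e. = {z:+.2f}")
```

For every row in the table the difference is smaller than two standard errors in absolute value.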

Page 22: Applications of scan statistics in molecular biology and neuroscience

Future works
• Higher-dimensional Poisson processes, e.g. 2 or 3 dimensional.
• Applications in astronomy and imaging.
• Varying window sizes.