Sequential Change-Point Detection Based on Nearest Neighbors

Hao Chen

Department of Statistics, University of California, Davis
February, 2018
*This work is partially supported by NSF-DMS 1513653.
Control chart
How about:
monitoring multiple streams?
monitoring non-Euclidean data?
Modern data examples
fMRI:
Social networks:
. . .
Outline
1 Graph-based two-sample test
2 Offline change-point detection
3 Sequential (Online) change-point detection
4 An application
Graph-based two-sample test
Assume we already have a similarity measure on the sample space.
Two samples from the same distribution:
# of NNs from the other sample: 27
Two-sample test based on nearest neighbors
Two samples from different distributions:
# of NNs from the other sample: 8
Two-sample test based on k-nearest neighbors
Let $y_1, \dots, y_n$ be the pooled observations of the two samples.

$g_i = \begin{cases} 1 & \text{if } y_i \text{ belongs to sample 1,} \\ 0 & \text{if } y_i \text{ belongs to sample 2.} \end{cases}$

$a_{ij}^{(r)} = \begin{cases} 1 & \text{if } y_j \text{ is the } r\text{th nearest neighbor of } y_i, \\ 0 & \text{otherwise.} \end{cases}$

$a_{ij}^{+} = \sum_{r=1}^{k} a_{ij}^{(r)}$.

Number of nearest neighbors from the other sample:

$X = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \left(a_{ij}^{+} + a_{ji}^{+}\right) I(g_i \neq g_j)$

[Schilling, 1986; Henze, 1988]
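As a concrete illustration, the cross-count $X$ can be computed directly from the k-NN graph of the pooled sample. The sketch below is not the author's code: it assumes Euclidean distance, breaks ties by index order, and all names are invented for illustration.

```python
import numpy as np

def knn_count_statistic(sample1, sample2, k=3):
    """X = (1/2) * sum_{i,j} (a+_ij + a+_ji) * I(g_i != g_j): the number
    of directed k-NN relations that join the two samples.

    Euclidean distance; ties broken by index order.  Illustrative only.
    """
    pooled = np.vstack([sample1, sample2])
    g = np.array([1] * len(sample1) + [0] * len(sample2))
    n = len(pooled)

    d = np.linalg.norm(pooled[:, None, :] - pooled[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]      # the k nearest neighbors of each point
    a = np.zeros((n, n))                   # a+_ij = 1 iff y_j among k NNs of y_i
    a[np.repeat(np.arange(n), k), nn.ravel()] = 1

    cross = g[:, None] != g[None, :]       # I(g_i != g_j)
    return 0.5 * ((a + a.T) * cross).sum()
```

A large $X$ is consistent with the two samples coming from the same distribution; a small $X$ suggests the samples are separated, exactly as in the slides' toy counts.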
Outline
1 Graph-based two-sample test
2 Offline change-point detection
3 Sequential (Online) change-point detection
4 An application
Offline change-point detection based on NNs
Observation sequence:
Offline change-point detection based on NNs

Number of NNs from the other sample, and standardize the count:

$X(t) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \left(a_{ij}^{+} + a_{ji}^{+}\right) I(g_i(t) \neq g_j(t)), \quad g_i(t) = I(i > t).$

Expectation and variance under the permutation null distribution:

$E(X(t)) = \frac{2kt(n-t)}{n-1},$

$\mathrm{Var}(X(t)) = \frac{t(n-t)}{n-1} \left( h(t, n-t)\left(q_{1,n} + k - \frac{2k^2}{n-1}\right) + \left(1 - h(t, n-t)\right)\left(q_{2,n} + k - k^2\right) \right),$

where

$h(t, n-t) = \frac{4t(n-t)}{(n-2)(n-3)}, \quad q_{1,n} = \frac{1}{n} \sum_{i,j} a_{ij}^{+} a_{ji}^{+}, \quad q_{2,n} = \frac{1}{n} \sum_{i \neq j;\, l} a_{il}^{+} a_{jl}^{+}.$

Standardized count:

$Z(t) = -\frac{X(t) - E(X(t))}{\sqrt{\mathrm{Var}(X(t))}}$
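A minimal sketch of the offline scan: build the k-NN graph once, compute $X(t)$ for each candidate $t$, and standardize. For brevity the null mean and standard deviation are estimated here by random permutations rather than the closed-form moments above; the function and parameter names are invented for illustration.

```python
import numpy as np

def offline_scan(y, k=3, n0=20, n_perm=500, seed=0):
    """Scan statistic max_{n0 <= t <= n-n0} Z(t) for one sequence y.

    X(t) counts the k-NN relations joining the first t observations to
    the rest; Z(t) standardizes it.  The null mean/sd of X(t) are
    estimated by permuting the sequence order (a stand-in for the
    closed-form permutation moments).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    a = np.zeros((n, n))
    a[np.repeat(np.arange(n), k), nn.ravel()] = 1
    w = a + a.T                                  # symmetrized k-NN graph

    ts = np.arange(n0, n - n0 + 1)

    def counts(order):
        out = []
        for t in ts:
            g = np.zeros(n)
            g[order[:t]] = 1                     # first t (in this order) vs rest
            cross = g[:, None] != g[None, :]
            out.append(0.5 * (w * cross).sum())  # X(t)
        return np.array(out)

    x = counts(np.arange(n))
    rng = np.random.default_rng(seed)
    perm = np.array([counts(rng.permutation(n)) for _ in range(n_perm)])
    z = -(x - perm.mean(0)) / perm.std(0)        # Z(t); a change gives large Z
    return ts, z
```

The estimated change-point is the $t$ maximizing $Z(t)$, and the scan maximum is compared to a threshold.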
Standardized count

In contrast: no change-point.

Test statistic: $\max_{n_0 \le t \le n - n_0} Z(t)$
Outline
1 Graph-based two-sample test
2 Offline change-point detection
3 Sequential (Online) change-point detection
4 An application
Online change-point detection based on NNs
$N_0$ historical observations: $y_1, \dots, y_{N_0}$
Subsequent observations: $y_{N_0+1}, y_{N_0+2}, \dots, y_n, \dots$
$Z_n(t)$: standardized count for the sequence $y_1, \dots, y_n$.

$\max_{n_0 \le t \le n - n_0} Z_n(t)$
Stopping Time

$T_1 = \inf\left\{ n - N_0 : \max_{n_0 \le t \le n - n_0} Z_n(t) > b_1 \right\}$

$T_2 = \inf\left\{ n - N_0 : \max_{n - n_1 \le t \le n - n_0} Z_n(t) > b_2 \right\}$

$T_3 = \inf\left\{ n - N_0 : \max_{n - n_1 \le t \le n - n_0} Z_{n,L}(t) > b_3 \right\},$

$Z_{n,L}(t)$: standardized count for observations $y_{n-L+1}, \dots, y_n$.
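The windowed rule $T_3$ can be sketched as a monitoring loop: after each new observation, rescan the last $L$ observations and stop the first time the maximum standardized count exceeds $b$. As in the offline sketch, the standardization below uses permutations as a stand-in for the analytic moments, and all names and parameter choices are illustrative.

```python
import numpy as np

def znl_max(window, n0, n1, k, n_perm, rng):
    """max over n-n1 <= t <= n-n0 of the permutation-standardized
    cross count, computed on the L observations in `window`."""
    w = np.asarray(window, dtype=float)
    L = len(w)
    d = np.linalg.norm(w[:, None, :] - w[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    a = np.zeros((L, L))
    a[np.repeat(np.arange(L), k), np.argsort(d, 1)[:, :k].ravel()] = 1
    g_mat = a + a.T
    ts = np.arange(L - n1, L - n0 + 1)           # t, indexed within the window

    def counts(order):
        return np.array([0.5 * (g_mat * ((order[:, None] < t) !=
                                         (order[None, :] < t))).sum()
                         for t in ts])

    x = counts(np.arange(L))
    perm = np.array([counts(rng.permutation(L)) for _ in range(n_perm)])
    return (-(x - perm.mean(0)) / perm.std(0)).max()

def stopping_time_T3(stream, N0, L, n0, n1, b, k=3, n_perm=200, seed=0):
    """T3 = inf{ n - N0 : max_{n-n1 <= t <= n-n0} Z_{n,L}(t) > b }.
    Returns None if no alarm within the stream.  Assumes N0 >= L."""
    rng = np.random.default_rng(seed)
    for n in range(N0 + 1, len(stream) + 1):
        window = stream[n - L:n]                 # the last L observations
        if znl_max(window, n0, n1, k, n_perm, rng) > b:
            return n - N0
    return None
```

Using only the most recent $L$ observations keeps the per-step cost bounded no matter how long monitoring runs, which is why the talk recommends $T_3$.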
Detection Delay

Average run length: $E_\infty(T)$.
Expected detection delay: $E_r(N - r \mid N > r)$.

Threshold $b$ selected subject to $P_\infty(T < 1{,}000) = 0.05$.
$r - N_0 = 200$.
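To make the two performance measures concrete, here is a toy Monte Carlo sketch that estimates $E_\infty(T)$ and $E_r(N - r \mid N > r)$ for a deliberately simple threshold rule on a scalar Gaussian stream. The rule, names, and parameters are invented stand-ins for the NN-based stopping times, not the talk's simulation code.

```python
import numpy as np

def run_length(stream, threshold):
    """Toy stopping rule: alarm at the first observation above `threshold`.
    Returns the 1-based alarm time, or None if no alarm occurs."""
    idx = np.flatnonzero(stream > threshold)
    return int(idx[0]) + 1 if idx.size else None

def average_run_length(threshold, n_rep=300, horizon=2000, seed=0):
    """Monte Carlo estimate of E_inf(T): mean alarm time with no change
    (pure N(0,1) noise).  Runs that never alarm are censored at
    `horizon`, a slight downward bias when alarms are rare."""
    rng = np.random.default_rng(seed)
    times = [run_length(rng.normal(0, 1, horizon), threshold) or horizon
             for _ in range(n_rep)]
    return float(np.mean(times))

def detection_delay(threshold, r, shift, n_rep=300, horizon=2000, seed=0):
    """Monte Carlo estimate of E_r(N - r | N > r): mean delay over the
    runs whose first alarm comes after the change at time r."""
    rng = np.random.default_rng(seed)
    delays = []
    for _ in range(n_rep):
        stream = rng.normal(0, 1, horizon)
        stream[r:] += shift                  # change-point at time r
        t = run_length(stream, threshold)
        if t is not None and t > r:          # condition on N > r
            delays.append(t - r)
    return float(np.mean(delays))
```

The same two estimators apply verbatim to the NN-based rules: swap `run_length` for $T_1$, $T_2$, or $T_3$.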
Early stops (False discovery)

Threshold $b$ selected subject to $P_\infty(T < 1{,}000) = 0.05$.

False discovery rate at 200 new observations after the start of the test:

        T1       T2       T3
1-NN    0.0178   0.0205   0.0107
3-NN    0.0148   0.0183   0.0103
Average run length

$T = \inf\left\{ n : \max_{n - n_1 \le t \le n - n_0} Z_{n,L}(t) > b \right\},$

$E_\infty(T_b) = 10{,}000 \;\Rightarrow\; b = \,?$
Average run length

Theorem

Suppose $L, b, n_0, n_1 \to \infty$ in such a way that $b = c\sqrt{L}$, $n_0 = u_0 L$ and $n_1 = u_1 L$ for some fixed $0 < c < \infty$, $0 < u_0 < u_1 < 1$. When there is no change-point, $T$ is asymptotically exponentially distributed with expectation

$E_\infty(T_b) \sim \frac{\sqrt{2\pi}\, \exp(b^2/2)}{c^2\, b \int_{u_0}^{u_1} h_1(u)\, h_2(u)\, \nu\!\left(c\sqrt{2 h_1(u)}\right) \nu\!\left(c\sqrt{2 h_2(u)}\right) du},$

where

$h_1(u) = \left[ 16u(1-u)(k + p_{k,\infty}) + 2(1-2u)^2 \left(q_{k,\infty} - k^2 + k\right) \right] / \sigma^2(u),$

$h_2(u) = \left[ 16u^2(1-u)^2 \left(p_{k,\infty} + q_{k,\infty} + k^2 + 2p^{(k)}_{k,\infty} - 2q^{(k)}_{k,\infty}\right) + 4u(1-u)\left(2q^{(k)}_{k,\infty} - 3q_{k,\infty} + k^2 + k\right) + 2\left(q_{k,\infty} - k^2 + k\right) \right] / \sigma^2(u),$

$\sigma^2(u) = 4u(1-u)\left( 4u(1-u)(k + p_{k,\infty}) + (1-2u)^2 \left(q_{k,\infty} - k^2 + k\right) \right).$
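In principle the theorem can be inverted numerically to choose $b$ for a target average run length. The sketch below transcribes the displayed formula directly and evaluates $\nu(\cdot)$ with a widely used closed-form approximation; the plug-in values of the NN-graph constants in the test are invented for illustration (in practice one would use the estimates from the data, as on the next slide).

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norm

def nu(x):
    """A standard closed-form approximation to Siegmund's nu(.) function."""
    x = max(x, 1e-8)
    return ((2.0 / x) * (norm.cdf(x / 2) - 0.5)) / (
        (x / 2) * norm.cdf(x / 2) + norm.pdf(x / 2))

def asymptotic_arl(b, L, u0, u1, k, p, q, pk, qk):
    """E_inf(T_b) per the theorem; p, q, pk, qk play the roles of
    p_{k,inf}, q_{k,inf}, p^(k)_{k,inf}, q^(k)_{k,inf}."""
    c = b / np.sqrt(L)

    def integrand(u):
        sig2 = 4 * u * (1 - u) * (4 * u * (1 - u) * (k + p)
                                  + (1 - 2 * u) ** 2 * (q - k ** 2 + k))
        h1 = (16 * u * (1 - u) * (k + p)
              + 2 * (1 - 2 * u) ** 2 * (q - k ** 2 + k)) / sig2
        h2 = (16 * u ** 2 * (1 - u) ** 2 * (p + q + k ** 2 + 2 * pk - 2 * qk)
              + 4 * u * (1 - u) * (2 * qk - 3 * q + k ** 2 + k)
              + 2 * (q - k ** 2 + k)) / sig2
        return h1 * h2 * nu(c * np.sqrt(2 * h1)) * nu(c * np.sqrt(2 * h2))

    integral, _ = quad(integrand, u0, u1)
    return np.sqrt(2 * np.pi) * np.exp(b ** 2 / 2) / (c ** 2 * b * integral)

def threshold_for_arl(target, L, u0, u1, k, p, q, pk, qk):
    """Solve E_inf(T_b) = target for b by root finding."""
    return brentq(lambda b: asymptotic_arl(b, L, u0, u1, k, p, q, pk, qk)
                  - target, 1.0, 10.0)
```

Because the right-hand side grows like $e^{b^2/2}$, the root is well separated and simple bracketing suffices.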
Mutual NN and Shared NN

Mutual NN:

$p_{k,\infty} = \lim_{n\to\infty} E\!\left[ \frac{1}{n} \sum_{i,j} a^{+}_{n,ij}\, a^{+}_{n,ji} \right], \quad p^{(k)}_{k,\infty} = \lim_{n\to\infty} E\!\left[ \frac{1}{n} \sum_{i,j} a^{+}_{n,ij}\, a^{(k)}_{n,ji} \right]$

Shared NN:

$q_{k,\infty} = \lim_{n\to\infty} E\!\left[ \frac{1}{n} \sum_{i;\, j \neq l} a^{+}_{n,ji}\, a^{+}_{n,li} \right], \quad q^{(k)}_{k,\infty} = \lim_{n\to\infty} E\!\left[ \frac{1}{n} \sum_{i;\, j \neq l} a^{+}_{n,ji}\, a^{(k)}_{n,li} \right]$

For multivariate data and under Euclidean distance, $p_{k,\infty}$, $q_{k,\infty}$, $p^{(k)}_{k,\infty}$, $q^{(k)}_{k,\infty}$ can be expressed as analytic functions of the dimension of the data.

In practice, it is better to use $p_{k,L}$, $q_{k,L}$, $p^{(k)}_{k,L}$, $q^{(k)}_{k,L}$ estimated from the data.
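The estimated versions can be read off the k-NN graph of a sample by replacing the limit and expectation with empirical averages. A possible sketch (Euclidean distance, illustrative names):

```python
import numpy as np

def nn_graph_quantities(y, k=3):
    """Empirical estimates of the mutual- and shared-NN quantities
    (the slide's p_{k,L}, q_{k,L}, p^(k)_{k,L}, q^(k)_{k,L}),
    averaged over the k-NN graph of the sample y."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    order = np.argsort(d, axis=1)
    a = np.zeros((n, n))                   # a+_ij: y_j among the k NNs of y_i
    a[np.repeat(np.arange(n), k), order[:, :k].ravel()] = 1
    ak = np.zeros((n, n))                  # a^(k)_ij: y_j is the k-th NN of y_i
    ak[np.arange(n), order[:, k - 1]] = 1

    p = (a * a.T).sum() / n                # mutual-NN pairs
    pk = (a * ak.T).sum() / n
    col, colk = a.sum(axis=0), ak.sum(axis=0)
    q = (col ** 2 - col).sum() / n         # shared-NN pairs, j != l
    qk = (col * colk - (a * ak).sum(axis=0)).sum() / n
    return p, q, pk, qk
```

The shared-NN sums use the identity $\sum_{j \ne l} a_{ji} a_{li} = (\sum_j a_{ji})^2 - \sum_j a_{ji}$ for 0/1 entries, so no explicit double loop is needed.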
How does the asymptotic result work for finite L?

L = 200. Check the threshold b such that $E_\infty(T) = 10{,}000$. Multivariate Gaussian data.

                   n0 = 3                 n0 = 10
            Monte Carlo¹  Asymp.   Monte Carlo  Asymp.
d = 10,  k = 1    4.04      4.40       4.04      4.31
d = 10,  k = 3    4.14      4.34       4.14      4.23
d = 100, k = 1    3.76      4.37       3.76      4.26
d = 100, k = 3    3.78      4.33       3.78      4.20

¹ 10,000 simulation runs.
Skewness correction

$E_\infty(T_3) \sim \frac{\sqrt{2\pi}\, \exp(b^2/2)}{c^2\, b \int_{u_0}^{u_1} S(u)\, h_1(u)\, h_2(u)\, \nu\!\left(c\sqrt{2 h_1(u)}\right) \nu\!\left(c\sqrt{2 h_2(u)}\right) du}$

$S(u)$ depends on the probabilities of the following events:
Skewness Correction

Check the threshold b such that $E_\infty(T) = 10{,}000$.

                        n0 = 3                         n0 = 10
            Monte   Skewness    Asymp.     Monte   Skewness    Asymp.
            Carlo   Corrected              Carlo   Corrected
d = 10,  k = 1   4.04    4.07    4.40      4.04    4.07    4.31
d = 10,  k = 3   4.14    4.14    4.34      4.14    4.14    4.23
d = 100, k = 1   3.76    3.79    4.37      3.76    3.79    4.26
d = 100, k = 3   3.78    3.79    4.33      3.78    3.79    4.20
Power assessment

Percentage of trials (out of 1,000) in which the method successfully detects the change-point.

"Successful detection": the change-point is detected within 100 observations after it occurs.

                   Normal data           Lognormal data
                 d = 10    d = 100      d = 10    d = 100
                 Δ = 0.7   Δ = 1.8      Δ = 1.5   Δ = 2
1-NN              0.02      0.21         0.48      0.08
3-NN              0.07      0.55         0.87      0.48
5-NN              0.15      0.81         0.95      0.77
Hotelling's T²    0.69      0.63         0.34      0.02

Δ: change in the mean parameter.
Outline
1 Graph-based two-sample test
2 Offline change-point detection
3 Sequential (Online) change-point detection
4 An application
Is there a change in phone call pattern?

Mobile phone data collected by the MIT Media Lab
87 students and faculty
7/20/2004 – 6/14/2005

$M_t$: adjacency matrix for day $t$; element $[i,j]$ is 1 if subject $i$ called subject $j$ on day $t$.

We consider two distances:

The number of different entries: $\|M_{t_1} - M_{t_2}\|_F^2$.

The number of different entries, normalized by the geometric mean of the total edges in each day: $\frac{\|M_{t_1} - M_{t_2}\|_F^2}{\|M_{t_1}\|_F\, \|M_{t_2}\|_F}$.
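Both dissimilarities are one-liners on the adjacency matrices; a small sketch (the function name is invented):

```python
import numpy as np

def phone_graph_distances(M1, M2):
    """The two dissimilarities between daily call graphs: the raw count
    of differing entries ||M1 - M2||_F^2, and the same count normalized
    by ||M1||_F * ||M2||_F.  For 0/1 matrices ||M||_F^2 is the number of
    edges, so the normalizer is the geometric mean of the edge totals."""
    M1, M2 = np.asarray(M1, float), np.asarray(M2, float)
    d1 = np.linalg.norm(M1 - M2, 'fro') ** 2
    d2 = d1 / (np.linalg.norm(M1, 'fro') * np.linalg.norm(M2, 'fro'))
    return d1, d2
```

The normalized version discounts busy days, so a day with many calls is not automatically far from a quiet one.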
Phone-call network

Stopping times and nearby academic events

Distance 1           Distance 2            Nearby academic event*
n = 66: 2004/9/23    n = 60: 2004/9/17     9/9: First day of class for Fall term
n = 166: 2005/1/1    n = 140: 2004/12/6    12/18: Last day of class for Fall term
n = 198: 2005/2/2    n = 194: 2005/1/29    2/2: First day of class for Spring term
—                    n = 252: 2005/3/28    3/21: Spring vacation

* The dates of the academic events are from the 2015-2016 academic calendar of MIT, as the 2004-2005 academic calendar of MIT cannot be found online.
Summary

Sequential change-point detection based on nearest neighbors can be applied to multivariate data and non-Euclidean data, as long as a similarity measure on the sample space can be well defined.

The stopping time based on the recent observations is recommended. Its asymptotic distribution is derived and shown to be quite accurate for finite scenarios after skewness correction. This makes the method an easy off-the-shelf approach to real problems.

Thank You!