A Stochastic Model for Seasonal Cycles in Crime
Yunbai Cao Kun Dong Beatrice Siercke Matt Wilber
August 9, 2013
(UCLA, Harvey Mudd) August 9, 2013 1 / 43
Introduction
Objective
We would like to create a mathematical model for the seasonal variation of burglary in Los Angeles, CA and Houston, TX, in order to forecast future crime trends. While seasonality of crime data is widely agreed upon, it is poorly understood, and we seek to determine if a model can reproduce seasonality for a wide variety of data sets without relying on environmental factors.
Criminology Background
Inconsistent Seasonal Folklore
Disagreement over seasonality of crimes
Variation with crime type
Burglary peak in summer or winter
Geographical variation of behavior
Inconsistent hypotheses of what drives seasonality
Data Sets
[Figure panels: Los Angeles, Houston]
Data sets contain property crime rates
LA data 2005-2013, Houston 2009-2013
Would like to divide into long-term trend, seasonal, and noise components: $X_t = T_t + S_t + N_t$
Methods
Singular Spectrum Analysis
Decomposition of time series $X = (x_1, x_2, \ldots, x_N)$ into elementary reconstructed series
Window length $L = N - K + 1$
Low-rank approximation of original time series
Allows us to extract long-term trend, seasonality, and noise
$$
M = \begin{pmatrix}
x_1 & x_2 & x_3 & \cdots & x_K \\
x_2 & x_3 & x_4 & \cdots & x_{K+1} \\
x_3 & x_4 & x_5 & \cdots & x_{K+2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_L & x_{L+1} & x_{L+2} & \cdots & x_N
\end{pmatrix}
$$
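The decomposition described above can be sketched in a few lines of NumPy. This is a minimal illustration of basic singular spectrum analysis, not the code used for the slides: the function name `ssa_components`, the synthetic test series, and the choice `K = 60` are all assumptions made for the example. It builds the $L \times K$ trajectory matrix, takes its SVD, and turns each rank-one piece back into a series by averaging over anti-diagonals.

```python
import numpy as np

def ssa_components(x, K):
    """Return the elementary reconstructed series of x and the singular values."""
    x = np.asarray(x, dtype=float)
    N = x.size
    L = N - K + 1  # window length, L = N - K + 1
    # Trajectory matrix: row i holds x[i], ..., x[i + K - 1]
    M = np.array([x[i:i + K] for i in range(L)])
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    comps = []
    for k in range(s.size):
        Mk = s[k] * np.outer(U[:, k], Vt[k])  # k-th rank-one piece of M
        # Diagonal averaging: entries with i + j = n all estimate x[n].
        # Flipping the rows turns anti-diagonals into ordinary diagonals.
        flipped = Mk[::-1]
        series = np.array([flipped.diagonal(n - (L - 1)).mean() for n in range(N)])
        comps.append(series)
    return np.array(comps), s

# Hypothetical data: linear trend plus a 12-sample cycle
t = np.arange(120)
x = 0.05 * t + np.sin(2 * np.pi * t / 12)
comps, s = ssa_components(x, K=60)
```

Summing all rows of `comps` recovers `x`; grouping a few leading rows gives candidate trend and seasonal components, which is how the modes on the following slides would be read off.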
Interested in the first six or so singular values
First mode corresponds to long-term trend
Modes 2 and 5 display yearly periods
Trend Extraction
[Figure panels: Los Angeles, Houston]
Seasonality Extraction
[Figure panels: Los Angeles, Houston]
Crime Type Comparisons
Aggravated Assault, Murder, and Theft have a clear downward trend
Auto Theft and Robbery dip then rise
Burglary and Rape briefly increase after two years, but otherwise decrease
Aggravated Assault, Auto Theft, Burglary, Rape, and Theft all have clear yearly seasonal components
Murder has no seasonal component, only noise
Robbery has a semi-annual seasonal component
In summary,
Extracted a clear trend for each crime type
Most types have a yearly seasonal component
Murder and Robbery do not have yearly components
Stochastic Differential Equation (SDE)
We will consider two-dimensional SDEs of the form
$$dX_t = f(X_t, Y_t)\,dt + g(X_t, Y_t)\,dW_t$$
$$dY_t = \tilde{f}(X_t, Y_t)\,dt + \tilde{g}(X_t, Y_t)\,dV_t$$
White-noise terms $dW_t$ and $dV_t$ represent increments of Brownian motion
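Itô calculus uses a different chain rule. If we consider $u(t, x, y)$, then
$$du = \frac{\partial u}{\partial t}\,dt + \langle \nabla u, d\mathbf{X}_t \rangle + \frac{1}{2}\langle \mathrm{Hess}(u)\,dG, dG \rangle$$
where
$$d\mathbf{X}_t = \begin{bmatrix} dX_t \\ dY_t \end{bmatrix} \quad \text{and} \quad dG = \begin{bmatrix} g(X_t, Y_t)\,dW_t \\ \tilde{g}(X_t, Y_t)\,dV_t \end{bmatrix}.$$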
Lotka-Volterra
Follows the equations
$$\dot{x} = x(\alpha - \beta y), \quad x(0) = x_0$$
$$\dot{y} = -y(\gamma - \delta x), \quad y(0) = y_0$$
Predator-Prey Model
Growth/decay rate parameters $\alpha$, $\gamma$
Interaction effect parameters $\beta$, $\delta$
Used in criminology literature
Natural periodic solutions
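The periodic solutions can be checked numerically. The sketch below integrates the deterministic equations with a classical fourth-order Runge-Kutta step and tracks the quantity $E = \alpha \log y + \gamma \log x - \beta y - \delta x$, which is constant along deterministic orbits, so each orbit is a closed curve. The parameter values, initial condition, step size, and function names are illustrative assumptions, not values fitted to the crime data.

```python
import numpy as np

def lv_rhs(z, alpha, beta, gamma, delta):
    """Right-hand side of the deterministic Lotka-Volterra system."""
    x, y = z
    return np.array([x * (alpha - beta * y), -y * (gamma - delta * x)])

def energy(x, y, alpha, beta, gamma, delta):
    """Conserved quantity of the deterministic system."""
    return alpha * np.log(y) + gamma * np.log(x) - beta * y - delta * x

def rk4_orbit(z0, params, dt, n_steps):
    """Integrate with classical RK4 and return an (n_steps + 1, 2) array."""
    z = np.array(z0, dtype=float)
    out = np.empty((n_steps + 1, 2))
    out[0] = z
    for n in range(n_steps):
        k1 = lv_rhs(z, *params)
        k2 = lv_rhs(z + 0.5 * dt * k1, *params)
        k3 = lv_rhs(z + 0.5 * dt * k2, *params)
        k4 = lv_rhs(z + dt * k3, *params)
        z = z + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        out[n + 1] = z
    return out

# Illustrative parameters (alpha, beta, gamma, delta), assumed for the example
params = (1.0, 1.0, 1.0, 1.0)
orbit = rk4_orbit((1.5, 1.0), params, dt=0.001, n_steps=10000)
E = energy(orbit[:, 0], orbit[:, 1], *params)
print("max energy drift:", np.max(np.abs(E - E[0])))
```

A drift that stays near machine-level size indicates the trajectory is staying on a single closed level curve of $E$.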
Used a stochastic version of Lotka-Volterra,
$$dX_t = X_t(\alpha - \beta Y_t)\,dt + \sigma_1 X_t\,dW_t, \quad X(0) = X_0$$
$$dY_t = -Y_t(\gamma - \delta X_t)\,dt + \sigma_2 Y_t\,dV_t, \quad Y(0) = Y_0.$$
White noise, IID terms $dW_t$ and $dV_t$
Noise is proportional to size of population at time $t$
$X_t$ = Available targets per criminal
$Y_t$ = Crime rate
Energy of the Model
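Well-known energy of the Lotka-Volterra Model:
$$E = \alpha \log y + \gamma \log x - \beta y - \delta x$$
Using Itô's Lemma, we find
$$E_t = E_0 + \int_0^t \left[ \sigma_1(\gamma - \delta X_s)\,dW_s + \sigma_2(\alpha - \beta Y_s)\,dV_s \right] - \frac{1}{2}\left(\sigma_1^2 \gamma + \sigma_2^2 \alpha\right) t$$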
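For readers who want the intermediate steps, the Itô computation can be written out term by term. This is a worked sketch from the energy $E$ and the stochastic Lotka-Volterra equations as stated, assuming $W_t$ and $V_t$ are independent so that the cross term $dW_t\,dV_t$ vanishes. Working from those definitions, the coefficient of $dW_t$ comes out as $\sigma_1(\gamma - \delta X_t)$.

```latex
\begin{align*}
\frac{\partial E}{\partial x} &= \frac{\gamma}{x} - \delta, &
\frac{\partial E}{\partial y} &= \frac{\alpha}{y} - \beta, &
\frac{\partial^2 E}{\partial x^2} &= -\frac{\gamma}{x^2}, &
\frac{\partial^2 E}{\partial y^2} &= -\frac{\alpha}{y^2}, &
\frac{\partial^2 E}{\partial x\,\partial y} &= 0 \\[4pt]
dE_t &= \Big(\frac{\gamma}{X_t} - \delta\Big)\,dX_t
      + \Big(\frac{\alpha}{Y_t} - \beta\Big)\,dY_t
      + \frac{1}{2}\Big(-\frac{\gamma}{X_t^2}\,\sigma_1^2 X_t^2
      - \frac{\alpha}{Y_t^2}\,\sigma_2^2 Y_t^2\Big)\,dt \\[4pt]
\text{drift from } dX_t, dY_t:\quad
  &(\gamma - \delta X_t)(\alpha - \beta Y_t)
   - (\alpha - \beta Y_t)(\gamma - \delta X_t) = 0 \\[4pt]
dE_t &= \sigma_1(\gamma - \delta X_t)\,dW_t
      + \sigma_2(\alpha - \beta Y_t)\,dV_t
      - \frac{1}{2}\big(\sigma_1^2\gamma + \sigma_2^2\alpha\big)\,dt
\end{align*}
```

Integrating from $0$ to $t$ gives the stated expression for $E_t$; only the Itô correction term survives as drift, which is why the deterministic conservation law turns into a steady decrease.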
Expected Energy
Taking expectation,
$$\mathbb{E}(E_t) = \mathbb{E}(E_0) - \frac{1}{2}\left(\sigma_1^2 \gamma + \sigma_2^2 \alpha\right) t.$$
Expected energy decreases linearly over time
Orbit tends to leave equilibrium
Built-in period variation
Can be used to validate weak order of scheme
Numerics For Stochastic Lotka-Volterra
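Used semi-implicit scheme, derived by setting
$$X_{t+1} \approx X_t + \left(\alpha X_t\,\Delta t - \beta X_{t+1} Y_t\,\Delta t + \sigma_1 X_t\,\Delta W_t\right)$$
$$Y_{t+1} \approx Y_t + \left(\delta X_t Y_t\,\Delta t - \gamma Y_{t+1}\,\Delta t + \sigma_2 Y_t\,\Delta V_t\right) \tag{1}$$
where $\Delta t$ is the timestep and $\Delta W_t, \Delta V_t \sim \sqrt{\Delta t} \cdot N(0, 1)$.
Solving for $X_{t+1}$, $Y_{t+1}$ results in the scheme
$$X_{t+1} = \frac{1 + \alpha \Delta t + \sigma_1 \Delta W_t}{1 + \beta Y_t \Delta t}\, X_t$$
$$Y_{t+1} = \frac{1 + \delta X_t \Delta t + \sigma_2 \Delta V_t}{1 + \gamma \Delta t}\, Y_t.$$
Weak first order, but quick to compute
Maintains positivity with very high probability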
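A minimal Python sketch of such a semi-implicit update is given below. It is written so that $\gamma$ multiplies the implicit decay term and $\delta$ the explicit interaction term, matching the stochastic Lotka-Volterra equations stated earlier. The function name `simulate_slv`, the parameter values, the step size, and the seed are assumptions for illustration and are not the fitted values behind the plots.

```python
import numpy as np

def simulate_slv(x0, y0, alpha, beta, gamma, delta,
                 sigma1, sigma2, dt, n_steps, seed=0):
    """Semi-implicit scheme for the stochastic Lotka-Volterra system."""
    rng = np.random.default_rng(seed)
    X = np.empty(n_steps + 1)
    Y = np.empty(n_steps + 1)
    X[0], Y[0] = x0, y0
    for n in range(n_steps):
        # Brownian increments: sqrt(dt) times independent standard normals
        dW = np.sqrt(dt) * rng.standard_normal()
        dV = np.sqrt(dt) * rng.standard_normal()
        # Growth and noise explicit; loss terms implicit in the new value
        X[n + 1] = X[n] * (1 + alpha * dt + sigma1 * dW) / (1 + beta * Y[n] * dt)
        Y[n + 1] = Y[n] * (1 + delta * X[n] * dt + sigma2 * dV) / (1 + gamma * dt)
    return X, Y

# Illustrative run with sigma1 = 0, sigma2 = 0.02 (other values are assumed)
X, Y = simulate_slv(1.5, 1.0, 1.0, 1.0, 1.0, 1.0,
                    sigma1=0.0, sigma2=0.02, dt=0.01, n_steps=2000)
```

Each update multiplies the previous value by a ratio whose denominator is always positive, so positivity can only fail if a noise increment drives a numerator below zero, which is very unlikely for small $\sigma \sqrt{\Delta t}$.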
Results
Simulations
Plot: σ1 = 0, σ2 = 0.02
Least-squares parameter estimation
Minimum least-squares difference from data
Missing period
[Figure panels: Los Angeles, Houston]
Reproduced missing period
Los Angeles Aggravated Assault
Data in red, simulation in blue
Better fit without missed period or widely varying peak heights
Shows that our model works across a variety of data
Houston Aggravated Assault
Data in red, simulation in blue
Criminological implications
LA Burglary Video
[Video: laclip.mpg (video/mpeg)]
Houston Burglary Video
[Video: HTclip.mpg (video/mpeg)]
Missing Period Statistics
Want to identify peaks in time series
Use peakfinder.m, by Nathanael Yoder [1]
[1] Yoder, Nathanael (2009). PeakFinder (http://www.mathworks.com/matlabcentral/fileexchange/25500-peakfinder), MATLAB Central File Exchange. Retrieved July 2013.
Peakfinder.m
Used for finding “local peaks or valleys ... in a noisy vector.”
Input: A real vector
Output: [pks, locs], two vectors with peak heights and peak locations, respectively.
Options
sel: The “amount above surrounding data for a peak to be identified”
thresh: A threshold that maxima must exceed to be considered peaks
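To make the roles of the two options concrete, here is a simplified Python analogue. It is not a port of peakfinder.m and may differ from it in edge cases: the function name `find_peaks_simple` is hypothetical, and "amount above surrounding data" is interpreted here as the rise of a local maximum above the higher of the two lowest points separating it from any taller sample, which is an assumption.

```python
def find_peaks_simple(x, sel, thresh):
    """Return (pks, locs): heights and indices of accepted local maxima."""
    pks, locs = [], []
    n = len(x)
    for i in range(1, n - 1):
        # Candidate: strictly above the left neighbour, not below the right one
        if not (x[i] > x[i - 1] and x[i] >= x[i + 1]):
            continue
        # thresh: the maximum itself must exceed this level
        if x[i] <= thresh:
            continue
        # Lowest value to the left before reaching a taller sample
        left_min = x[i]
        j = i - 1
        while j >= 0 and x[j] <= x[i]:
            left_min = min(left_min, x[j])
            j -= 1
        # Lowest value to the right before reaching a taller sample
        right_min = x[i]
        j = i + 1
        while j < n and x[j] <= x[i]:
            right_min = min(right_min, x[j])
            j += 1
        # sel: required rise above the surrounding data
        if x[i] - max(left_min, right_min) >= sel:
            pks.append(x[i])
            locs.append(i)
    return pks, locs

pks, locs = find_peaks_simple([0, 5, 0, 1, 0, 12, 0], sel=3, thresh=4)
print(pks, locs)  # [5, 12] [1, 5]
```

In this toy vector the small bump of height 1 is rejected by both criteria, while the two larger maxima pass.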
Method A
Peaks must be 10 units above nearby data
Peaks must be at least 180 units high
If any pair of consecutive peaks is at least minpeakdistance days apart, a period has been missed
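The Method A test reduces to a check on gaps between neighbouring peak locations. The sketch below assumes peak locations are given in days and that "pair of peaks" means consecutive peaks after sorting; the function name and the example values are hypothetical.

```python
def missed_period_method_a(peak_days, minpeakdistance):
    """True if two consecutive peaks are at least minpeakdistance days apart."""
    days = sorted(peak_days)
    return any(b - a >= minpeakdistance for a, b in zip(days, days[1:]))

# Roughly yearly peaks: no gap reaches 500 days
print(missed_period_method_a([100, 465, 830], 500))   # False
# One year without a peak: a 730-day gap
print(missed_period_method_a([100, 465, 1195], 500))  # True
```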
Method Ω
Peaks must be 7 units above nearby data
Peaks must be at least 179.6 units high
Peaks must be at least 44 days apart
Peaks within 103 days of each other are grouped into the same peak cluster
If any pair of consecutive peaks is at least minpeakdistance days apart, a period has been missed
If the first peak appears after minpeakdistance days, a period has been missed
If the number of peak clusters is less than 8, a period has been missed
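The three Method Ω criteria can be combined as in the sketch below. Several details are assumptions rather than statements from the slides: days are counted from the start of the series, clusters are formed by chaining consecutive peaks whose gap is at most 103 days, and a series with no peaks at all is treated as a missed period. The function names are hypothetical.

```python
def cluster_peaks(peak_days, cluster_width=103):
    """Group sorted peak days; a gap above cluster_width starts a new cluster."""
    clusters = []
    for d in sorted(peak_days):
        if clusters and d - clusters[-1][-1] <= cluster_width:
            clusters[-1].append(d)
        else:
            clusters.append([d])
    return clusters

def missed_period_method_omega(peak_days, minpeakdistance,
                               min_clusters=8, cluster_width=103):
    """True if any of the three missed-period criteria is met."""
    days = sorted(peak_days)
    if not days:
        return True  # assumption: no peaks at all counts as a missed period
    long_gap = any(b - a >= minpeakdistance for a, b in zip(days, days[1:]))
    late_start = days[0] > minpeakdistance
    too_few = len(cluster_peaks(days, cluster_width)) < min_clusters
    return long_gap or late_start or too_few

yearly = [200 + 365 * k for k in range(8)]
print(missed_period_method_omega(yearly, 480))                   # False
print(missed_period_method_omega(yearly[:3] + yearly[4:], 480))  # True
```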
Noise Parameter Analysis: Method A
Percentage relatively sensitive to noise
Varies little with σ1
Decreases with σ2
Percent Missed Periods vs. Noise Parameters (rows: σ1, columns: σ2)
σ1 \ σ2    0.010   0.015   0.020   0.025   0.030
0.00       56.9    36.3    21.3    11.0     5.5
0.01       56.8    37.6    20.2    11.2     5.0
0.02       59.8    35.7    19.4    10.6     5.0
0.03       55.0    34.6    18.7     8.2     4.0
Noise Parameter Analysis: Method Ω
Varies little with σ1
Decreases with σ2
Percent Missed Periods vs. Noise Parameters (rows: σ1, columns: σ2)
σ1 \ σ2    0.010   0.015   0.020   0.025   0.030
0.00       32.4    27.9    19.5    12.1     6.9
0.01       38.9    30.6    18.0    13.9     6.7
0.02       39.5    27.2    17.4    13.4     8.1
0.03       40.5    28.0    19.3    12.6     8.1
Minpeakdistance Analysis: Method A
Both percentages decrease as minpeakdistance increases
Percentages relatively insensitive to minpeakdistance parameter
σ1 = 0, σ2 = 0.02
Percent Missed Periods vs. Minpeakdistance
minpeakdistance (days)   450    475    500    525    550
Percent Missed Peaks     22.5   21.7   19.2   16.0   17.0
Minpeakdistance Analysis: Method Ω
Both percentages decrease as minpeakdistance increases
Percentages relatively sensitive to minpeakdistance parameter
σ1 = 0, σ2 = 0.02
Percent Missed Periods vs. Minpeakdistance
minpeakdistance (days)   450    460    470    480    490    500
Percent Missed Peaks     27.5   22.6   20.5   18.5   14.1   12.8
Conclusions
Decomposed crime data into long-term trend, seasonal, and noise components
Produced an SDE model to simulate burglary rates over several year spans
Affirmed model’s ability to reproduce missed periods as in LA data
Acknowledgements
Jeffrey Brantingham, for the Los Angeles crime data
Houston Police Department, for the Houston crime data
Scott McCalla and James von Brecht for lots of advice and mentoring