azavea
DESCRIPTION
Data Philly Meetup for 2/19/2013 on geospatial data science with crime data and applications of GeoTrellis to solve challenges related to large data sets.
Web/Mobile
Geospatial
UI/UX Design
High Performance Computing
R&D
B Corporation
• Projects w/ Social Value
• Summer of Maps
• Pro Bono Program
• Donate share of profits
Research-Driven
• 10% Research Program
• Academic Collaborations
• Open Source
Spatial Temporal Forecasting
with Philadelphia Crime Data
How Phila PD uses Maps
Customized Map Products
Weekly CompStat Meetings
Web Crime Analysis
Complainant
CAD
Verizon
911
911 Operator
Radio
Dispatcher
Police Officer
District
48 Desk
INCT
Daily download & Geocoding Routines
Incident Report Completed by Officer
District X
District Y
District Z
Maps distributed through Intranet, Printing, CompStat
INCT & PARS – main database sources
over 5,000 incidents daily, over 2 million annually
PARS
The Context
1,500,000 people
7,000 police
1,000 civilian employees
2,000,000 new incidents / year
3 crime analysts
What we did
• Weekly CompStat
• Lots of maps
• Automation of map creation
• Web-based systems
… but what if we could…
Accelerate the cycle
Proactively notify
Automate the process
Prototype
ArcView
VB & MapObjects
MS SQL Server
Crime Incidents
Database
Shapefiles
and
GRIDs
Process Documentation
.ini
file
… but there was a problem …
…it was crap …
… sort of.
We needed ….
1. Better Statistics
2. Notification
3. Simplicity
Crime Analysis – What has happened?
– Mapping (spatial / temporal densities)
– Trending
– Intelligence Dashboard
Early Warning – What is out of the ordinary?
– Statistical & Threshold-based Hunches (data mining)
– Alerting
Risk Forecasting – What is likely to happen next?
– Near Repeat Pattern
– Load Forecasting
Crime Analysis
– Mapping (spatial / temporal densities)
– Trending
– Intelligence Dashboard
Early Warning
– Statistical & Threshold-based Hunches (data mining)
– Alerting
Risk Forecasting
– Near Repeat Pattern
– Load Forecasting
Crime Analysis
Intelligence Dashboard
Early Warning
• Geographic Early Warning System
  – A system to alert staff of an unusual situation in a particular location
  – Ingests data sets to automatically “cook on” and only involves staff when a statistically unusual situation is found
[Diagram: HunchLab architecture showing the operational databases, the HunchLab database, the geostatistical engine, and the alerting system]
Early Warning
What is a Hunch?
• A proposed hypothesis, saved into the system, and continually tested for validity
• Incident Attribute Requirements
  – Location (x, y)
  – Time (timestamp)
  – Classification
• Hunch Attributes
  – Location (area)
  – Time (recent / historic periods)
  – Classification
• Analyses
  – Statistical Hunch
  – Threshold Hunch
Hunch Parameters: Location
• Address & Radius
• Precinct/County/Country
• Custom Drawn Area
• Mass Hunch
Hunch Parameters: Time
• Statistical Hunch
  – Recent Past
  – Historic Past
Hunch Parameters: Classification
• Category
• Time of Day
• Narrative
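The slides do not say which statistical test a hunch uses, so the following is only a minimal sketch of how the two analyses could be evaluated. It assumes an address-and-radius location, a single category, and a Poisson model that compares the recent count with the rate seen in the historic period. The names `Incident`, `poisson_tail`, and `evaluate_hunch`, the default period lengths, and the `alpha` cutoff are all hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    x: float          # projected coordinates (e.g. meters); an assumption
    y: float
    when: datetime
    category: str


def poisson_tail(k: int, mean: float) -> float:
    """Probability of seeing k or more events when `mean` are expected."""
    if k <= 0:
        return 1.0
    term = math.exp(-mean)
    cdf = term
    for i in range(1, k):
        term *= mean / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def evaluate_hunch(incidents, center, radius, category, now,
                   recent_days=7, historic_days=364,
                   threshold=None, alpha=0.05):
    """Test one hunch: location (center + radius), time, classification."""
    cx, cy = center
    recent_start = now - timedelta(days=recent_days)
    historic_start = recent_start - timedelta(days=historic_days)
    recent = historic = 0
    for inc in incidents:
        if inc.category != category:
            continue
        if math.hypot(inc.x - cx, inc.y - cy) > radius:
            continue
        if recent_start <= inc.when <= now:
            recent += 1
        elif historic_start <= inc.when < recent_start:
            historic += 1
    if threshold is not None:
        # Threshold hunch: alert when the recent count reaches a fixed number.
        return {"recent": recent, "alert": recent >= threshold}
    # Statistical hunch: scale the historic count to the recent period length
    # and ask how surprising the recent count is under that rate.
    expected = historic * recent_days / historic_days
    p_value = poisson_tail(recent, expected)
    return {"recent": recent, "expected": expected,
            "p_value": p_value, "alert": p_value < alpha}
```

Re-running `evaluate_hunch` on each daily data load is one way a saved hypothesis could be "continually tested"; an alert result would then feed the email notification.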
Hunch Helper
Email Alert
Hunch Details
Risk Forecasting
Predictive Analytics?
• Prediction vs. Forecasting
Near Repeat Pattern Analysis
Contagious Crime?
• Near repeat pattern analysis
• “If one burglary occurs, how does the risk change nearby?”
What Do We Mean By Near Repeat?
• Repeat victimization
  – Incident at the same location at a later time (likely related)
• Near repeat victimization
  – Incident at a nearby location at a later time (likely related)
• Incident A (place, time) --> Incident B (place, time)
Near Repeat Pattern Analysis
• The goal:
  – Quantify short term risk due to near-repeat victimization
• “If one burglary occurs, how does the risk of burglary for the neighbors change?”
• What we know:
  – Incident A (place, time) --> Incident B (place, time)
• Distance between A and B
• Timeframe between A and B
• What we need to know:
  – What distances/timeframes are not simply random?
Near Repeat Pattern Analysis
• The process
  – Observe the pattern in historic data
  – Simulate the pattern in randomized historic data
  – Compare the observed pattern to the simulated patterns
  – Apply the non-random pattern to new incidents
• An example
  – 180 days of burglaries in Division 6 of Philadelphia
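The observe / simulate / compare steps can be sketched as a permutation test on incident pairs. This is an illustrative sketch, not the implementation behind the slides: it checks a single distance and time band, takes times as day numbers, and randomizes the history by shuffling timestamps across locations. The function names and the default of 199 simulations are assumptions.

```python
import math
import random


def count_close_pairs(points, times, max_dist, max_days):
    """Count incident pairs that are close in both space and time."""
    count = 0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            near_in_space = math.hypot(dx, dy) <= max_dist
            near_in_time = abs(times[i] - times[j]) <= max_days
            if near_in_space and near_in_time:
                count += 1
    return count


def near_repeat_test(points, times, max_dist, max_days,
                     simulations=199, seed=0):
    """Compare the observed pair count with counts from shuffled timestamps."""
    rng = random.Random(seed)
    # Observe the pattern in the historic data.
    observed = count_close_pairs(points, times, max_dist, max_days)
    # Simulate: keep locations fixed, shuffle which time belongs to which.
    shuffled = list(times)
    sim_counts = []
    for _ in range(simulations):
        rng.shuffle(shuffled)
        sim_counts.append(
            count_close_pairs(points, shuffled, max_dist, max_days))
    # Compare: how often does chance alone match the observed count?
    as_extreme = sum(1 for c in sim_counts if c >= observed)
    p_value = (as_extreme + 1) / (simulations + 1)
    mean_sim = sum(sim_counts) / simulations
    ratio = observed / mean_sim if mean_sim > 0 else float("inf")
    return observed, ratio, p_value
```

A `ratio` well above 1 with a small `p_value` suggests that pairs in that band occur more often than random timing would produce, which is the kind of non-random distance/timeframe the process looks for. Repeating the call over several bands would give a fuller table of where the elevated risk sits.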
Near Repeat Pattern Analysis
• How can you test your own data?
  – Near Repeat Calculator
• http://www.temple.edu/cj/misc/nr/
• Papers
  – Near-Repeat Patterns in Philadelphia Shootings (2008)
• One city block & two weeks after one shooting
  – 33% increase in likelihood of a second event
Jerry Ratcliffe
Temple University
Contagious Crime?
Workload Forecasting
Improving CompStat
• Workload forecasting
• “Given the time of year, day of week, time of day and general trend, what counts of crimes should I expect?”
What Do We Mean By Load Forecasting?
• Workload forecasting
• Generating aggregate crime counts for a future timeframe using cyclical time series analysis
Measure cyclical patterns
Identify non-cyclical trend
Forecast expected count
+
bit.ly/gorrcrimeforecastingpaper
Load Forecasting
• Measure cyclical patterns
• Take historic incidents (for example: last five years)
• Generate multiplicative seasonal indices
  – For each time cycle:
    » time of year
    » day of week
    » time of day
  – Count incidents within each time unit (for example: Monday)
  – Calculate average per time unit if incidents were evenly distributed
  – Divide counts within each time unit by the calculated average to generate multiplicative indices
    » Index ~ 1 means at the average
    » Index > 1 means above average
    » Index < 1 means below average
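The index calculation above can be sketched for one cycle, day of week. This is a minimal illustration that assumes every time unit covers the same amount of history (for example, a whole number of weeks); the names `seasonal_indices` and `day_of_week` are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def seasonal_indices(incident_dates, key, units):
    """Multiplicative index per time unit: unit count / evenly spread count."""
    counts = Counter(key(d) for d in incident_dates)
    # Average per unit if incidents were evenly distributed across units.
    average = len(incident_dates) / len(units)
    return {unit: counts.get(unit, 0) / average for unit in units}


def day_of_week(d):
    """Time unit for the day-of-week cycle: 0 = Monday ... 6 = Sunday."""
    return d.weekday()


# Four weeks of made-up data: one incident Monday to Saturday, eight on Sunday.
start = date(2013, 1, 7)  # a Monday
history = []
for week in range(4):
    for day in range(7):
        per_day = 8 if day == 6 else 1
        history += [start + timedelta(days=7 * week + day)] * per_day

indices = seasonal_indices(history, day_of_week, range(7))
```

Here `indices[0]` (Monday) comes out below 1 and `indices[6]` (Sunday) well above 1. The same function could serve the time-of-year or time-of-day cycles by swapping the `key` function and the list of `units`.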
Load Forecasting
• Identify non-cyclical trend
• Take recent daily counts (for example: last year daily counts)
• Remove cyclical trends by dividing by indices
• Run a trending function on the new counts
  – Simple average
    » Last X Days
  – Smoothing function
    » Exponential smoothing
    » Holt’s linear exponential smoothing
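A minimal sketch of the deseasonalizing step and the three trending functions named above. The smoothing constants `alpha` and `beta` and the way each smoother is initialized are illustrative choices, not values taken from the slides.

```python
def deseasonalize(counts, indices):
    """Remove the cyclical pattern by dividing each count by its index."""
    return [count / index for count, index in zip(counts, indices)]


def simple_average(series, last_n):
    """Average of the last N values (the 'Last X Days' option)."""
    window = series[-last_n:]
    return sum(window) / len(window)


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing; returns the final smoothed level."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def holt_linear(series, alpha, beta):
    """Holt's linear exponential smoothing; returns (level, trend per step)."""
    level = series[0]
    trend = series[1] - series[0]  # simple starting guess for the slope
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level, trend
```

For example, `deseasonalize([10, 40], [0.5, 2.0])` turns a quiet day and a busy day into the same underlying level of 20, and `holt_linear` on a steadily rising series returns the latest level together with the per-step increase.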
Load Forecasting
• Forecast expected count
• Project trend into future timeframe
  – Always flat
    » Simple average
    » Exponential smoothing
  – Linear trend
    » Holt’s linear exponential smoothing
• Multiply by seasonal indices to reseasonalize the data
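The projection and reseasonalizing step can be sketched as follows. It assumes a deseasonalized level and a per-step trend are already available from whichever trending function was chosen (a trend of zero gives the "always flat" case). Combining several cycles by multiplying their indices together, and flooring the baseline at zero, are assumptions of this sketch; `combined_index` and `forecast_counts` are hypothetical names.

```python
import math


def combined_index(*indices):
    """Combine indices from several cycles (e.g. month x weekday) by product."""
    return math.prod(indices)


def forecast_counts(level, trend, future_indices):
    """Project the trend forward, then multiply by each period's index."""
    forecasts = []
    for step, index in enumerate(future_indices, start=1):
        baseline = level + trend * step      # trend == 0.0 keeps it flat
        forecasts.append(max(0.0, baseline) * index)
    return forecasts


# Flat projection of 20 incidents/day over a quiet, average, and busy day.
flat = forecast_counts(20.0, 0.0, [0.5, 1.0, 2.0])

# Linear projection: level 18 rising by 2 per day.
rising = forecast_counts(18.0, 2.0, [1.0, 0.5])
```

The flat case yields 10, 20, and 40 expected incidents; the rising case yields 20 and then 11, because the second day's higher baseline of 22 is halved by its index.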
Improving CompStat
How Do We Know It’s Accurate?
• Testing
• Generated forecasting techniques (examples)
  – Commonly Used
    » Average of last 30 days
    » Average of last 365 days
    » Last year’s count for the same time period
  – Advanced Combinations
    » Different cyclical indices (example: day of year vs. month of year)
    » Different levels of geographic aggregation for indices
    » Different trending functions
• Scoring methodologies (examples)
  – Mean absolute percent error (with some enhancements)
  – Mean percent error
  – Mean squared error
• Run thousands of forecasts through testing framework
• Choose the right technique in the right situation
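The three scoring measures and the "choose the right technique" step can be sketched as below. The slides mention unspecified enhancements to the percent error; this sketch only skips periods whose actual count is zero, which is an assumption rather than the authors' method, as is the sign convention for mean percent error. The technique names in the example are hypothetical.

```python
def mean_absolute_percent_error(actual, forecast):
    """Average size of the error as a percent of the actual count."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return 100.0 * sum(abs(f - a) / abs(a) for a, f in pairs) / len(pairs)


def mean_percent_error(actual, forecast):
    """Signed version: positive means over-forecasting on average."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return 100.0 * sum((f - a) / a for a, f in pairs) / len(pairs)


def mean_squared_error(actual, forecast):
    """Average squared error; penalizes large misses more heavily."""
    return sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def best_technique(actual, forecasts_by_name, score):
    """Pick the technique whose forecasts score lowest."""
    return min(forecasts_by_name,
               key=lambda name: score(actual, forecasts_by_name[name]))


actual = [10, 20, 40]
candidates = {
    "average_last_30_days": [12, 18, 40],
    "last_year_same_period": [20, 20, 20],
}
winner = best_technique(actual, candidates, mean_absolute_percent_error)
```

Scoring per crime type or per district, rather than once overall, is one way the right technique could be matched to the right situation.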
Ongoing Research
Research Topics
• Risk Forecasting
  – Load forecasting enhancements
    • Weather and special events
  – Combining short and long term risk forecasts (Temple)
    • Socioeconomic changes in neighborhoods
  – Risk Terrain Modeling (Rutgers)
    • Context of crime at the microplace
Research Topics
• Risk Forecasting
  – Offender Management
    • Prioritize offenders based upon statistical models using past behaviors
• Evaluation
  – Automate Randomized Controlled Trials
Data Processing for Big (Geo) Data
A Story
Close to Center City
Walk to Grocery Store
Nearby Restaurants
Library
Near a Park
Biking / walking distance from our work
Biking distance to fencing
somewhat important
vital
very important
nice to have
somewhat important
very important
somewhat important
Robert’s Rules of Housing
Child Care
Local School Rankings
Farmer's Market
Car Share
Public Transit
Your factors might include…
We stand on the shoulders of giants
Not a new idea … Design with Nature
Not a new Idea … Dana Tomlin
Desktop GIS
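Weighted Overlay
[Figure: each factor layer is multiplied by its weight (x 5, x 2, x 3, x 1) and the weighted layers are added together to give the result.]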
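A weighted overlay is a cell-by-cell weighted sum of aligned raster layers. The sketch below uses plain nested lists and four made-up 2 x 2 layers with the weights 5, 2, 3, and 1; the layer names are hypothetical stand-ins for factors such as those in the housing story, and the function name is an assumption.

```python
def weighted_overlay(layers, weights):
    """Multiply each layer by its weight and sum the layers cell by cell."""
    rows, cols = len(layers[0]), len(layers[0][0])
    result = [[0.0] * cols for _ in range(rows)]
    for layer, weight in zip(layers, weights):
        for r in range(rows):
            for c in range(cols):
                result[r][c] += weight * layer[r][c]
    return result


# 1 = the cell satisfies the factor, 0 = it does not (illustrative data).
near_grocery = [[1, 0], [0, 1]]
near_work = [[1, 1], [0, 0]]
near_park = [[0, 1], [1, 0]]
near_transit = [[1, 1], [1, 1]]

suitability = weighted_overlay(
    [near_grocery, near_work, near_park, near_transit],
    [5, 2, 3, 1],
)
# suitability == [[8.0, 6.0], [4.0, 6.0]]
```

The highest-scoring cell is the best match for the stated preferences. Because each output cell depends only on the same cell in each input layer, this is a local operation and can be computed independently for every cell.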
Geography-driven Decisions
Iterative
Individual
Web [and Mobile]
Growing data sets
Summary
Web Challenges
Web is different from the Desktop
Lots of simultaneous users
Stateless environment
HTML+JS+CSS
Users are less skilled
Users are less patient
But wait … there’s a problem
10 – 60 second calculation time
Multiple simultaneous users …
… that are impatient
Data Challenges
Big Data – Social Media
Big Data – Science
Big Data – Citizen Science
Big Data – Cities
Early Prototype
Specific Optimization Goals
New Raster File Structure
Distributed processing
Binary messaging protocol
Optimization: File Format
Limit data type and range
1D arrays are fast to read/write
Tiled
Pyramids
Azavea Raster Grid (ARG)
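The slides list the ideas behind the file format without giving its layout, so the following is only a sketch of those ideas and not the ARG specification: a raster held as a one-dimensional, row-major array of a small fixed-width type, with helpers to read a cell, cut out a tile, and build a coarser pyramid level. The signed-byte type, the 2 x 2 averaging, and all function names are assumptions.

```python
from array import array


def to_flat(grid):
    """Store a 2D grid as a 1D signed-byte array in row-major order."""
    cols = len(grid[0])
    flat = array("b", (value for row in grid for value in row))
    return flat, cols


def cell(flat, cols, row, col):
    """Read one cell with a single index calculation."""
    return flat[row * cols + col]


def tile(flat, cols, tile_row, tile_col, tile_size):
    """Copy out one square tile as its own 1D array."""
    out = array("b")
    first_row = tile_row * tile_size
    for r in range(first_row, first_row + tile_size):
        start = r * cols + tile_col * tile_size
        out.extend(flat[start:start + tile_size])
    return out


def pyramid_level(flat, cols):
    """Half-resolution copy: average each 2x2 block (even dimensions)."""
    rows = len(flat) // cols
    out = array("b")
    for r in range(0, rows, 2):
        for c in range(0, cols, 2):
            total = (flat[r * cols + c] + flat[r * cols + c + 1]
                     + flat[(r + 1) * cols + c]
                     + flat[(r + 1) * cols + c + 1])
            out.append(total // 4)
    return out, cols // 2


grid = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
flat, cols = to_flat(grid)
overview, overview_cols = pyramid_level(flat, cols)
```

Limiting values to one byte keeps files small, the flat array can be written and read in one pass with `tobytes` and `frombytes`, and tiles or pyramid levels let a request touch only the data it needs at the resolution it needs.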
Optimization: Distributed Processing
Parallelizable - Local Ops and Focal Ops
Support multiple
– Threads
– Cores
– CPUs
– Machines
Considered
– Hadoop
– Amazon Map Reduce
– Beowulf
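Local operations parallelize well because each output cell depends only on the matching input cells, so a raster can be split into chunks that are processed independently and stitched back together. The sketch below shows that decomposition with Python's standard thread pool; it illustrates the idea only and is not the architecture described in the slides. Function names and the row-wise chunking are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(grid, n_chunks):
    """Split a grid into up to n_chunks blocks of consecutive rows."""
    size = max(1, -(-len(grid) // n_chunks))  # ceiling division
    return [grid[i:i + size] for i in range(0, len(grid), size)]


def local_add_chunk(chunks):
    """Local op on one chunk: add two rasters cell by cell."""
    chunk_a, chunk_b = chunks
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(chunk_a, chunk_b)]


def parallel_local_add(grid_a, grid_b, workers=4):
    """Run the local op on each chunk in a worker, then reassemble."""
    work = list(zip(split_rows(grid_a, workers), split_rows(grid_b, workers)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the submitted chunks.
        results = list(pool.map(local_add_chunk, work))
    return [row for chunk in results for row in chunk]


total = parallel_local_add(
    [[1, 2], [3, 4], [5, 6]],
    [[10, 20], [30, 40], [50, 60]],
    workers=2,
)
```

In CPython, threads will not speed up pure-Python arithmetic because of the global interpreter lock, so this shows the chunking pattern rather than a performance gain; the same split would apply to processes or separate machines. Focal operations need extra care because each chunk also requires a border of neighboring cells.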
Success!!
Reduced from 10-60 seconds to <500 milliseconds
Optimizing one process sub-optimizes others
Complex to configure and maintain
Limited to one operation
No interpolation
No mixing
– cell sizes
– extents
– projections
etc.
Broader set of functionality
Both raster and vector
Scala + Akka
Open source
Faster is Different
Regional/State: 84 ms
National: 84 ms
Large Country: 115 ms
Continental: 271 ms
Planet: 1.2 – 2.0 s
Ongoing R&D
GPUs
Re-wrote a few Map Algebra operations:
Local
Neighborhood
Zonal
Viewshed
etc.
15 – 120x
Large grids
Large kernels
GPU Results
Vector
Neighborhood/Focal
Spatial Statistics
Integration
New Spatial Operations
Urban Forest Ecosystem Modeling
Crime Analysis, Early Warning and Forecasting
GDAL
GeoServer
PostGIS
R
GeoDa
Open Source Geoprocessing
Many Thanks!
© Photo used with permission from Alphafish, via Flickr.com