Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
Lauren Bennett, Flora Vale, Alberto Nieto
Means & Medians to Machine Learning: Spatial Statistics Basics and Innovations
esriurl.com/spatialstats
What are Spatial Statistics?
Spatial Statistics are a set of exploratory techniques for
describing and modeling spatial distributions, patterns,
processes, and relationships.
proximityarea
lengthorientation
connectivitycoincidence
direction
Spreadsheets
Data or Information?
Maps
Data or Information?
When you look at a spreadsheet…
You ask for more
• Mean
• Standard Deviations
• Min and Max
• …
Same goes for maps!
We can do more
Means and Medians
Machine Learningclustering methods
summarizing spatial distributions
Means and Medianssummarizing spatial distributions
Central Feature
identifies the most centrally located feature in a point, line, or polygon feature class
Y
X
X
Y
X
Y
X
Y
X
Y
X
Y
central feature
Mean Center
identifies the geographic center (or the center of concentration) for a set of features
X
Y
(25,24)
(24,16)
(22,23)
(18,12)
(14,8)
(12,12)
(14,14)
(9,18)
(13,12)
X
Y (25,24)
(24,16)
(22,23)
(18,12)
(14,8)
(12,12)
(14,14)
(9,18)
(13,12)
(17,15)
mean center
mean =
(17,15)
Median Center
identifies the location that minimizes overall Euclidean distance to the features in a dataset
Y
X
(25,24)
(24,16)
(22,23)
(18,12)
(14,8)
(12,12)
(14,14)
(9,18)
(13,12)
X
YX: 25 . 24 . 22 . 18 . 14 . 14 . 13 . 12 . 9
Y: 24 . 23 . 18 . 16 . 14 . 12 . 12 . 12 . 8
(14,14)
median center
median =
Mean vs Median?
X
Y(176,138)
(25,24)
(24,16)
(22,23)
(18,12)
(14,8)
(12,12)
(14,14)
(9,18)
(13,12)
X
Y
Linear Directional Mean
identifies the mean direction, length, and geographic center for a set of lines
Y
X
Y
X
Directional Distribution(Standard Deviational Ellipse)
creates standard deviational ellipses to summarize the spatial characteristics of geographic features: central tendency, dispersion, and directional trends
X
Y
mean center
X
Y
Demo
Machine Learningclustering methods
Density-based Clustering
finds clusters based on feature locations
Cluster 1
Cluster 2
Cluster 3
Cluster 4
Cluster 5
Noise
DBSCAN – defined distance
HDBSCAN – self adjusting
OPTICS – multi-scale
DBSCAN – defined distance
DBSCAN – defined distance
DBSCAN – defined distance
DBSCAN – defined distance
DBSCAN – defined distance
DBSCAN – defined distance
DBSCAN – defined distance
HDBSCAN – self adjusting
OPTICS – multi-scale
OPTICS – multi-scale
OPTICS – multi-scale
OPTICS – multi-scaleR
each
abili
ty d
ista
nce
Feature order
OPTICS – multi-scale
OPTICS – multi-scale
DBSCAN HDBSCAN OPTICS
• Uses fixed search distance
• Clusters of similar densities
• Fast
• Uses range of search distances to find clusters of varying densities
• Data driven, requires least user input
• Uses neighbor distances to create reachability plot
• Most flexibility for fine tuning
• Can be computationally intensive
Demo
Multivariate Clustering
finds clusters based on feature attributes
3 groups 4 groupsK Means
2 groups 3 groups 4 groups
K Means
2 groups 3 groups 4 groups
K Means
2 groups 3 groups 4 groups
K Means
% Below 138 FPL
% Uninsured
% No High School
% Latino
Eligible Uninsured Americans
Spatially Constrained Multivariate Clustering
finds clusters based on feature attributes and proximity
https://xkcd.com/1364/
Minimum Spanning Tree
Minimum Spanning Tree
Minimum Spanning Tree
Minimum Spanning Tree
Minimum Spanning Tree
Minimum Spanning Tree
Minimum Spanning Tree
Crime in Chicago
Median Income
HS Dropout Rate
UnemploymentUnemployment
Crime Count
Demo
dissimilar between
dissimilar between
uniformgroups
Build Balanced Zones
creates spatially contiguous zones in your study area using a genetic growth algorithm based on criteria that you specify
Criteria
Zone building
• Attribute target
• Number of zones and attribute target
• Number of zones
Criteria
Zone selection
• Equal area
• Compactness
• Equal number of features
• Attribute to consider
fitness score
9.14
…
9.14
10.22
top 50% move on to next generation
top 50% move on to next generation
top 50% move on to next generation
and crossover to create new offspring
top 50% move on to next generation
and crossover to create new offspring
top 50% move on to next generation
and crossover to create new offspring
top 50% move on to next generation
then the next fittest 50% moves on and crosses over to create the next generation
ConvergenceFi
tne
ss S
core
Generation
Demo
Want to learn more???
esriurl.com/spatialstats
TUESDAY_________________________________________
1:45p Data Visualization for Spatial Analysis 146C
3:00p Machine Learning in ArcGIS 146C
4:15p From Means and Medians to Machine Learning: Spatial Statistics Basics and Innovations 146C
WEDNESDAY______________________________________
8:30a Machine Learning in ArcGIS 146C
11a Data Visualization for Spatial Analysis 146C
1:30p From Means and Medians to Machine Learning: Spatial Statistics Basics and Innovations 146C
2:45p Spatial Data Mining: Cluster Analysis and Space Time Analysis 146C
4:00p Beyond Where: Modeling Spatial Relationships and Making Predictions 146C
5:15p The Forest for the Trees: Making Predictions Using Forest-Based Classification and Regression 146C
Please fill out a course survey!!!
Want to learn more???
esriurl.com/spatialstats
TUESDAY_________________________________________
1:45p Data Visualization for Spatial Analysis 146C
3:00p Machine Learning in ArcGIS 146C
4:15p From Means and Medians to Machine Learning: Spatial Statistics Basics and Innovations 146C
WEDNESDAY______________________________________
8:30a Machine Learning in ArcGIS 146C
11a Data Visualization for Spatial Analysis 146C
1:30p From Means and Medians to Machine Learning: Spatial Statistics Basics and Innovations 146C
2:45p Spatial Data Mining: Cluster Analysis and Space Time Analysis 146C
4:00p Beyond Where: Modeling Spatial Relationships and Making Predictions 146C
5:15p The Forest for the Trees: Making Predictions Using Forest-Based Classification and Regression 146C
Please fill out a course survey!!!