www.spatialanalysisonline.com
Chapter 5
Part A: Spatial data exploration
3rd edition www.spatialanalysisonline.com 2
Spatial data exploration
Spatial analysis and data models (Anselin, 2002)
                       Object                     Field
GIS                    vector                     raster
Spatial Data           points, lines, polygons    surfaces
Location               discrete                   continuous
Observations           process realisation        sample
Spatial Arrangement    spatial weights            distance function
Statistical Analysis   lattice                    geostatistics
Prediction             extrapolation              interpolation
Models                 lag and error              error
Asymptotics            expanding domain           infill
Sampling frameworks
• Pure random sampling
• Stratified random – by class/strata (proportionate, disproportionate)
• Randomised within defined grids
• Uniform
• Uniform with randomised offsets
• Sampling and declustering
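A minimal sketch of three of the frameworks listed above (pure random, uniform, and uniform with randomised offsets), assuming a rectangular study region. The function names `pure_random` and `uniform_grid` and the parameter `max_offset` are hypothetical and chosen only for illustration.

```python
import random

def pure_random(n, xmin, ymin, xmax, ymax, rng):
    """Place n points uniformly at random inside the bounding rectangle."""
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]

def uniform_grid(nx, ny, xmin, ymin, xmax, ymax, rng=None, max_offset=0.0):
    """One point per cell of an nx-by-ny grid.

    With rng=None the point sits at the cell centre (uniform sampling).
    With an rng, each point is shifted by up to max_offset cell widths
    (0 to 0.5) in x and y; max_offset=0.5 gives a point placed at random
    anywhere within its cell.
    """
    dx, dy = (xmax - xmin) / nx, (ymax - ymin) / ny
    pts = []
    for i in range(nx):
        for j in range(ny):
            ox = rng.uniform(-max_offset, max_offset) if rng else 0.0
            oy = rng.uniform(-max_offset, max_offset) if rng else 0.0
            pts.append((xmin + (i + 0.5 + ox) * dx, ymin + (j + 0.5 + oy) * dy))
    return pts

rng = random.Random(42)
random_pts = pure_random(25, 0, 0, 100, 100, rng)
regular_pts = uniform_grid(5, 5, 0, 0, 100, 100)
offset_pts = uniform_grid(5, 5, 0, 0, 100, 100, rng=rng, max_offset=0.5)
```

Passing a seeded `random.Random` instance keeps each sample reproducible, which helps when comparing frameworks on the same region.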
Sampling frameworks – point sampling
Sampling frameworks – within zones
• Selection of 5 random points per zone
• Grid generation: square grid within field boundaries
• Grid generation (hexagonal): selection of 1 point per cell, random offset from centre
A. 10% random sample from existing point set: 800 radio-activity monitoring sites in Germany. Random sample of 80 (red/large dots)
B. Stratified random selection, 30% of each stratum: 200 radio-activity monitoring sites in Germany. Random sample of 30 (red/large dots) with <100 units of radiation and 30 (crosses) with >=100 units of radiation
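The two selections described above can be sketched as follows. The site coordinates and radiation values are synthetic placeholders, not the German monitoring data, and the helper names `random_subsample` and `stratified_subsample` are hypothetical.

```python
import random

def random_subsample(points, fraction, rng):
    """Simple random sample of a fixed fraction of an existing point set."""
    return rng.sample(points, round(len(points) * fraction))

def stratified_subsample(points, stratum_of, fraction, rng):
    """Sample the same fraction independently from each stratum."""
    strata = {}
    for p in points:
        strata.setdefault(stratum_of(p), []).append(p)
    return {s: rng.sample(members, round(len(members) * fraction))
            for s, members in strata.items()}

rng = random.Random(1)
# Synthetic sites: (x, y, radiation). Values are invented for illustration.
sites = [(rng.uniform(0, 600), rng.uniform(0, 800), rng.uniform(50, 150))
         for _ in range(800)]

sample_a = random_subsample(sites, 0.10, rng)            # A: 10% of all sites
sample_b = stratified_subsample(                          # B: 30% of each stratum
    sites, lambda s: "low" if s[2] < 100 else "high", 0.30, rng)
```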
Random points on a network
EDA, ESDA and ESTDA
EDA – basic aims (after NIST)
• maximize insight into a data set
• uncover underlying structure
• extract important variables
• detect outliers and anomalies
• test underlying assumptions
• develop parsimonious models
• determine optimal factor settings
ESDA (see GeoDa and STARS)
• Extending EDA ideas to the spatial domain (lattice/zone models)
• Brushing
• Linking
• Mapped histograms
• Outlier mapping
• Box plots
• Conditional choropleth plots
• Rate mapping
ESDA: Brushing & linking
ESDA: Histogram linkage
ESDA: Parallel coordinate plot & star plot
ESDA: Mapped box plots
ESDA: Conditional choropleth mapping
ESDA: Mapped point data
A. Variable point size
B. Variable colour
C. Semivariogram pairs
D. Voronoi analysis
ESDA: Trend analysis (continuous spatial data)
ESDA: Cluster hunting – GAM/K (steps)
1. Read data for the population at risk
2. Identify the MBR containing the data, identify starting circle radius, and degree of overlap
3. Generate a grid covering the MBR
4. For each grid-intersection generate a circle of radius r
5. Retrieve two counts for the population at risk and the variable of interest
6. Apply some “significance” test procedure
7. Keep the result if significant
8. Repeat Steps 5 to 7 until all circles have been processed
9. Increase circle radius by dr and return to Step 3, else go to Step 10
10. Create a smoothed density surface of excess incidence for the significant circles
11. Map this surface and inspect the results
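Steps 1 to 9 above can be sketched roughly as below. This is a simplified illustration, not the published GAM implementation. Several choices are assumptions made here: the significance test is a one-sided Poisson test against the overall rate, the grid spacing is `2 * r * (1 - overlap)`, and the surface smoothing of Steps 10 and 11 is omitted. All function and parameter names are hypothetical.

```python
import math

def poisson_upper_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu); adequate for small expected counts."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)          # P(X = 0)
    cdf = term
    for i in range(1, k):         # add P(X = 1) ... P(X = k - 1)
        term *= mu / i
        cdf += term
    return max(0.0, 1.0 - cdf)

def gam_scan(zones, r_start, r_max, dr, overlap=0.5, alpha=0.01):
    """zones: list of (x, y, population_at_risk, cases).

    Returns the significant circles as (cx, cy, r, cases, expected).
    """
    xs = [z[0] for z in zones]
    ys = [z[1] for z in zones]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)   # MBR
    overall_rate = sum(z[3] for z in zones) / sum(z[2] for z in zones)
    significant = []
    r = r_start
    while r <= r_max:
        step = 2 * r * (1 - overlap)          # assumed grid spacing
        nx = int((xmax - xmin) / step) + 1
        ny = int((ymax - ymin) / step) + 1
        for i in range(nx):
            for j in range(ny):
                cx, cy = xmin + i * step, ymin + j * step
                inside = [z for z in zones
                          if math.hypot(z[0] - cx, z[1] - cy) <= r]
                pop = sum(z[2] for z in inside)
                cases = sum(z[3] for z in inside)
                if pop == 0:
                    continue
                expected = pop * overall_rate
                if poisson_upper_tail(cases, expected) < alpha:
                    significant.append((cx, cy, r, cases, expected))
        r += dr                               # enlarge circles and rescan
    return significant

# Synthetic lattice of zones with an artificial cluster of raised incidence.
cluster = {(4, 4), (4, 5), (5, 4), (5, 5)}
zones = [(x, y, 100, 8 if (x, y) in cluster else 1)
         for x in range(10) for y in range(10)]
hits = gam_scan(zones, r_start=1.0, r_max=2.0, dr=1.0)
```

Because many overlapping circles are tested, the per-circle `alpha` is not an overall significance level, which is why the steps describe the test as a "significance" procedure in quotation marks.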
Grid-based statistics
• Univariate analysis of attribute data (non-spatial metrics)
• Cross-classification and cross-tab analyses
• Spatial pattern analysis for grid data (including Landscape metrics): Patch metrics; Class-level metrics; Landscape-level metrics
• Quadrat analysis
• Multi-grid regression analysis
Grid-based statistics: Landscape metrics
Non-spatial
• Proportional abundance; Richness; Evenness; Diversity
Spatial
• Patch size distribution and density; Patch shape complexity; Core Area; Isolation/Proximity; Contrast; Dispersion; Contagion and Interspersion; Subdivision; Connectivity
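The non-spatial metrics can be computed from class counts alone, as in this small sketch. The slide does not name specific indices, so the use of Shannon diversity and of evenness as diversity divided by its maximum is an assumption here. The function name `composition_metrics` and the example grid are hypothetical.

```python
import math
from collections import Counter

def composition_metrics(grid):
    """Non-spatial composition metrics for a classified grid (list of rows)."""
    counts = Counter(cell for row in grid for cell in row)
    total = sum(counts.values())
    proportions = {c: k / total for c, k in counts.items()}   # proportional abundance
    richness = len(counts)                                     # number of classes present
    shannon = -sum(p * math.log(p) for p in proportions.values())
    # Evenness is undefined for a single class; 0.0 is returned in that case.
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return proportions, richness, shannon, evenness

grid = [
    ["forest", "forest", "water"],
    ["forest", "urban",  "water"],
    ["urban",  "urban",  "water"],
]
proportions, richness, shannon, evenness = composition_metrics(grid)
```

None of these values change if the cells are shuffled, which is what distinguishes them from the spatial metrics in the second group.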
Point (event) based statistics
• Typically analysis of point-pair distances
• Points vs events
• Distance metrics: Euclidean, spherical, Lp or network
• Weighted or unweighted events
• Events, NOT computed points (e.g. centroids)
• Classical statistical models vs Monte Carlo and other computational methods
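Three of the distance metrics named above can be sketched as follows (network distance needs a graph and is left out). The spherical distance uses the haversine formula with a mean Earth radius of 6371 km, which is an assumption, and the function names are hypothetical.

```python
import math

def lp_distance(a, b, p=2):
    """Minkowski (Lp) distance: p=2 is Euclidean, p=1 is city-block."""
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1 / p)

def spherical_distance(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

euclid = lp_distance((0, 0), (3, 4), p=2)
city_block = lp_distance((0, 0), (3, 4), p=1)
quarter_equator = spherical_distance(0, 0, 0, 90)
```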
Point (event) based statistics
Basic Nearest neighbour (NN) model
• Input coordinates of all points
• Compute (symmetric) distances matrix D
• Sort the distances to identify the 1st, 2nd,...kth nearest values
• Compute the mean of the observed 1st, 2nd, ...kth nearest values
• Compare this mean with the expected mean under Complete Spatial Randomness (CSR or Poisson) model
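The first four steps above can be sketched as below, assuming planar coordinates and Euclidean distance. The function name `mean_kth_nn_distances` is hypothetical, and the brute-force distance matrix is only practical for small point sets.

```python
import math

def mean_kth_nn_distances(points, k_max):
    """Mean observed distance to the 1st, 2nd, ..., k_max-th nearest neighbour."""
    n = len(points)
    # Symmetric distance matrix D (full matrix, for clarity rather than speed).
    D = [[math.dist(p, q) for q in points] for p in points]
    sums = [0.0] * k_max
    for i in range(n):
        # Sort each row, excluding the zero distance from a point to itself.
        row = sorted(D[i][j] for j in range(n) if j != i)
        for k in range(k_max):
            sums[k] += row[k]
    return [s / n for s in sums]

points = [(0, 0), (1, 0), (0, 1), (1, 1)]   # corners of a unit square
means = mean_kth_nn_distances(points, 3)
```

The final step, comparison with the CSR expectation, needs an estimate of the point density over the study area.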
Point (event) based statistics – NN model
[Figure: circle of radius r, area = πr²; surrounding annulus from r to r+dr, width = dr, area = 2πr dr]
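Point (event) based statistics – NN model
• Mean NN distance: $\bar{r}_e = \dfrac{1}{2\sqrt{m}}$
• Variance: $\sigma^2 = \dfrac{4-\pi}{4\pi m}$
• NN Index (Ratio): $R = \bar{r}_o / \bar{r}_e$
• Z-transform: $z = (\bar{r}_o - \bar{r}_e)/\sigma_e$, where $z \sim N(0,1)$ and $\sigma_e = \sigma/\sqrt{n} = 0.261358/\sqrt{mn}$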
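A short sketch of the NN index and its z-score, assuming the density m is estimated as n divided by the study area and that no edge correction is applied. The function name `nn_index` and the lattice example are hypothetical.

```python
import math

def nn_index(points, area):
    """Return (R, z): NN index and z-score against the CSR expectation."""
    n = len(points)
    m = n / area                                  # estimated point density
    r_obs = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n                                         # observed mean NN distance
    r_exp = 1 / (2 * math.sqrt(m))                # expected mean NN distance under CSR
    se = math.sqrt((4 - math.pi) / (4 * math.pi * m * n))   # = 0.261358 / sqrt(m n)
    return r_obs / r_exp, (r_obs - r_exp) / se

# A regular 5 x 5 lattice with unit spacing in a 5 x 5 study area.
lattice = [(i + 0.5, j + 0.5) for i in range(5) for j in range(5)]
R, z = nn_index(lattice, area=25.0)
```

For this lattice R is 2, well above 1, and z is large and positive, which points to a dispersed (regular) pattern; values of R below 1 point towards clustering.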
Point (event) based statistics: Issues
• Are observations n discrete points?
• Sample size (esp. for kth order NN, k>1)
• Model requires density estimation, m
• Boundary definition problems (density and edge effects) – affects all methods
• NN reflexivity of point sets
• Limited use of frequency distribution
• Validity of Poisson model vs alternative models
Frequency distribution of nearest neighbour distances, i.e. the frequency of NN distances in distance bands, say 0-1km, 1-2km, etc.
The cumulative frequency distribution is usually denoted
• $G(r) = \#(d_i < r)/n$, where $d_i$ are the NN distances and n is the number of measurements, or
• $F(r) = \#(d_i < r)/m$, where m is the number of random points used in sampling
Computing G(r) [computing F(r) is similar]
• Find all the NN distances
• Rank them and form the cumulative frequency distribution
• Compare to expected cumulative frequency distribution: $G(r) = 1 - e^{-m\pi r^2}$, where m is the point density
• Similar in concept to K-S test with quadrat model, but compute the critical values by simulation rather than table lookup
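The procedure above can be sketched as follows, assuming a rectangular study region, no edge correction, and a simple min/max simulation envelope in place of formal critical values. All function names, the number of simulations, and the synthetic point pattern are hypothetical.

```python
import math
import random

def nn_distances(points):
    """Nearest-neighbour distance for every point (brute force)."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]

def g_hat(nn, r):
    """Empirical G(r): share of NN distances shorter than r."""
    return sum(d < r for d in nn) / len(nn)

def g_csr(r, m):
    """Expected G(r) under CSR with point density m."""
    return 1 - math.exp(-m * math.pi * r * r)

def csr_envelope(n, width, height, radii, n_sims, rng):
    """Min and max of simulated G(r) over n_sims random patterns of n points."""
    lo, hi = [1.0] * len(radii), [0.0] * len(radii)
    for _ in range(n_sims):
        sim = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        nn = nn_distances(sim)
        for k, r in enumerate(radii):
            g = g_hat(nn, r)
            lo[k], hi[k] = min(lo[k], g), max(hi[k], g)
    return lo, hi

rng = random.Random(7)
observed = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(40)]
radii = [0.25 * k for k in range(1, 9)]
nn = nn_distances(observed)
g_obs = [g_hat(nn, r) for r in radii]
g_exp = [g_csr(r, m=40 / 100) for r in radii]
lo, hi = csr_envelope(40, 10, 10, radii, n_sims=19, rng=rng)
```

Observed values of G(r) that lie above the upper envelope suggest more short NN distances than CSR would produce, i.e. clustering at that range.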
Point (event) based statistics – clustering (ESDA)
• Is the observed clustering due to natural background variation in the population from which the events arise?
• Over what spatial scales does clustering occur?
• Are clusters a reflection of regional variations in underlying variables?
• Are clusters associated with some feature of interest, such as a refinery, waste disposal site or nuclear plant?
• Are clusters simply spatial or are they spatio-temporal?
Point (event) based statistics – clustering
• kth order NN analysis
• Cumulative distance frequency distribution, G(r)
• Ripley K (or L) function – single or dual pattern
• PCP
• Hot spot and cluster analysis methods
Point (event) based statistics – Ripley K or L
• Construct a circle, radius d, around each point (event), i
• Count the number of other events, labelled j, that fall inside this circle
• Repeat these first two stages for all points i, and then sum the results
• Increment d by a small fixed amount
• Repeat the computation, giving values of K(d) for a set of distances, d
• Adjust to provide ‘normalised measure’ L: $L(d) = \sqrt{K(d)/\pi} - d$
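A minimal sketch of the counting procedure above, assuming Euclidean distance, no edge correction, and the scaling K(d) = A × count / (n(n − 1)), where A is the study area. L is computed here as sqrt(K/π) − d, so that values near zero are consistent with CSR; some texts omit the − d term. The function name `ripley_k_l` is hypothetical.

```python
import math

def ripley_k_l(points, area, distances):
    """Return a list of (d, K(d), L(d)) for each distance d."""
    n = len(points)
    # Distances for all ordered pairs (i, j) with i != j.
    pair_d = [math.dist(points[i], points[j])
              for i in range(n) for j in range(n) if i != j]
    results = []
    for d in distances:
        count = sum(1 for x in pair_d if x <= d)   # events inside circles of radius d
        k = area * count / (n * (n - 1))
        results.append((d, k, math.sqrt(k / math.pi) - d))
    return results

corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
table = ripley_k_l(corners, area=1.0, distances=[0.5, 1.0, 1.5])
```

In practice the observed L(d) is compared with minimum and maximum values from simulated patterns, and edge effects become important as d approaches the size of the study region.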
Point (event) based statistics – Ripley K
[Figure: Ripley K - Lung Cancer dataset. L(d) observed, L(d) min and L(d) max plotted against Distance]
Point (event) based statistics – comments
• CSR vs PCP vs other models
• Data: location, time, attributes, error, duplicates
  – Duplicates: deliberate rounding, data resolution, genuine duplicate locations, agreed surrogate locations, deliberate data modification
• Multi-approach analysis is beneficial
• Methods: choice of methods and parameters
• Other factors: borders, areas, metrics, background variation, temporal variation, non-spatial factors
• Rare events and small samples
• Process-pattern vs cause-effect
• ESDA in most instances
Hot spot and cluster analysis – questions
• where are the main (most intensive) clusters located?
• are clusters distinct or do they merge into one another?
• are clusters associated with some known background variable?
• is there a common size to clusters or are they variable in size?
• do clusters themselves cluster into higher order groupings?
• if comparable data are mapped over time, do the clusters remain stable or do they move and/or disappear?
Hot spot (and cool-spot) analysis
• Visual inspection of mapped patterns
• Scale issues
• Proximal and duplicate points
• Point representation (size)
• Background variation/controls (risk adjustment)
• Weighted or unweighted
• Hierarchical or non-hierarchical
• Kernel & K-means methods
Hot spot analysis – Hierarchical NN
Cancer incidence data: 1st and 2nd order clusters