Spatial Analysis I - UBC...


Spatial Analysis I

Spatial data analysis

Spatial analysis and inference

Roadmap

Outline:

- What is spatial analysis?
- Spatial joins
- Step 1: Analysis of attributes
- Step 2: Preparing for analyses: working with distance
- Step 3: Spatial patterns analysis
- Step 4: Kernel density analysis
- Summary

Aspatial vs spatial analysis

Difference of

Aspatial analyses assume that where you take your sample shouldn’t matter.

Spatial analysis

- Turns raw data into useful information by adding greater informative content and value
- Reveals patterns, trends, and anomalies that might otherwise be missed
- Provides a check on human intuition by helping in situations where the eye might deceive

Spatial analysis

Spatial analysis can be

- inductive, to examine empirical evidence in the search for patterns that might support new theories or general principles, as with disease mapping (cancer maps);
- deductive, focusing on the testing of known theories or principles against data (SkyTrain stations as centres of criminal activity);
- normative, using spatial analysis to develop or prescribe new or better designs (geodesign).

Spatial analysis

A method of analysis is spatial if the results depend on the locations of the objects being analyzed:

- move the objects or the study boundaries and the results change
- results are not invariant under relocation

Spatial analysis uses the locations of objects, and, most often, the attributes of those objects

Spatial analysis is the crux of GIS

Attribute linkages

Spatial data: points, lines, areas, pixels (P, L, A, Px)

Attribute data: nominal, ordinal, interval, ratio (NOIR)

Getting organized: Joins

One of the more powerful features of a GIS is the ability to join attribute tables to spatial layers based on a common geographic location ID (such as the CTUID).

ArcMap also has many different forms of spatial joins (3-D joins, and another reason not to use unprojected data [scroll down to ‘Getting the Best Result’]).
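As a rough illustration of what an attribute join does, the sketch below attaches rows of an attribute table to spatial features that share the same ID. The tract IDs, field names, and values are all hypothetical, and this is plain Python rather than ArcMap's join tool.

```python
# Hypothetical attribute table keyed by a tract ID (standing in for the CTUID).
attributes = {
    "T001": {"population": 5200, "median_income": 61000},
    "T002": {"population": 4100, "median_income": 48000},
}

# Hypothetical spatial layer: each feature has an ID and a simple geometry.
features = [
    {"CTUID": "T001", "geometry": [(0, 0), (1, 0), (1, 1), (0, 1)]},
    {"CTUID": "T002", "geometry": [(1, 0), (2, 0), (2, 1), (1, 1)]},
    {"CTUID": "T003", "geometry": [(2, 0), (3, 0), (3, 1), (2, 1)]},
]


def join_attributes(features, attributes, key="CTUID"):
    """Copy each feature and add the attribute fields that share its key.

    Features with no matching attribute row are kept unchanged, which
    mirrors a 'keep all records' style of join.
    """
    joined = []
    for feature in features:
        row = dict(feature)                          # do not modify the input
        row.update(attributes.get(feature[key], {}))  # add matching fields
        joined.append(row)
    return joined


joined = join_attributes(features, attributes)
```

Here `T003` has no attribute row, so it comes through the join with its geometry only. Checking for such unmatched features is a quick way to spot ID mismatches before analysis.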

Step 1: Analysis of attributes

Attribute table joins

Scatterplots

Other types of plots?

Regression

Looking for outliers or other unusual patterns in the attribute data

Problem? Spatial heterogeneity

Know your data

Step 2a: Preparing for analysis: getting our distances correct

Pythagorean or straight-line metric

Shortest distance on a sphere? (Which route?)

Distance along a route represented in a GIS (a polyline) is often calculated by summing the lengths of each segment of the polyline.

Because there is a general tendency for polylines to short-cut corners, the length of a polyline tends to be shorter than the length of the object it represents.

The length of a 3-dimensional line measured off its planimetric representation will also be shorter than its true length.

Unless you are working at very long distances (e.g., continental), work only with projected data (e.g., m).
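The corner-cutting effect can be seen with a small sketch: a quarter circle of radius 1000 m is approximated by polylines with few or many vertices, and each length is found by summing segment lengths. The curve and the vertex counts are made up for illustration, and the coordinates are assumed to be projected (metres).

```python
import math


def polyline_length(vertices):
    """Sum the straight-line lengths of consecutive segments."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def quarter_circle(radius, n_segments):
    """Vertices sampled along a quarter circle, joined by straight segments."""
    return [
        (radius * math.cos(math.pi / 2 * i / n_segments),
         radius * math.sin(math.pi / 2 * i / n_segments))
        for i in range(n_segments + 1)
    ]


true_length = math.pi / 2 * 1000                      # length of the real curve
coarse = polyline_length(quarter_circle(1000, 2))     # 2 segments
fine = polyline_length(quarter_circle(1000, 50))      # 50 segments
```

Both polylines come out shorter than the true curve, and the coarser one is shorter still, because every straight segment cuts across the bend it replaces.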

Pythagoras’s Theorem and the straight-line distance between two points on a plane. What is the length of D?

The effects of the Earth’s curvature on the measurement of distance, and the choice of shortest paths

Geodesics
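The two distance metrics can be sketched side by side: the Pythagorean distance on a plane and a great-circle (geodesic on a sphere) distance from latitude and longitude. The haversine formula and the mean Earth radius of 6371 km are assumptions chosen for this illustration; a GIS would normally use an ellipsoidal model.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def planar_distance(x1, y1, x2, y2):
    """Straight-line (Pythagorean) distance between two projected points."""
    return math.hypot(x2 - x1, y2 - y1)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Shortest distance over a sphere (haversine formula), in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


d_plane = planar_distance(0, 0, 3, 4)          # classic 3-4-5 triangle
d_sphere = great_circle_km(0, 0, 0, 90)        # a quarter of the equator
```

Treating unprojected degrees as if they were planar coordinates would give neither of these answers, which is one reason to project the data before measuring.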

The length of a path as traveled on the Earth’s surface (red line) may be substantially longer than the length of its horizontal projection as evaluated in a two-dimensional GIS

In the figure are shown three paths across part of Dorset in the UK. The green path is the straight route (‘as the crow flies’), the red path is the modern road system, and the gray path represents the route followed by the road in 1886

(Courtesy Michael De Smith)

The vertical profiles of all three routes, with elevation plotted against the distance traveled horizontally in each case.

1 ft = 0.3048 m, 1 yd = 0.9144 m.

(Courtesy Michael De Smith)

Question: how to determine the true (3D) length of a line?

This used to be a complex process, but you can now achieve this result in two easy steps. You need to have a linear feature and a DEM or TIN.

1. Use the Interpolate Shape tool (3D Analyst) to add the Z values to the line.
2. Use the Add Z Information tool (3D Analyst) to add fields to the linear feature’s attribute table.
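The arithmetic behind a 3D length can be sketched in a few lines, assuming the Z values have already been attached to each vertex. This is not the 3D Analyst tools themselves, only an illustration of why the surface length exceeds the planimetric length; the coordinates are invented.

```python
import math


def length_2d(vertices):
    """Planimetric length: ignores the z coordinate of each vertex."""
    return sum(math.hypot(b[0] - a[0], b[1] - a[1])
               for a, b in zip(vertices, vertices[1:]))


def length_3d(vertices):
    """Length along the line once elevation change is included."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


# Hypothetical line with (x, y, z) vertices in metres.
line = [(0, 0, 100), (300, 400, 100), (600, 800, 475)]

flat = length_2d(line)      # 500 + 500
surface = length_3d(line)   # 500 + 625
```

The first segment is level, so both measures agree on it; the second climbs 375 m over 500 m of horizontal travel, which adds 125 m to the true length.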

Buffers (dilations) of constant width drawn around a point, a polyline, and a polygon

- Buffering is a commonly applied distance-based analysis

Identifying areas of influence: Buffers

Buffers representing 1⁄2-mile exclusion zones around all schools in part of Los Angeles

Step 2b: Preparing for analysis: getting our neighbours correct

- Many spatial techniques require informative data on spatial relationships (usually 1 to n values).
- How to formally define the spatial relationships between points, polygons or grids on the surface of analysis?
- We would like to quantify ‘nearness’ in some fashion.
- How do we want to quantify that nearness (distance, adjacency)?
- Many approaches require a weight matrix.
- Matrices function like maps that guide our analysis.

Weight Matrices

We can use different types of weight matrices to see if there are different types of spatial relationships.

Two broad types of matrices:

- Distance-based (obviously useful for point features, but also used for polygonal features; usually a cut-off distance is defined [e.g., distance < 1000 m]). Can also use a network to determine the distance.
- Contiguity-based (a key attribute of polygonal features: do they share a common edge?)

ArcMap’s help file for generating spatial weights.

Weight Matrices: Distance

- Distance-based weights create bands (perhaps 1000 m) around the points, or around the centroids of polygons, to identify neighbours.
- K-nearest-neighbour weights count or ‘mark’ the k closest neighbours. This is a relative distance measure, since in some areas the k points may be very close while in other areas they may be much further away (think of the k nearest points or polygon centroids, or the k adjacent polygons).

K=3
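The two distance-based definitions of a neighbour can be compared with a short sketch. The point coordinates and the 1000 m band are hypothetical, and the function names are illustrative rather than taken from ArcMap or GeoDa.

```python
import math


def distance_band_neighbours(points, threshold):
    """For each point, list the indices of other points within the band."""
    return {
        i: [j for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= threshold]
        for i, p in enumerate(points)
    }


def k_nearest_neighbours(points, k):
    """For each point, list the indices of its k closest other points."""
    result = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result[i] = others[:k]
    return result


# Hypothetical projected coordinates (metres): a tight group and a distant pair.
points = [(0, 0), (500, 0), (0, 800), (5000, 5000), (5600, 5000)]

band = distance_band_neighbours(points, 1000)
knn = k_nearest_neighbours(points, 3)
```

With the 1000 m band, point 3 has a single neighbour (point 4). With k = 3 it always gets three neighbours, two of which are more than 6 km away, which is the sense in which k-nearest-neighbour is a relative measure.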

Weight Matrices: Contiguity-based weights

- Rook: counts only edge adjacencies
- Queen: counts edges and vertices

For rasters, very easy to visualize. For polygons, the resulting size of the neighbourhoods can vary widely.

Rook Queen

Weight Matrices

An example of a weight matrix for polygons where a vertex is not counted as adjacent (2 is not adjacent to 6).

Note that polygons are not considered adjacent to themselves.

1 – adjacent
0 – not adjacent

[Figure: map of six polygons labelled 1 to 6]

     1  2  3  4  5  6
1    0  1  0  0  1  0
2    1  0  1  1  1  0
3    0  1  0  1  0  0
4    0  1  1  0  0  1
5    1  1  0  0  0  1
6    0  0  0  1  1  0
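The six-polygon matrix above can be typed in and checked for the properties the slide describes: it should be symmetric, have zeros on the diagonal, and show polygons 2 and 6 as non-adjacent. The row-standardization step at the end is an addition not covered on the slide; it is a common preparation step in which each row is rescaled to sum to 1.

```python
# Rook contiguity matrix from the six-polygon example (1 = adjacent, 0 = not).
W = [
    [0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 0],
]


def is_symmetric(matrix):
    """Adjacency is mutual, so W[i][j] should equal W[j][i]."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i]
               for i in range(n) for j in range(n))


def neighbour_counts(matrix):
    """Number of neighbours of each polygon (row sums)."""
    return [sum(row) for row in matrix]


def row_standardize(matrix):
    """Rescale each row to sum to 1 (rows with no neighbours stay all zero)."""
    standardized = []
    for row in matrix:
        total = sum(row)
        standardized.append([v / total if total else 0.0 for v in row])
    return standardized


counts = neighbour_counts(W)
W_std = row_standardize(W)
```

The neighbour counts differ from polygon to polygon (polygon 2 has four neighbours, polygon 1 only two), which illustrates how neighbourhood sizes can vary widely for polygons.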

Weight Matrices

A weight matrix is often contained in a weights file. We can establish such files in ArcGIS and in programs like GeoDa.

For example, in GeoDa weights files include:

- .gal for contiguity-based weights
- .gwt for distance-based weights

In other programs weights might be defined in the GUI rather than through a separate file.

Step 3: Spatial patterns analysis

Identification of how objects cluster is often important in many different fields:

- Archaeology
- Criminology
- Ecology
- Epidemiology

Point patterns can be identified as clustered, dispersed, or random.

Kinds of processes responsible for point patterns are:

- First-order processes involve points being located independently (rain drops)
- Second-order processes involve interaction between points (acorns from oaks)

The K function is an example of a descriptive statistic of pattern.

Looking at the distribution of spatial objects without considering their attributes.

Point pattern of individual tree locations. A, B, and C identify the individual trees analyzed in the following graphs

Here the points represent trees, but they could represent crime incidents, locations of people with a disease, store locations, etc.

(Source: Getis A. and Franklin J. 1987. Second-order neighborhood analysis of mapped point patterns. Ecology 68(3): 473–477).

Point pattern analysis

Point pattern analysis: Ripley’s K

Summarizes spatial autocorrelation (point feature clustering or feature dispersion) over a range of distances.

This is used when you want to see how changing spatial distances impact nearest neighbour counts. It can help identify an appropriate window size.

In many pattern analysis studies, the selection of an appropriate scale of analysis is required. For example, a distance threshold or distance band is often needed for the analysis (e.g., kernel density analysis). When exploring spatial patterns at multiple distances and spatial scales, patterns change, often reflecting the dominance of particular spatial processes at work. Ripley's K function illustrates how the spatial clustering or dispersion of feature centroids changes when the neighborhood size changes.

A local measure, but one that can look at all distances.

ESRI’s description of Ripley’s K

(Source: Getis A. and Franklin J. 1987. Second-order neighborhood analysis of mapped point patterns. Ecology 68(3): 473–477).

Clustered

Overdispersion

Ripley's K Function

A – area (e.g., bounding box)
N – number of points
d – distance (classes)

k(i, j) is the weight, which is 1 when the distance between i and j is less than or equal to d and 0 when the distance between i and j is greater than d.

With the L(d) transformation, the expected value is equal to distance
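The formula itself did not survive on this slide. A commonly used form of the L(d) transformation, consistent with the symbols defined above and with no edge correction, is given here as an assumption rather than as the exact expression shown in the lecture:

```latex
L(d) = \sqrt{\frac{A \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} k(i, j)}{\pi \, N \, (N - 1)}}
```

Under complete spatial randomness the double sum is expected to be about N(N - 1) πd² / A, so the term under the square root reduces to d² and L(d) ≈ d. Observed values above d suggest clustering at that distance, and values below d suggest dispersion.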

What does overdispersion mean?

Pine trees are represented by green dots and other tree species are represented by red dots. The function counts the number of neighboring pine trees found within a given distance from each individual pine tree (Xm).

The number of observed neighboring pine trees is then traditionally compared to the number of pine trees one would expect to find based on a completely spatially random point pattern.

If the number of pines found within a given distance of each individual pine is greater than that for a random distribution, the distribution is clustered. If the number is smaller, the distribution is dispersed.
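The counting logic described above can be sketched as a small function. It computes the L(d) form of Ripley's K, L(d) = sqrt(A · Σ k(i, j) / (π N (N - 1))), with no edge correction, which is a simplifying assumption; the two point sets are invented to show a clustered and a dispersed case.

```python
import math


def ripley_l(points, area, d):
    """L(d) transformation of Ripley's K, without edge correction.

    Counts ordered pairs (i, j), i != j, whose separation is <= d,
    then rescales so that a random pattern gives roughly L(d) = d.
    """
    n = len(points)
    pairs_within = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) <= d
    )
    return math.sqrt(area * pairs_within / (math.pi * n * (n - 1)))


study_area = 100 * 100  # hypothetical 100 m x 100 m bounding box

# Four trees packed into one corner: every tree has 3 neighbours within 2 m.
clustered = [(0, 0), (1, 0), (0, 1), (1, 1)]
# Four trees on a regular 50 m grid: no neighbours within 10 m.
dispersed = [(25, 25), (75, 25), (25, 75), (75, 75)]

l_clustered = ripley_l(clustered, study_area, 2)
l_dispersed = ripley_l(dispersed, study_area, 10)
```

The clustered set gives an L(d) far above d = 2, and the regular grid gives an L(d) of zero, below d = 10. Whether a difference from d is statistically meaningful is a separate question that needs a reference distribution.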

Permutations (over and over again)

Spatial autocorrelation measures such as Ripley’s K or Moran’s I usually compare your data to a theoretical random data set (whether polygon or point) in order to get a p-value.

In order to determine if the existing spatial data is statistically dissimilar to the null hypothesis of ‘complete spatial randomness’ (CSR), we need to simulate / create a spatially random probability distribution (this can’t be done mathematically [e.g., looking up a value in a table] since each study is unique).

Monte Carlo Simulation produces several (e.g., 99) random simulations (or permutations) that the software then compares against your observed data.

This can be used to develop a pseudo p-value: the probability that an actual set of numbers was observed only by chance.

Permutations

Each observation is given a set of randomly generated coordinates (selected using a uniform random number generator, not a ‘normal’ or Gaussian random distribution), which is used to relocate each observation in space. To generate a random reference distribution of Moran's I (or Ripley’s K), the statistic is computed each time with a different arrangement of points, for the number of permutations specified (e.g., 99).

You can then compare this reference distribution to your observed Moran's I value to determine where it falls in comparison. The upper and lower confidence bands are derived from the random permutations.

A uniform distribution for the roll of a single die:

A spatially random distribution (one of many simulations)

Observed distribution

Our p-value answers the question - what is the probability that the observed distribution could have occurred by chance?
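The simulation idea can be sketched as follows. To keep the example short it uses mean nearest-neighbour distance as the test statistic instead of Ripley's K or Moran's I, and it tests for clustering only (one-sided); both are assumptions made for this illustration, as are the window size and the observed points. The pseudo p-value is computed as (number of simulations at least as extreme + 1) / (number of simulations + 1).

```python
import math
import random


def mean_nn_distance(points):
    """Average distance from each point to its nearest other point."""
    return sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / len(points)


def pseudo_p_value(observed_points, width, height, n_sim=99, seed=0):
    """One-sided Monte Carlo test for clustering against spatial randomness."""
    rng = random.Random(seed)             # fixed seed so the run is repeatable
    observed = mean_nn_distance(observed_points)
    n = len(observed_points)
    as_extreme = 0
    for _ in range(n_sim):
        # Uniform random coordinates: every location is equally likely.
        simulated = [(rng.uniform(0, width), rng.uniform(0, height))
                     for _ in range(n)]
        if mean_nn_distance(simulated) <= observed:
            as_extreme += 1
    return (as_extreme + 1) / (n_sim + 1)


# Hypothetical observed pattern: 20 points packed into a tiny patch
# of a 100 x 100 study window.
observed_points = [(50 + 0.1 * i, 50 + 0.05 * (i % 4)) for i in range(20)]
p = pseudo_p_value(observed_points, 100, 100)
```

With 99 simulations the smallest possible pseudo p-value is 0.01, which is why the number of permutations limits how strong a statement the test can make.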

Statistical Significance

Permutations to compute confidence envelope.

Car thefts in Vancouver after 8:00 pm

A clustered distribution

Step 4: Kernel density analysis

Kernel density analysis calculates the density of features in a neighbourhood around each feature. It can be calculated for both point and line features.

While the inputs are either point or line features, the output is a raster since a field output is being created

Possible uses include calculating the density of houses, crime reports, or roads or utility lines and using that density in a regression analysis, for example.

You can use a ‘population field’ to weight some features more heavily than others, depending on their meaning, or to allow one point to represent several observations. Source

Still looking only at the spatial objects.

(A) A collection of point objects; (B) a kernel function

The kernel’s shape depends on a distance parameter—increasing the value of the parameter results in a broader and lower kernel, and reducing it results in a narrower and sharper kernel. When each point is replaced by a kernel and the kernels are added, the result is a density surface whose smoothness depends on the value of the distance parameter.
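The effect of the distance parameter can be sketched with a simple kernel density function. The quartic kernel used here is an assumption chosen for illustration (other kernel shapes behave similarly), and the points, cell size, and bandwidths are hypothetical.

```python
import math


def quartic_kernel(distance, bandwidth):
    """Quartic kernel: highest at the point, zero at and beyond the bandwidth.

    Scaled so that the volume under each kernel is 1.
    """
    if distance >= bandwidth:
        return 0.0
    u = (distance / bandwidth) ** 2
    return 3.0 / (math.pi * bandwidth ** 2) * (1.0 - u) ** 2


def density_at(location, points, bandwidth):
    """Density at one location: the sum of the kernels of all points."""
    return sum(quartic_kernel(math.dist(location, p), bandwidth)
               for p in points)


def density_grid(points, bandwidth, cell_size, width, height):
    """Raster of densities evaluated at the centre of each cell."""
    n_rows = int(height // cell_size)
    n_cols = int(width // cell_size)
    return [
        [density_at(((c + 0.5) * cell_size, (r + 0.5) * cell_size),
                    points, bandwidth)
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical incident locations in projected metres.
incidents = [(100, 100), (120, 110), (400, 300)]
narrow = density_grid(incidents, bandwidth=100, cell_size=50, width=500, height=400)
broad = density_grid(incidents, bandwidth=500, cell_size=50, width=500, height=400)
```

With the 100 m bandwidth most cells are zero and the surface consists of sharp, isolated peaks; with 500 m every cell receives some density and the peaks are lower and broader.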


Density estimation using two different distance parameters in the respective kernel functions.

(A) The surface shows the density of ozone-monitoring stations in California, using a kernel radius of 150 km

(B) Zoomed to an area of Southern California, a kernel radius of 16 km is too small for this dataset, as it leaves each kernel isolated from its neighbors

Car thefts in Vancouver after 8:00 pm: 50 m cell size and a 500 m neighbourhood

Step 5: Spatial patterns analyses+

Cluster and Outlier Analysis identifies spatial clusters of features with high or low values, as well as identifying spatial outliers (formally: Anselin’s Local Moran's I).

This is just one example of the different ways we can analyze spatial patterns. ESRI provides helpful information about this and other methods on their Spatial Statistics Resources page.

We are now including both the spatial object and its attribute.

Spatial Patterns of Obesity and Associated Risk Factors in the Conterminous U.S

Source

Some notes on the lab.

What do these values represent?

A z-score of 1.96 corresponds to the 0.05 (5%) level of the curve (two-tailed).

A z-score of 2.58 corresponds to the 0.01 (1%) level of the curve.

The Moran’s Index in and of itself isn’t important; it is the z-score and the p-value that tell the tale.

With respect to Lab 3 and the interpretation of Moran’s I.

[Figure: Test statistic for a normal frequency distribution. The curve is centred on 0*, with critical values marked at -1.96 and 1.96 (2.5% in each tail; reject the null at 5%) and at 2.58 (reject the null at 1%).]

*technically –1/(n-1)

Null Hypothesis: no spatial autocorrelation
*Moran’s I = 0

Alternative Hypothesis: spatial autocorrelation exists (and/or dispersion exists)
*Moran’s I > 0 (clustering) or I < 0 (dispersion)

Reject the Null Hypothesis if the z-score is greater than or equal to 1.96 (or less than or equal to -1.96).

Interpretation: less than a 5% chance that the spatial autocorrelation (dispersion) found is random; 95% confident that spatial autocorrelation (dispersion) exists.
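The link between the critical z-scores and the 5% and 1% levels can be checked directly from the standard normal curve. The sketch below uses only Python's standard library; the function names are illustrative, and the wording of the decision labels is an assumption.

```python
import math


def two_tailed_p(z):
    """Two-tailed p-value for a z-score under the standard normal curve."""
    return math.erfc(abs(z) / math.sqrt(2))


def expected_morans_i(n):
    """Expected Moran's I under the null hypothesis: -1/(n-1), close to 0."""
    return -1.0 / (n - 1)


def decision(z, alpha=0.05):
    """Apply the rejection rule at significance level alpha."""
    if two_tailed_p(z) >= alpha:
        return "fail to reject null"
    return "clustering" if z > 0 else "dispersion"


p_196 = two_tailed_p(1.96)   # just under 0.05
p_258 = two_tailed_p(2.58)   # just under 0.01
```

The sign of the z-score tells clustering from dispersion, while its size sets the p-value. For a large number of features the expected index under the null is so close to zero that treating it as 0 makes little practical difference.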

Summary

- These are just a few of the methods available to analyse spatial data.
- You should explore ArcMap’s Spatial Analyst toolbox as well as the Spatial Statistics toolbox, since within those toolboxes you can find many additional methods that might be of use in your projects.
