Spatial Databases First law of geography [Tobler]: Everything is related to everything, but nearby things are more related than distant things. Lecture 8 : Spatial Statistics 1 Autocorrelation & GWR Pat Browne

Spatial Databases

First law of geography [Tobler]: Everything is related to everything, but nearby things are more related than distant things.

Lecture 8 : Spatial Statistics 1

Autocorrelation & GWR

Pat Browne

Spatial autocorrelation

• Spatial autocorrelation is the degree of correlation between neighbouring values. It is detected when the value of a variable at a location is correlated with values of the same variable in the neighbourhood (it can be measured with Moran’s I).

• Moran’s I measures the average correlation between the value of a variable at one location and the value at nearby locations. The essential idea is to specify pairs of locations that influence each other, along with the relative intensity of interaction. Moran’s I provides a global view of spatial autocorrelation.

Moran’s I

• The range of the Moran's I statistic depends on the spatial weight matrix.

• When Moran's I is scaled by its bounds the statistic is restricted to the range ±1

• Moran’s I can serve as a tool for modeling spatial dependencies in many data mining techniques.

Same Mean and SD but different Moran’s I

Spatial Autocorrelation: Moran’s I - example

Moran’s I - example

• The pixel value sets in (b) and (c) are the same, but their Moran’s I values are different.

• Q: Which dataset, (b) or (c), has the higher spatial autocorrelation?

Figure 7.5, pp. 190

Neighbourhood relationship: contiguity matrix

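Spatial autocorrelation

[Figure: patterns ranging from negative spatial autocorrelation (dispersed), through spatial independence, to positive spatial autocorrelation (spatial clustering).]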

Moran’s I

• Global Moran’s I
  – What is the extent of clustering in the total area?
  – Is this clustering significantly different from a random spatial distribution?

• Local Moran’s I
  – Do local clusters (high-high or low-low) or local spatial outliers (high-low or low-high) exist?
  – Are these local clusters and spatial outliers statistically significant?

Moran Scatter Plot

Scatter diagram between X and Lag-X, the “spatial lag” of X formed by averaging all the values of X for the neighboring polygons. It identifies which type of spatial autocorrelation (SA) exists.

• High/High: positive SA

• Low/Low: positive SA

• High/Low: negative SA

• Low/High: negative SA

Source: Briggs, Henan University, 2010.

Spatial autocorrelation

• Spatial autocorrelation is determined both by similarities in position and by similarities in attributes
  – Sampling interval
  – Self-similarity

• Auto = self

• Correlation = degree of relative correspondence

Moran’s I index

Statistical Spatial Data

• In this lecture we consider spatial data that carries an attribute, e.g. house prices, occurrences of disease, occurrences of accidents, crop yield, poverty patterns, crime rates, etc. Earlier parts of the course covered the representation of physical objects such as houses, counties, and roads. These objects were arranged by theme. Here we consider attributes of those objects, e.g. the population of an ED.

Spatial Statistics

• Spatial statistics is the statistical study of spatial data that varies over discrete space e.g. crime rates broken down by neighbourhood. Spatial statistical models can be used for estimation, description, and prediction based on probability theory (not covered).

Standard statistical concepts: i.i.d

• A collection of two or more random variables {X1, X2, …, Xn} is independent and identically distributed (i.i.d.) if the variables all have the same probability distribution and are mutually independent.

Standard statistical concepts: Examples

• Example i.i.d: All other things being equal, a sequence of dice rolls is i.i.d.

• Example of non-i.i.d.: bird nesting patterns in wetlands, where the independent variables are distance from water, length of grass and depth of water, and the dependent variable is the presence of a nest site. A uniform distribution of these variables on a map would indicate an even distribution; however, a more complex pattern emerges where the variables are spatially dependent.

Standard statistical concepts: Correlation

• Correlation: A correlation is a single number that describes the degree of relationship between two normally distributed variables. The variables are not designated as dependent or independent. The value of a correlation coefficient can vary from minus one to plus one. A minus one indicates a perfect negative correlation, while a plus one indicates a perfect positive correlation. A correlation of zero means there is no relationship between the two variables. When there is a negative correlation between two variables, as the value of one variable increases, the value of the other variable decreases, and vice versa.

Standard statistical concepts: Variance and covariance

• Variance is a measure of variation equal to the mean of the squared deviations from the mean. It measures the amount of variation within the values of a variable, taking account of all possible values and their probabilities or weightings.

• Covariance is a measure of the joint variation between two variables, say X and Y. The range of covariance values is unrestricted. However, if the X and Y variables are first standardized, then covariance is the same as correlation and the range of covariance (correlation) values is from –1 to +1.
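The link between covariance and correlation can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the sample values are invented for illustration and are not from the lecture.

```python
import numpy as np

# Hypothetical sample data, invented for illustration only.
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

def standardize(v):
    """Subtract the mean and divide by the population standard deviation."""
    return (v - v.mean()) / v.std()

# Population covariance: mean of the products of deviations from the means.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# Covariance of the standardized variables (their means are zero).
zx, zy = standardize(x), standardize(y)
cov_standardized = np.mean(zx * zy)

# Pearson correlation coefficient as computed by NumPy.
r = np.corrcoef(x, y)[0, 1]

print(cov_xy, cov_standardized, r)
```

The raw covariance depends on the units of X and Y, while the covariance of the standardized variables matches the correlation coefficient and so stays between –1 and +1.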

Standard statistical concepts: Correlation

• Correlation is a measure of the degree of linear relationship between two variables, say X and Y. While in regression the emphasis is on predicting one variable from the other, in correlation the emphasis is on the degree to which a linear model may describe the relationship between two variables. In regression the interest is directional: one variable is predicted and the other is the predictor. In correlation the interest is non-directional: the relationship is the critical aspect. The correlation coefficient may take on any value from minus one to plus one (-1 ≤ r ≤ 1).

Standard statistical concepts: Regression

• Regression: takes a numerical dataset and develops a mathematical formula that fits the data. The results can be used to predict future behaviour. Works well with continuous quantitative data like weight, speed or age. Not good for categorical data where order is not significant, like colour, name, gender, nest/no nest. Example: plotting snowfall against height above sea level.

Standard statistical concepts: Regression

Y = A + BX. The response variable is Y, and X is the continuous explanatory variable. Parameter A is the intercept and parameter B is the slope. The difference between each data point and the value predicted by the line (the model) is called a residual.
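A least-squares fit of the line Y = A + BX and its residuals can be sketched as follows. This assumes NumPy; the height and snowfall figures are hypothetical values made up to echo the snowfall example, not measured data.

```python
import numpy as np

# Hypothetical data: height above sea level (m) and snowfall (cm).
height = np.array([100.0, 300.0, 500.0, 700.0, 900.0])
snowfall = np.array([12.0, 19.0, 31.0, 42.0, 48.0])

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line.
B, A = np.polyfit(height, snowfall, 1)

predicted = A + B * height          # values on the fitted line
residuals = snowfall - predicted    # data point minus model prediction

print(f"intercept A = {A:.3f}, slope B = {B:.4f}")
print("residuals:", np.round(residuals, 3))
```

For a least-squares line with an intercept, the residuals sum to zero, which is a quick sanity check on the fit.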

Standard statistical concepts: Null hypothesis

• The null hypothesis, H0, represents a theory that has been put forward because it is believed to be true but has not been proved. For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than the current drug. H0: there is no difference between the two drugs on average.

• In general, the null hypothesis for spatial data is that either the features themselves or the values associated with those features are randomly distributed (i.e. there is no spatial pattern or bias).

Relation of i.i.d., regression, and correlation with spatial phenomena

• The first law of geography according to Waldo Tobler is "Everything is related to everything else, but near things are more related than distant things." In statistical terms this is called autocorrelation. Because the traditional i.i.d. assumption is not valid for spatially dependent variables (e.g. temperature or crime rate), we need special techniques to handle this type of data (e.g. Moran’s I). These techniques usually involve a weight matrix which contains location information. The non-i.i.d. nature of spatially dependent variables carries over into regression and correlation, which then require spatial weights.

Relation of i.i.d., regression, and correlation with spatial database

• Spatial databases are used for spatial data mining (SDM), which includes statistical techniques and more specialised data mining (DM) techniques such as association rules. In this case the data mining algorithms need to have a spatial context: we must explicitly include location information, which was not required under the i.i.d. assumption. Typical generic data mining activities such as clustering, regression, classification, and association rules all need a spatial context. Spatial DM is used in a broad range of scientific disciplines, such as analysis of crime, modelling land prices, poverty mapping, epidemiology, air pollution and health, natural and environmental sciences, etc. The analyst must be aware of the special techniques required for SDM.

Relation of i.i.d., regression, and correlation with spatial database

• Spatial databases are also used for pure statistical research (e.g. environmental studies). Those variables that are spatially dependent (e.g. the pH of the soil) need to be clearly identified, and special techniques applied to take their spatial bias into account.

Unique features of spatial data Statistics

• General statistics assumes the samples are independently generated, which may not be the case with spatially dependent data, where:
  – Like things tend to cluster together.
  – Change tends to be gradual over space.

Spatial Autocorrelation

• Autocorrelation: degree of correlation between neighbouring values.

• Spatial dependency: neighbouring values are similar (i.e. positive spatial autocorrelation).

• Moran’s I enables assessment of the degree to which values tend to be similar to neighbouring values. We can observe how autocorrelation varies with distance.

• The Moran scatter plot relates individual values to weighted averages of neighbouring values. The slope of a regression line fitted to the points in the scatter plot gives the global Moran’s I.
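The relationship between the scatter plot slope and the global statistic can be checked with a short sketch. It assumes NumPy, a regular grid with rook (edge-sharing) neighbours, and row-standardised weights; the raster values and the helper name `rook_weights` are invented for illustration.

```python
import numpy as np

def rook_weights(rows, cols):
    """Row-standardised rook contiguity matrix for a rows x cols grid."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[i, rr * cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

# Hypothetical 4x4 raster, invented for illustration.
x = np.array([[1, 2, 3, 4],
              [2, 3, 4, 5],
              [3, 4, 5, 6],
              [4, 5, 6, 9]], dtype=float).ravel()

W = rook_weights(4, 4)
z = x - x.mean()     # deviations from the mean (horizontal axis)
lag = W @ z          # spatial lag: weighted average of neighbours (vertical axis)

morans_i = (z @ lag) / (z @ z)
slope, intercept = np.polyfit(z, lag, 1)   # regression line of lag on z

print(morans_i, slope)
```

Because the deviations z have mean zero, the least-squares slope of the lag on z reduces to the same ratio that defines Moran’s I under these weights.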

Spatial Autocorrelation: Moran’s I

• Moran’s I measures the average correlation between the value of a variable at one location and the value at nearby locations. The essential idea is to specify pairs of locations that influence each other, along with the relative intensity of interaction. Moran’s I provides a global view of spatial autocorrelation. We will look at the details later.

• The range of the Moran's I statistic depends on the spatial weight matrix.

• When Moran's I is scaled by its bounds the statistic is restricted to the range ±1

• Moran’s I can serve as a tool for modeling spatial dependencies in many data mining techniques.

Spatial Autocorrelation: Case Study

[Figure: four map panels: nest locations, distance to open water, vegetation durability, and water depth.]

Spatial Autocorrelation

Classical statistical assumptions (i.i.d.) do not hold for spatially dependent data.

Unique features of spatial data statistics: First Law of Geography

• First law of geography [Tobler]:
  – Everything is related to everything, but nearby things are more related than distant things.
  – People with similar backgrounds tend to live in the same area.
  – Economies of nearby regions tend to be similar.
  – Changes in temperature occur gradually over space (and time) (equator vs. poles).

Spatial Autocorrelation: Moran’s I - example

Moran’s I - example

• The pixel value sets in (b) and (c) are the same, but their Moran’s I values are different.

• Q: Which dataset, (b) or (c), has the higher spatial autocorrelation?

Figure 7.5, pp. 190

Moran’s I - example

• Moran’s I statistic for map 1 is 0.55316092

• Moran’s I statistic for map 2 is -0.76724138

Moran’s I - example

Spatial Autocorrelation : Moran Scatterplot Map

[Figure: Moran scatterplot map of the old-aged population in São Paulo. The scatter plot axes are z and Wz, crossing at 0; the quadrants are Q1 = HH, Q2 = LL, Q3 = HL, Q4 = LH.]

Spatial Heterogeneity

• Spatial heterogeneity: is there such a thing as an average place with respect to some property (e.g. vegetation)? It is difficult to imagine any subset of the Earth’s surface being a representative sample of the whole. GWR (later) addresses the localness of spatial data.

Neighbourhood relationship: contiguity matrix

Spatial autocorrelation

• Spatial autocorrelation is determined both by similarities in position and by similarities in attributes
  – Sampling interval
  – Self-similarity

• Auto = self

• Correlation = degree of relative correspondence

Spatial autocorrelation

• In the following slide, each diagram contains 32 white cells and 32 blue cells, 64 cells in total.

• BB = Blue beside Blue

• BW = Blue beside White

• WW = White beside White
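Counting BB, BW and WW joins on such a grid can be sketched as below. The sketch assumes NumPy and that "beside" means sharing an edge (rook adjacency), which the slide does not state; the two 8-by-8 patterns are invented, each with 32 blue and 32 white cells.

```python
import numpy as np

def join_counts(grid):
    """Count BB, BW and WW joins between edge-sharing neighbours.
    Cells are 1 for blue and 0 for white."""
    g = np.asarray(grid)
    counts = {"BB": 0, "BW": 0, "WW": 0}
    # Horizontal and vertical neighbour pairs, each pair counted once.
    pairs = [(g[:, :-1], g[:, 1:]), (g[:-1, :], g[1:, :])]
    for a, b in pairs:
        s = a + b                      # 2 = BB, 1 = BW, 0 = WW
        counts["BB"] += int(np.sum(s == 2))
        counts["BW"] += int(np.sum(s == 1))
        counts["WW"] += int(np.sum(s == 0))
    return counts

rows, cols = np.indices((8, 8))
checkerboard = (rows + cols) % 2        # dispersed pattern
halves = (cols >= 4).astype(int)        # clustered pattern: white left, blue right

print(join_counts(checkerboard))   # {'BB': 0, 'BW': 112, 'WW': 0}
print(join_counts(halves))         # {'BB': 52, 'BW': 8, 'WW': 52}
```

Many BW joins indicate a dispersed pattern (negative spatial autocorrelation), while many BB and WW joins indicate clustering (positive spatial autocorrelation).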

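Spatial autocorrelation

[Figure: patterns ranging from negative spatial autocorrelation (dispersed), through spatial independence, to positive spatial autocorrelation (spatial clustering).]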

Spatial regression (SR)

• Spatial regression (SR) is a global spatial modeling technique in which spatial autocorrelation among the regression parameters is taken into account. SR is usually performed for spatial data obtained from spatial zones or areas. The basic aim in SR modeling is to establish the relationship between a dependent variable measured over a spatial zone and other attributes of that zone, for a given study area of which the spatial zones are subsets. While SR is known as a modeling method in the spatial data analysis literature, in the spatial data-mining literature it is considered to be a classification technique.

Spatial autocorrelation

[Figure: Grid A (4-by-4) and Grid B (8-by-8) drawn in black and white cells. Labels: negative (dispersed), spatial independence, positive (spatial clustering).]

The grids A and B represent two different spatial resolutions over the same area. Grid A contains 16 cells and Grid B contains 64 cells. The strength of spatial autocorrelation is often a function of scale or spatial resolution, as illustrated above using black and white cells. High negative spatial autocorrelation is exhibited in A, since each cell has a different colour from its neighbouring cells. In B, each cell of A is subdivided into four half-size cells, assuming the cell is homogeneous. The strength of spatial autocorrelation among the black and white cells then increases, while the cell arrangement stays the same. This illustrates that spatial autocorrelation varies with the study scale: it increases from the 4-by-4 case to the 8-by-8 case.
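The scale effect can be reproduced with a small sketch. It assumes NumPy, a 4-by-4 checkerboard for Grid A, rook neighbours and row-standardised weights; the slide does not state which weights were used, so the exact numbers are specific to these assumptions.

```python
import numpy as np

def morans_i_rook(grid):
    """Moran's I with row-standardised rook weights on a regular grid."""
    g = np.asarray(grid, dtype=float)
    z = g - g.mean()
    rows, cols = g.shape
    num = 0.0
    for r in range(rows):
        for c in range(cols):
            nbrs = [z[rr, cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < rows and 0 <= cc < cols]
            num += z[r, c] * np.mean(nbrs)   # cell value times its spatial lag
    return num / np.sum(z * z)

r, c = np.indices((4, 4))
grid_a = (r + c) % 2                                   # 4x4 checkerboard
grid_b = np.kron(grid_a, np.ones((2, 2), dtype=int))   # each cell split into 2x2

print(morans_i_rook(grid_a))   # approximately -1.0
print(morans_i_rook(grid_b))   # approximately 0.1875
```

Under these assumptions the same arrangement moves from perfect negative autocorrelation at the coarse resolution to a slightly positive value at the finer one, because every subdivided cell now has same-coloured neighbours inside its parent cell.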

Summary of spatial stats

• Moran’s I measures the average correlation between the value of a variable at one location and the value at nearby locations.

• The local Moran statistic measures spatial dependence on a local basis, allowing the researcher to see its variation over space.

• Geographically Weighted Regression (GWR) allows the parameters of a regression analysis to vary spatially. GWR helps in detecting local variations in spatial behavior and understanding local details, which may be masked by global regression models. In GWR, regression coefficients are computed for every spatial zone.
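The idea of computing a separate set of regression coefficients for every location can be sketched with a locally weighted least-squares fit. This is a minimal illustration, assuming NumPy, a Gaussian distance kernel and a fixed bandwidth; the kernel, the bandwidth value, the function name `gwr_coefficients` and the synthetic data are all assumptions, not details from the lecture.

```python
import numpy as np

def gwr_coefficients(coords, x, y, bandwidth):
    """Fit one weighted least-squares regression per location.
    Observations closer to the location get larger weights."""
    coords = np.asarray(coords, dtype=float)
    X = np.column_stack([np.ones(len(x)), x])   # intercept and one predictor
    betas = np.empty((len(y), X.shape[1]))
    for i, site in enumerate(coords):
        d = np.linalg.norm(coords - site, axis=1)     # distance to every site
        w = np.exp(-0.5 * (d / bandwidth) ** 2)       # Gaussian kernel weights
        XtW = X.T * w                                 # same as X.T @ diag(w)
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)  # local [intercept, slope]
    return betas

# Synthetic data: 20 sites along a line; the true slope drifts from 1 to 3.
s = np.arange(20, dtype=float)
coords = np.column_stack([s, np.zeros(20)])
x = (7 * s) % 5 + 1
true_slope = 1 + 2 * s / 19
y = 2 + true_slope * x

betas = gwr_coefficients(coords, x, y, bandwidth=3.0)
print("west slope:", betas[0, 1], "east slope:", betas[-1, 1])
```

A single global regression would report one average slope for this data, whereas the local fits show the slope rising from one end of the study area to the other.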

Moran’s I

• A contiguity matrix may represent a neighbourhood relationship defined using adjacency or Euclidean distance. There are several definitions of adjacency, including a four-neighbourhood and an eight-neighbourhood. Given a gridded spatial framework, a four-neighbourhood assumes that a pair of locations influence each other if they share an edge (rook). An eight-neighbourhood assumes that a pair of locations influence each other if they share either an edge or a vertex (queen).
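The two adjacency definitions can be built for a gridded framework as follows. This is a minimal sketch assuming NumPy and row-major cell numbering; the function name `contiguity_matrix` is hypothetical.

```python
import numpy as np

def contiguity_matrix(rows, cols, rule="rook"):
    """Binary contiguity matrix for a rows x cols grid.
    rule="rook" links edge-sharing cells; rule="queen" also links
    cells that share only a vertex."""
    if rule == "rook":
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif rule == "queen":
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("rule must be 'rook' or 'queen'")
    n = rows * cols
    W = np.zeros((n, n), dtype=int)
    for r in range(rows):
        for c in range(cols):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[r * cols + c, rr * cols + cc] = 1
    return W

rook = contiguity_matrix(3, 3, "rook")
queen = contiguity_matrix(3, 3, "queen")

# Neighbour counts for the centre cell (index 4) of a 3x3 grid.
print(rook[4].sum(), queen[4].sum())   # 4 8

# Row-normalised version: each row sums to 1.
rook_normalised = rook / rook.sum(axis=1, keepdims=True)
```

Cells on the edge of the grid have fewer neighbours than interior cells, which is one reason the matrix is usually row-normalised before use.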

Moran’s I

• Using a normalised weight matrix the values of I range from -1 to 1.

• Value = 1 : Perfect positive correlation

• Value = 0 : No autocorrelation

• Value = -1: Perfect negative correlation

• A Moran’s I value may appear low (say 0.17) and still be statistically significant; because the index is above 0, the pattern is clustered.

Moran’s I

• Global Moran’s I
  – What is the extent of clustering in the total area?
  – Is this clustering significantly different from a random spatial distribution?

• Local Moran’s I
  – Do local clusters (high-high or low-low) or local spatial outliers (high-low or low-high) exist?
  – Are these local clusters and spatial outliers statistically significant?

Moran’s I: A measure of spatial autocorrelation

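• Given $x = \{x_1, \ldots, x_n\}$ sampled over $n$ locations, Moran’s I is defined as

$$I = \frac{z W z^{t}}{z z^{t}}$$

where $z = \{x_1 - \bar{x}, \ldots, x_n - \bar{x}\}$, $\bar{x}$ is the mean of $x$, and $W$ is a normalized contiguity matrix.

Fig. 7.5, p. 190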
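The definition of Moran’s I as the ratio of z W zᵗ to z zᵗ translates directly into matrix code. The sketch below assumes NumPy and a row-normalised contiguity matrix; the four-location example is invented and is not the data of Fig. 7.5.

```python
import numpy as np

def morans_i(x, W):
    """Moran's I as z W z^t / (z z^t), with z the deviations from the mean."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return float(z @ W @ z) / float(z @ z)

# Four locations on a line; each location neighbours the ones next to it.
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
W = adjacency / adjacency.sum(axis=1, keepdims=True)   # row-normalised

smooth = morans_i([1, 2, 3, 4], W)        # neighbouring values are similar
alternating = morans_i([1, 4, 1, 4], W)   # neighbouring values are dissimilar

print(smooth, alternating)   # 0.4 -1.0
```

The smoothly increasing values give a positive index, and the alternating values give the minimum of -1 under this weight matrix.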

How to decide the weight wij ?

1) Binary wij, also called absolute adjacency. Covers the general case answering the question is a value in a region similar or different to its neighbours.

wij = 1 if two geographic entities are adjacent; otherwise, wij = 0. The adjacency definition can be queen (8 neighbours) or rook (4 neighbours).

The weight indicates the spatial interaction between entities.

How to decide the weight wij ?

2) The distance between geographic entities. Often the inverse distance is used, so that farther objects get less weight and nearer objects get more weight, e.g. distance from the centre of an epidemic.

wij = f(dist(i,j)), dist(i,j) is the distance between i and j.

3) The length of common boundary for area entities, e.g. policing borders: shorter borders get less weight.

wij = f(leng(i,j)), leng(i,j) is the length of common boundary between i and j.

The weight indicates the spatial interaction between entities.
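Option 2, distance-based weights, can be sketched as follows. The sketch assumes NumPy, point coordinates with no two entities at the same position, an inverse-distance function f(d) = 1/d^power, and row normalisation; the function name and the `power` parameter are hypothetical choices for illustration.

```python
import numpy as np

def inverse_distance_weights(coords, power=1.0):
    """Row-normalised weights w_ij proportional to 1 / dist(i, j) ** power.
    Assumes no two entities share the same coordinates."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))        # pairwise Euclidean distances
    W = np.zeros_like(dist)
    off_diagonal = dist > 0                        # an entity is not its own neighbour
    W[off_diagonal] = 1.0 / dist[off_diagonal] ** power
    return W / W.sum(axis=1, keepdims=True)

# Three hypothetical entities on a line at x = 0, 1 and 3.
W = inverse_distance_weights([(0, 0), (1, 0), (3, 0)])
print(np.round(W, 3))
```

For the first entity, the neighbour at distance 1 receives three times the weight of the neighbour at distance 3, so its row is 0, 0.75, 0.25.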

How to decide the weight wij?

The choice of weights should ultimately be driven by a rationale for including those areas as neighbors that have a spatial effect on a given location. This rationale can be derived from theory or be the result of using exploratory spatial data analysis (ESDA) to experiment with different weights and connectivity orders. Since weights matrices are used to create spatial lags that average neighboring values, the choice of a weights matrix will determine which neighboring values will be averaged. For instance, since rook weights will usually have fewer neighbors than queen weights, on average, each neighboring observation has more influence.

How to decide the weight wij?

The question of which weights to choose is more pertinent in the context of modeling than ESDA since modeling is based on substantive notions of spatial effects while ESDA prioritizes the rejection of spatial randomness. Therefore, if there are no substantive reasons to guide the choice of weights in ESDA, using a weights file with as few neighbors as possible (such as rook) makes sense. Especially with irregular areal units (as opposed to grids), the difference between rook and queen weights is often minimal. However, it is advisable to test how sensitive your results are to your weights specifications by comparing multiple weights matrices.

Spatial Outlier Detection

• Global outliers are observations which appear inconsistent with the remainder of that data set.

• Global outliers deviate so much from other observations that it may be possible that they were generated by a different mechanism.

• Spatial outliers are observations that appear inconsistent with their neighbours.

Spatial Outlier Detection

• Detecting spatial outliers has important applications in transportation, ecology, public safety, public health, climatology and location based services.

• Geographic objects have a spatial (location, shape, metric & topological properties) & non-spatial component (house owner, sensor id., soil type).

Spatial Outlier Detection

• Spatial neighbourhoods may be defined using spatial attributes & spatial relations.

• Comparisons between spatially referenced objects can be based on non-spatial attributes.

• A spatial outlier is a spatially referenced object whose non-spatial attribute values differ from those of other spatially referenced objects in its spatial neighbourhood.

Data for Outlier detection

In the diagram on the left, G, P, S and Q show a big change in attribute value for a small change in location. The right-hand diagram shows a normal distribution (corresponding to the attribute axis in the left diagram).

Spatial Outlier Detection

• The upper left & lower right quadrants of figure 7.17 indicate a spatial association of dissimilar values; low values surrounded by high value neighbours (P & Q) and high values surrounded by low values (S).

Spatial Outlier Detection

• A Moran outlier is a point located in the upper left or lower right quadrant of a Moran scatter plot.
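Finding the points in the upper left (low value, high neighbours) and lower right (high value, low neighbours) quadrants can be sketched as below. The sketch assumes NumPy, a 3-by-3 grid with row-normalised rook weights, and invented values with one low cell in the centre; the function names are hypothetical.

```python
import numpy as np

def rook_weights(rows, cols):
    """Row-normalised rook contiguity matrix for a rows x cols grid."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[r * cols + c, rr * cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def moran_quadrants(x, W):
    """Label each location by its Moran scatter plot quadrant:
    first letter = own value (High/Low), second = neighbours' average."""
    z = x - x.mean()
    lag = W @ z
    return np.where(z >= 0,
                    np.where(lag >= 0, "HH", "HL"),
                    np.where(lag >= 0, "LH", "LL"))

# Invented values: every cell is 10 except a low centre cell.
x = np.full(9, 10.0)
x[4] = 1.0

labels = moran_quadrants(x, rook_weights(3, 3))
is_outlier = np.isin(labels, ["LH", "HL"])   # upper left or lower right quadrant
print(labels.reshape(3, 3))
```

In this example the centre cell is labelled LH, and the four cells that share an edge with it are labelled HL, because the low centre pulls their neighbourhood average below the mean. Quadrant membership on its own says nothing about statistical significance.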

Spatial Outlier Detection

• A Moran outlier is a point located in the upper left or lower right quadrant of a Moran scatter plot.

[Figure: Moran scatter plot with axes z and Wz, crossing at 0; the quadrants are Q1 = HH, Q2 = LL, Q3 = HL, Q4 = LH.]

Model Evaluation

• Consider the two-class classification problem ‘nest’ or ‘no-nest’. The four possible outcomes (or predictions) are shown on the next slide. The desired predictions are:
  – 1) where the model says there should be a nest and there is an actual nest (True Positive)
  – 2) where the model says there is no nest and there is no nest (True Negative)

• The other outcomes are not desirable and point to a flaw in the model.
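As a rough illustration, the four outcomes can be counted from paired actual and predicted labels. The names `confusion_counts`, `actual` and `predicted` and the six sample pixels are hypothetical. The slide names only True Positive and True Negative; False Positive and False Negative are the standard names for the two undesirable outcomes.

```python
def confusion_counts(actual, predicted):
    """Count the four outcomes of a two-class (nest / no-nest) prediction."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for has_nest, says_nest in zip(actual, predicted):
        if has_nest and says_nest:
            counts["TP"] += 1      # model says nest, nest exists
        elif not has_nest and not says_nest:
            counts["TN"] += 1      # model says no nest, no nest exists
        elif says_nest:
            counts["FP"] += 1      # model says nest, but there is none
        else:
            counts["FN"] += 1      # model misses an actual nest
    return counts

# Hypothetical labels for six pixels (True = nest).
actual    = [True, True,  False, False, True, False]
predicted = [True, False, False, True,  True, False]
counts = confusion_counts(actual, predicted)
print(counts)
```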


Model Evaluation


Spatial Statistical Models

• A Point Process is a model for the spatial distribution of points in a point pattern. Examples: the position of trees in a forest, location of petrol stations in a city.

• Actual real-world point patterns can be compared (using distance) with a randomly distributed point pattern.
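One common way to make such a distance-based comparison is the nearest-neighbour ratio (the Clark-Evans index). The slide does not name a specific statistic, so this is a swapped-in technique. The observed mean nearest-neighbour distance is divided by the value expected for a completely random pattern of the same density, 0.5 / sqrt(n / area). Values below 1 suggest clustering and values above 1 suggest regular spacing. The point sets and the 10 x 10 study area below are hypothetical, and edge effects are ignored.

```python
import math

def mean_nn_distance(points):
    """Mean distance from each point to its nearest neighbour."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

def nn_ratio(points, area):
    """Observed mean NN distance divided by the value expected under
    complete spatial randomness at the same point density."""
    density = len(points) / area
    expected = 0.5 / math.sqrt(density)
    return mean_nn_distance(points) / expected

# Hypothetical patterns in a 10 x 10 study area.
clustered = [(1.0, 1.0), (1.2, 1.1), (1.1, 1.3),
             (8.0, 8.0), (8.2, 8.1), (8.1, 7.9)]
regular = [(2.5 * i + 1.25, 2.5 * j + 1.25) for i in range(4) for j in range(4)]

ratio_clustered = nn_ratio(clustered, area=100.0)
ratio_regular = nn_ratio(regular, area=100.0)
print(ratio_clustered, ratio_regular)
```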


Case Study

Figure panels: Nest locations, Distance to open water, Vegetation durability, Water depth.

Example showing different predictions: (a) the actual locations of nests; (b) pixels with actual nests; (c) locations predicted by one model; and (d) locations predicted by another model. Prediction (d) is spatially more accurate than (c).


Classical statistical assumptions do not hold for spatially dependent data


Case Study

• The previous maps illustrate two important features of spatial data:

• Spatial Autocorrelation (not independent)

• Spatial data is not identically distributed.

• Two random variables are identically distributed if and only if they have the same probability distribution.


Geographically weighted regression (GWR)

• GWR is an effective technique for exploring spatial non-stationarity, which is characterized by relationships between dependent and independent variables that change across the study region. The need for a better understanding of such spatial processes has led to the emergence of local modeling techniques. GWR has been applied in various disciplines such as the natural, environmental, social and earth sciences.


Spatial Regression 1

• The assumption of i.i.d. underlying ordinary least squares regression rarely holds for spatial data. There are several techniques that handle the spatial case:
– Moving window regression
– Geographically Weighted Regression (GWR)

• We will look at GWR


Geographically Weighted Regression (GWR) 1

• The steps are:
1. Go to a location.
2. Conduct regression using the raw data and a geographic weighting scheme.
3. Move to the next location and go back to step 2 until all locations have been visited.

• The output is a set of regression coefficients (e.g. slope and intercept) at each location
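The three steps can be sketched for a single predictor as follows. This is an illustrative outline rather than a full GWR implementation: the Gaussian kernel, the `bandwidth` parameter, the function names `weighted_fit` and `gwr`, and the sample data are all assumptions. The data are built so that the slope is 1 in the west and 3 in the east.

```python
import math

def weighted_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (intercept, slope)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def gwr(coords, x, y, bandwidth):
    """Fit one weighted regression per location (assumed Gaussian kernel)."""
    results = []
    for here in coords:                                   # step 1: go to a location
        w = [math.exp(-0.5 * (math.dist(here, c) / bandwidth) ** 2)
             for c in coords]                             # geographic weights
        results.append(weighted_fit(x, y, w))             # step 2: local regression
    return results                                        # step 3: all locations visited

# Hypothetical data: two groups of locations, far apart, with different slopes.
coords = [(i, 0) for i in range(5)] + [(20 + i, 0) for i in range(5)]
x = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5, 3, 6, 9, 12, 15]

local = gwr(coords, x, y, bandwidth=2.0)
global_intercept, global_slope = weighted_fit(x, y, [1.0] * len(x))
print(local[0], local[-1], global_slope)
```

With equal weights the single global slope is 2, which matches neither region. The local fits recover slopes close to 1 in the west and 3 in the east.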


Coordinates of observations, variables, distance from the first observation, and geographic weights

point   x    y    Var 1   Var 2   dist   Geo w
1       25   45   12      6       0      1
2       25   44   34      52      1      0.995
3       21   48   32      41      5      0.8825
4       27   52   12      25      8      0.7261
5       16   31   11      22      16     0.278
6       42   35   14      9       20     0.0889
7       9    65   56      43      26     0.034
8       29   76   75      67      32     0.006
9       61   66   43      32      42     0.0002
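The slide does not say which weighting function produced the Geo w column. As an assumption only, a Gaussian kernel w = exp(-0.5 (d / h)^2) with bandwidth h = 10 reproduces the tabulated weights at distances 1, 5, 8, 16, 26 and 32 to within rounding. The rows at distances 20 and 42 do not match as closely, so the kernel choice is a guess. The sketch below checks only the six matching rows.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Gaussian distance-decay weight (assumed kernel, not stated in the slides)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

bandwidth = 10  # assumed value, inferred from the table

# Distance -> tabulated weight, for the rows this kernel reproduces.
table = {1: 0.995, 5: 0.8825, 8: 0.7261, 16: 0.278, 26: 0.034, 32: 0.006}

computed = {d: gaussian_weight(d, bandwidth) for d in table}
for d, w in computed.items():
    print(d, round(w, 4), table[d])
```

Whatever the exact kernel, the pattern is the point: the weight is 1 at the regression location and decays towards 0 as distance increases.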


Location of points for previous table


Regression using the previous table and locations. The geographic weighting pulls the line towards the points with larger weights.


Summary of spatial stats

• Moran’s I measures the average correlation between the value of a variable at one location and the value at nearby locations.

• The local Moran statistic measures spatial dependence on a local basis, allowing the researcher to see its variation over space.

• Geographically Weighted Regression allows the parameters of a regression analysis to vary spatially. GWR helps in detecting local variations in spatial behavior and understanding local details, which may be masked by global regression models. In GWR, regression coefficients are computed for every spatial zone.


© Oxford University Press, 2010. All rights reserved. Lloyd: Spatial Data Analysis


Two scatter plots and fitted lines for different aggregations of same value

© Oxford University Press, 2010. All rights reserved. Lloyd: Spatial Data Analysis


References

Lloyd: Spatial Data Analysis

Bivand, Pebesma, Gómez-Rubio: Applied Spatial Data Analysis with R

http://www.spatial.cs.umn.edu/Book/

http://www.manning.com/obe/
