Geographically weighted regression Danlin Yu Yehua Dennis Wei Dept. of Geog., UWM

Page 1: Geographically  weighted regression

Geographically weighted regression

Danlin Yu

Yehua Dennis Wei

Dept. of Geog., UWM

Page 2: Geographically  weighted regression

Outline of the presentation

1. Spatial non-stationarity: an example

2. GWR – some definitions

3. Six good reasons to use GWR

4. Calibration and tests of GWR

5. An example: housing hedonic model in Milwaukee

6. Further information

Page 3: Geographically  weighted regression

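1. Stationary vs. non-stationary

Stationary process (assumed): $y_i = \beta_0 + \beta_1 x_{1i}$

Non-stationary process (more realistic): $y_i = \beta_{i0} + \beta_{i1} x_{1i}$

[Figure: two panels, one for the stationary and one for the non-stationary process, each with points labeled e1 to e4]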

Page 4: Geographically  weighted regression

Simpson’s paradox

[Figure: house price plotted against house density, for spatially aggregated data and for spatially disaggregated data]

Page 5: Geographically  weighted regression

Stationary vs. non-stationary

If a non-stationary process is modeled with a stationary model:
– Wrong conclusions may be drawn
– The model residuals may be highly spatially autocorrelated

Page 6: Geographically  weighted regression

Why do relationships vary spatially?

• Sampling variation
  – Nuisance variation, not real spatial non-stationarity
• Relationships intrinsically different across space
  – Real spatial non-stationarity
• Model misspecification
  – Can significant local variations be removed?

Page 7: Geographically  weighted regression

2. Some definitions

• Spatial non-stationarity: the same stimulus provokes a different response in different parts of the study region
• Global models: statements about processes that are assumed to be stationary and are therefore location independent

Page 8: Geographically  weighted regression

Some definitions

• Local models: spatial decompositions of global models. Their results are location dependent, a characteristic we usually anticipate from geographic (spatial) data

Page 9: Geographically  weighted regression

Regression

• Regression establishes the relationship between a dependent variable and a set of independent variables
• A typical linear regression model looks like: $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_n x_{ni} + \varepsilon_i$
• Here $y_i$ is the dependent variable, $x_{ji}$ ($j$ from 1 to $n$) are the independent variables, and $\varepsilon_i$ is the residual, all at location $i$

Page 10: Geographically  weighted regression

Regression

• When applied to spatial data, this model assumes a stationary spatial process
  – The same stimulus provokes the same response in all parts of the study region
  – This assumption is highly untenable for spatial processes

Page 11: Geographically  weighted regression

Geographically weighted regression

• A local statistical technique for analyzing spatial variations in relationships
• Spatial non-stationarity is assumed and will be tested
• Based on the “First Law of Geography”: everything is related to everything else, but closer things are more related

Page 12: Geographically  weighted regression

GWR

• Addresses the non-stationarity directly
  – Allows the relationships to vary over space, i.e., the $\beta$s do not need to be the same everywhere
  – This is the essence of GWR; in the linear form:
  – $y_i = \beta_{i0} + \beta_{i1} x_{1i} + \beta_{i2} x_{2i} + \dots + \beta_{in} x_{ni} + \varepsilon_i$
  – Instead of remaining the same everywhere, the $\beta$s now vary with location ($i$)

Page 13: Geographically  weighted regression

3. Six good reasons to use GWR

1. GWR is part of a growing trend in GIS towards local analysis
• Local statistics are spatial disaggregations of global ones
• Local analysis aims to understand the spatial data in more detail

Page 14: Geographically  weighted regression

Global vs. local statistics

Global statistics
– Similarity across space
– Single-valued statistics
– Not mappable
– GIS “unfriendly”
– Search for regularities
– Aspatial

Local statistics
– Difference across space
– Multi-valued statistics
– Mappable
– GIS “friendly”
– Search for exceptions
– Spatial

Page 15: Geographically  weighted regression

Six good reasons to use GWR

2. Provides a useful link to GIS
• GISs are very useful for the storage, manipulation and display of spatial data
• Their analytical functions are not fully developed
• In some cases the link between GIS and spatial analysis has been a step backwards
• Better spatial analytical tools are called for to take advantage of GIS’s functions

Page 16: Geographically  weighted regression

GWR and GIS

• An important catalyst for the better integration of GIS and spatial analysis has been the development of local spatial statistical techniques
• GWR is among the recent developments in local spatial analytical techniques

Page 17: Geographically  weighted regression

Six good reasons to use GWR

3. GWR is widely applicable to almost any form of spatial data
• Spatial link between “health” and “wealth”
• Presence/absence of a disease
• Determinants of house values
• Regional development mechanisms
• Remote sensing

Page 18: Geographically  weighted regression

Six good reasons to use GWR

4. GWR is truly a spatial technique
• It uses geographic information as well as attribute information
• It employs a spatial weighting function, on the assumption that near places are more similar than distant ones (geography matters)
• The outputs are location specific and hence mappable for further analysis

Page 19: Geographically  weighted regression

Six good reasons to use GWR

5. Residuals from GWR are generally much lower and usually much less spatially dependent
• GWR models give much better fits to data, EVEN after accounting for the added model complexity and number of parameters (decrease in degrees of freedom)
• GWR residuals are usually much less spatially dependent
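Spatial dependence in residuals is commonly summarized with Moran's I. The following is a minimal sketch of that statistic, assuming a simple binary neighbour matrix on a hypothetical 5 x 5 grid; the grid, the `morans_i` helper and the two test patterns are illustrative and are not taken from the presentation's data.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for a vector of values and an n x n spatial weight matrix."""
    z = values - values.mean()                       # deviations from the mean
    return (len(values) / weights.sum()) * (z @ weights @ z) / (z @ z)

# Hypothetical layout: a 5 x 5 grid with rook (edge-sharing) neighbours.
side = 5
idx = np.arange(side * side).reshape(side, side)
W = np.zeros((side * side, side * side))
for r in range(side):
    for c in range(side):
        for dr, dc in ((0, 1), (1, 0)):              # link to the east and south cells
            rr, cc = r + dr, c + dc
            if rr < side and cc < side:
                W[idx[r, c], idx[rr, cc]] = 1.0
                W[idx[rr, cc], idx[r, c]] = 1.0

# Two illustrative "residual" patterns.
trend = np.repeat(np.arange(side, dtype=float), side)              # smooth gradient
checker = ((idx // side + idx % side) % 2).ravel().astype(float)   # alternating cells

print(morans_i(trend, W))     # positive: similar values cluster together
print(morans_i(checker, W))   # negative: neighbours are dissimilar
```

Values near zero indicate little spatial dependence, so a drop in Moran's I from the OLS residuals to the GWR residuals is read as the local model absorbing spatial structure.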

Page 20: Geographically  weighted regression

[Figure: maps of GWR residuals (range -0.76 to 0.56, Moran's I = 0.144) and OLS residuals (range -1.34 to 0.92, Moran's I = 0.372); scale bar 0 to 300 kilometers]

Page 21: Geographically  weighted regression

Six good reasons to use GWR

6. GWR as a “spatial microscope”
• Instead of determining an optimal bandwidth (or number of nearest neighbors), the bandwidth can be supplied a priori
• A series of bandwidths can be selected and the resulting parameter surfaces examined at different levels of smoothing (like adjusting the magnification of a microscope)

Page 22: Geographically  weighted regression

Six good reasons to use GWR

6. GWR as a “spatial microscope”
• Different levels of detail exhibit different spatially varying patterns, which gives researchers more flexibility in discovering interesting spatial patterns, examining theories, and determining further steps

Page 23: Geographically  weighted regression

4. Calibration of GWR

• Local weighted least squares
  – Weights are attached to locations
  – Based on the “First Law of Geography”: everything is related to everything else, but closer things are more related than remote ones
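Local weighted least squares can be sketched as solving, at every location i, the weighted normal equations (X'W_i X) b_i = X'W_i y, where W_i holds the weights of all observations relative to i. The sketch below assumes a fixed Gaussian kernel and synthetic data; the function names, the bandwidth of 1.5 and the data-generating process are hypothetical and are not the GWR 3.0 or R implementation.

```python
import numpy as np

def gaussian_weights(coords, i, bandwidth):
    """Gaussian kernel weights of every observation relative to location i."""
    d = np.linalg.norm(coords - coords[i], axis=1)   # distances to location i
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def gwr_fit(coords, X, y, bandwidth):
    """Return an array of local coefficients, one row per location."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])            # prepend the intercept column
    betas = np.empty((n, Xd.shape[1]))
    for i in range(n):
        w = gaussian_weights(coords, i, bandwidth)
        XtW = Xd.T * w                               # X' W_i without building W_i
        betas[i] = np.linalg.solve(XtW @ Xd, XtW @ y)
    return betas

# Synthetic data (hypothetical): the slope drifts from west to east.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
x = rng.normal(size=200)
true_slope = 1.0 + 0.3 * coords[:, 0]
y = 2.0 + true_slope * x + rng.normal(scale=0.1, size=200)

betas = gwr_fit(coords, x, y, bandwidth=1.5)
print(betas[:5])                                     # local intercepts and slopes
```

Each row of `betas` is a separate regression fitted with nearby observations weighted more heavily, which is why the estimates can be mapped.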

Page 24: Geographically  weighted regression

Weighting schemes

• The weighting scheme determines the weights
  – Most schemes tend to be Gaussian or Gaussian-like, reflecting the type of dependency found in most spatial processes
  – The scheme can be either fixed or adaptive
  – Both schemes, based on Gaussian or Gaussian-like functions, are implemented in GWR 3.0 and R

Page 25: Geographically  weighted regression

Fixed weighting scheme

[Figure: fixed weighting scheme, showing the weighting function and its bandwidth]
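One commonly used fixed Gaussian kernel can be written as follows. The symbols $d_{ij}$ (distance between regression point $i$ and observation $j$) and $h$ (the bandwidth) are notation assumed here for illustration and do not appear on the slides.

```latex
w_{ij} = \exp\left[-\frac{1}{2}\left(\frac{d_{ij}}{h}\right)^{2}\right]
```

The weight equals 1 at the regression point and decays smoothly with distance. Because $h$ is the same at every location, the kernel covers the same spatial extent everywhere.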

Page 26: Geographically  weighted regression

Problems of fixed schemes

• They might produce large estimate variances where data are sparse, while masking subtle local variations where data are dense
• In extreme conditions, fixed schemes might fail to calibrate in local areas where data are too sparse to satisfy the calibration requirement (there must be more observations than parameters)

Page 27: Geographically  weighted regression

Adaptive weighting schemes

[Figure: adaptive weighting scheme, showing the weighting function and its bandwidth]

Page 28: Geographically  weighted regression

Adaptive weighting schemes

• Adaptive schemes adjust themselves according to the density of the data
  – Shorter bandwidths where data are dense and longer where they are sparse
  – Using a fixed number of nearest neighbors is one of the most commonly used approaches
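The nearest-neighbour idea can be sketched as setting the bandwidth at each location to the distance to its k-th nearest neighbour. This is a minimal illustration on hypothetical points (one dense cluster plus scattered points); the helper name `adaptive_bandwidths` and the choice k = 10 are assumptions.

```python
import numpy as np

def adaptive_bandwidths(coords, k):
    """Bandwidth at each location = distance to its k-th nearest neighbour."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))          # full n x n distance matrix
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbour.
    return np.sort(dist, axis=1)[:, k]

rng = np.random.default_rng(1)
dense = rng.uniform(0, 1, size=(100, 2))             # tightly clustered points
sparse = rng.uniform(0, 10, size=(20, 2))            # widely scattered points
coords = np.vstack([dense, sparse])

h = adaptive_bandwidths(coords, k=10)
print(h[:100].mean(), h[100:].mean())                # dense vs. sparse bandwidths
```

The resulting per-location bandwidths can then replace the single fixed bandwidth in the weighting function, so every local regression draws on roughly the same number of observations.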

Page 29: Geographically  weighted regression

Calibration

• Surprisingly, the results of GWR appear to be relatively insensitive to the choice of weighting function, as long as it is a continuous distance-based function (Gaussian or Gaussian-like)
• Whichever weighting function is used, however, the results will be sensitive to the bandwidth(s)

Page 30: Geographically  weighted regression

Calibration

• An optimal bandwidth (or number of nearest neighbors) satisfies either
  – Least cross-validation (CV) score
    CV score: the difference between the observed values and the GWR-calibrated values obtained with that bandwidth or number of nearest neighbors
  – Least Akaike Information Criterion (AIC)
    An information criterion that accounts for the added complexity of GWR models
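Bandwidth selection by cross-validation can be sketched as follows. The sketch assumes a common formulation of the CV score, the sum of squared differences between each observed value and the value predicted with that observation left out of its own local fit; the fixed Gaussian kernel, the candidate bandwidths and the synthetic data are all hypothetical.

```python
import numpy as np

def cv_score(coords, X, y, bandwidth):
    """Sum of squared leave-one-out prediction errors for one bandwidth."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])            # prepend the intercept column
    score = 0.0
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)      # fixed Gaussian kernel
        w[i] = 0.0                                   # leave observation i out
        XtW = Xd.T * w
        beta_i = np.linalg.solve(XtW @ Xd, XtW @ y)
        score += (y[i] - Xd[i] @ beta_i) ** 2
    return score

# Synthetic data (hypothetical): the slope drifts from west to east.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(150, 2))
x = rng.normal(size=150)
y = 2.0 + (1.0 + 0.3 * coords[:, 0]) * x + rng.normal(scale=0.1, size=150)

candidates = [1.0, 2.0, 4.0, 8.0, 50.0]              # 50.0 is close to a global fit
scores = {h: cv_score(coords, x, y, h) for h in candidates}
best = min(scores, key=scores.get)
print(best, scores[best])
```

A grid of candidates is used here only for clarity; in practice the search is usually automated, and the same loop works with a number of nearest neighbors in place of a distance bandwidth.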

Page 31: Geographically  weighted regression

Tests

• Is GWR really better than OLS models?
  – An ANOVA table test (done in GWR 3.0, R)
  – The Akaike Information Criterion (AIC)
    The lower the AIC, the better the model
    Rule of thumb: a decrease in AIC of 3 is regarded as a successful improvement

Page 32: Geographically  weighted regression

Tests

• Are the coefficients really varying across space?
  – F-tests based on the variance of coefficients
  – Monte Carlo tests: random permutation of the data
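The Monte Carlo idea can be sketched as a permutation test: compute the variance of the local estimates of one coefficient, then repeatedly reassign the observations to random locations and see how often the permuted variance is at least as large. The code below is a simplified illustration with one predictor, a fixed Gaussian kernel and 99 permutations; these choices, the function names and the synthetic data are assumptions, not the procedure as implemented in GWR 3.0.

```python
import numpy as np

def local_slopes(coords, x, y, bandwidth):
    """Local slope estimates from a fixed-bandwidth Gaussian GWR, one predictor."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), x])
    slopes = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        XtW = Xd.T * w
        slopes[i] = np.linalg.solve(XtW @ Xd, XtW @ y)[1]
    return slopes

def monte_carlo_p(coords, x, y, bandwidth, n_perm=99, seed=0):
    """Share of arrangements whose slope variance is >= the observed variance."""
    rng = np.random.default_rng(seed)
    observed = local_slopes(coords, x, y, bandwidth).var()
    count = 1                                        # counts the observed arrangement
    for _ in range(n_perm):
        shuffled = coords[rng.permutation(len(y))]   # reassign locations at random
        if local_slopes(shuffled, x, y, bandwidth).var() >= observed:
            count += 1
    return count / (n_perm + 1)

# Synthetic data (hypothetical) with a slope that truly varies over space.
rng = np.random.default_rng(42)
coords = rng.uniform(0, 10, size=(80, 2))
x = rng.normal(size=80)
y = 2.0 + (1.0 + 0.3 * coords[:, 0]) * x + rng.normal(scale=0.1, size=80)

p_value = monte_carlo_p(coords, x, y, bandwidth=2.0)
print(p_value)
```

A small p-value means the observed spatial arrangement produces more coefficient variation than random arrangements do, which is evidence of spatial non-stationarity for that coefficient.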

Page 33: Geographically  weighted regression

5. An example

• Housing hedonic model in Milwaukee
  – Data: MPROP 2004; 3430+ samples used
  – Dependent variable: the assessed value (price)
  – Independent variables: air conditioner, floor size, fireplace, house age, number of bathrooms, soil and impervious surface (acquired by remote sensing)

Page 34: Geographically  weighted regression

The global model

                       Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)            18944.05     4112.79     4.61  4.25e-06
Floor Size                78.88        2.00    39.42    <2e-16
House Age               -508.56       33.45   -15.20    <2e-16
Fireplace              14688.13     1609.53     9.13    <2e-16
Air Conditioner        13412.99     1296.51    10.35    <2e-16
Number of Bathrooms    19697.65     1725.64    11.42    <2e-16
Soil&Imp. Surface     -27926.77     5179.42    -5.39  7.44e-08

Residual standard error: 35230 on 3430 degrees of freedom
Multiple R-Squared: 0.6252, Adjusted R-squared: 0.6246
F-statistic: 953.7 on 6 and 3430 DF, p-value: < 2.2e-16
Akaike Information Criterion: 81731.63

Page 35: Geographically  weighted regression

The global model

• 62% of the dependent variable’s variation is explained
• All determinants are statistically significant
• Floor size is the strongest positive determinant and house age the strongest negative one (largest absolute t values)
• Deteriorated environmental conditions (a large portion of soil and impervious surface) have a significant negative impact

Page 36: Geographically  weighted regression

GWR run: summary

• Number of nearest neighbors for calibration: 176 (adaptive scheme)
• AIC: 76317.39 (global: 81731.63)
• GWR performs better than the global model

ANOVA Test
Source                         SS        DF              MS      F
OLS Residuals     4257667878068.3      7.00
GWR Improvement   3544862425088.0    327.83  10813043388.63
GWR Residuals      712805558309.1   3102.17    229776586.89  47.06

GWR Akaike Information Criterion: 76317.39 (OLS: 81731.63)
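As a quick arithmetic check, the F statistic in the ANOVA table is the ratio of the GWR improvement mean square to the GWR residual mean square, and the AIC decrease is the difference of the two reported AIC values. The snippet only re-computes figures copied from the slides; the variable names are illustrative.

```python
# Figures copied from the ANOVA table and the AIC lines on the slides.
ms_improvement = 10813043388.63   # GWR improvement mean square
ms_residual = 229776586.89        # GWR residual mean square
aic_ols, aic_gwr = 81731.63, 76317.39

f_stat = ms_improvement / ms_residual   # F = MS(improvement) / MS(residual)
aic_drop = aic_ols - aic_gwr            # how far the AIC falls under GWR

print(f"F = {f_stat:.2f}")                  # F = 47.06
print(f"AIC decrease = {aic_drop:.2f}")     # AIC decrease = 5414.24
```

The decrease of roughly 5414 is far beyond the rule-of-thumb threshold of 3 quoted earlier in the presentation.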

Page 37: Geographically  weighted regression

GWR run: non-stationarity check

                     F statistic  Numerator DF  Denominator DF*  Pr(>F)
Floor Size                  2.51        325.76          1001.69    0.00
House Age                   1.40        192.81          1001.69    0.00
Fireplace                   1.46         80.62          1001.69    0.01
Air Conditioner             1.23        429.17          1001.69    0.00
Number of Bathrooms         2.49        262.39          1001.69    0.00
Soil&Imp. Surface           1.42        375.71          1001.69    0.00

Tests are based on the variance of the coefficients; the coefficients of all independent variables vary significantly over space

Page 38: Geographically  weighted regression

[Figure: six maps of the spatially varying coefficient values, scale bar 0 to 20 kilometers. Panels and value ranges:
A. Floor Size: 17.63 to 119.49
B. Air Conditioner: -7098.88 to 55860.63
C. Fire Place: -6722.29 to 74706.97
D. Num. of Bathrm: -2044.24 to 39931.12
E. House Age: -1402.30 to 929.44
F. Soil & Imp. Sfc: -220301.55 to 34357.96]

Page 39: Geographically  weighted regression

General conclusions

• Except for floor size, the established relationships between house values and the predictors are not necessarily significant everywhere in the City
• The same amount of change in these attributes (ceteris paribus) brings a larger change in house values for houses located near the Lake than for those farther away

Page 40: Geographically  weighted regression

General conclusions

• In the northwest and central eastern parts of the City, house age and house value have the opposite relationship to the one the global model suggests
  – This is where the original immigrants built their houses, and historical value outweighs house age’s negative impact on house values

Page 41: Geographically  weighted regression

6. Interested Groups

• The GWR 3.0 software package can be obtained from Professor Stewart Fotheringham [email protected]
• GWR R codes are available from Danlin Yu directly ([email protected])
• Any interested groups can contact either Professor Yehua Dennis Wei ([email protected]) or me for further info.

Page 42: Geographically  weighted regression

Interested Groups

• The book Geographically Weighted Regression: the analysis of spatially varying relationships is HIGHLY recommended for anyone who is interested in applying GWR to their own problems

Page 43: Geographically  weighted regression

Acknowledgement

• Parts of the contents of this workshop are from the CSISS 2004 summer workshop Geographically Weighted Regression & Associated Statistics
• Specific thanks go to Professors Stewart Fotheringham, Chris Brunsdon, Roger Bivand and Martin Charlton

Page 44: Geographically  weighted regression

Thank you all

Questions and comments