
Geography and inequality

John Östh

Uppsala University

Aims

• Focus on the blessings of integrating economics with geography

• Point to some of the risks of not being aware of the role of geography

• Inspire to further studies

Presentation outline

• The nature of neighbourhoods and the measurement of contexts

– Contextualization approaches

• Predefined areas

• Radii

• K-nearest neighbours

– Other methods for retrieving geographical information

• Proximity measures and spatial interpolation

• Accessibility and spatial interaction

– What about the availability of data?

• EquiPop

– Software

– Modelling assumptions

• Segregation – integration

• Segregation – integration

The nature of neighbourhoods and the measurement of contexts

• Neighbourhood and context often used interchangeably.

– When we are thinking about a neighbourhood we place humans in the middle (f.i. Perry, 1929)

– When we measure neighbourhood we (usually) refer to a single concept, representing a piece of land (See Lee, 1968; Galster 2001)




• As a result, contextual data can be:

– Self-contained and place-bound (fixed borders)

• Taxes, parking fees, i.e. often based on local regulations

• Data becomes unique to a location – difficult to compare between locations

• For the study of social processes, fixed areas are problematic (Sampson et al., 2002).

– Overlapping

• Usually mobility-based statistics

• Landscape of opportunities - local labour market, consumer and service areas, etc. (what if you assumed that workers only looked for jobs in their local area…)


• Giving no thought to the spatial containers of measurement can lead to serious bias

– Here are a few examples

Example #1

Example #2

Map panels: Municipalities and counties; Counties

Two studies on early retirement indicated that 1) no spatial variation existed (county level) and 2) spatial variation existed (municipality level).


• The latter example is related to the Modifiable Areal Unit Problem, MAUP (see for instance Openshaw, 1984; Wong, 2004; Andersson and Musterd, 2010)

– And also gerrymandering (hopefully less common)


• Contextualization approaches

– Three approaches available

• Contextualization using predetermined areas

• Contextualization using radii

• Contextualization using k-nearest neighbour

– Which to use depends on the question at hand, but ask yourself: what does the neighbourhood look like to the studied population?


• Contextualization using predetermined areas

– Non-overlapping

– Areas such as Wards, Tracts, Counties, Blocks, OA, SAMS…

– Usually hierarchical

– PLUS: easy to contextualize, very common, easy to communicate and map, hierarchical areas easy to use in multi-level modelling

– Minus: comparison over time and between areas difficult, MAUP, not placing humans in the middle. Created for other purposes. Boolean border problem



Population in Swedish SAMS 2008

Statistic         Population
Mean              1027.54
Median            716
Minimum           1
Maximum           20119
Percentile 10     100
Percentile 20     238
Percentile 30     387
Percentile 40     541
Percentile 50     716
Percentile 60     928
Percentile 70     1196
Percentile 80     1530
Percentile 90     2077


• Contextualization using radii

– Overlapping

– Area is determined by distance from chosen center

– Multiple distances can be used to generate overlapping hierarchies and annuli

– PLUS: relatively easy to use for the construction of contexts (I usually use Spatial Analysis Toolbox in ArcGIS), relatively common, easy to compare over time and between areas. Placing humans in the middle

– Minus: sensitive to variations in population distributions (test using point density measures)

– Boolean border problem (can be evaded using distance decay, but this increases computational complexity)



Example of radii-based statistics

Distance         Meaning
100 m radius     Home
200 m radius     Block
400 m radius     Greater block
800 m radius     Neighbourhood
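As a minimal sketch of how radii-based statistics like these could be computed, the following counts residents within nested radii of a chosen centre. The coordinates are hypothetical and assumed to be in metres in a projected coordinate system; the function name is illustrative and not taken from any GIS package.

```python
import math

def counts_within_radii(center, points, radii):
    """Count the points that fall within each radius (in metres) of center.

    The radii are nested, so a point inside 100 m is also counted at 200 m, etc.
    """
    cx, cy = center
    counts = {r: 0 for r in radii}
    for x, y in points:
        d = math.hypot(x - cx, y - cy)  # straight-line (Euclidean) distance
        for r in radii:
            if d <= r:
                counts[r] += 1
    return counts

# Hypothetical residential coordinates (metres), for illustration only.
residents = [(0, 50), (120, 0), (0, 150), (300, 100), (500, 500), (900, 0)]
result = counts_within_radii((0, 0), residents, [100, 200, 400, 800])
print(result)  # {100: 1, 200: 3, 400: 4, 800: 5}
```

Subtracting the count at one radius from the count at the next gives the population of the annulus between them.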


• Contextualization using k-nearest neighbour

– Overlapping

– Area is variable and determined by the k-nearest neighbours

– Multiple k-levels can be used to depict differently sized neighbourhoods

– PLUS: Placing humans in the middle. Not sensitive to variations in population distributions, less sensitive to MAUP and border effects than the other two techniques. Suitable for comparison between areas and over time. Suitable for analysis of human processes

– Minus: usually very computationally demanding and difficult to set up. Disregards distance (may be evaded using distance decay formulations)
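The k-nearest idea can be sketched in a few lines, assuming individual-level point data with a group indicator. This is the brute-force version (sort all distances for each individual), which is also why the approach becomes demanding for large datasets. The data and the function name are hypothetical.

```python
import math

def knn_share(points, index, k):
    """Share of group members among the k nearest neighbours of points[index].

    points: list of (x, y, in_group) tuples. The focal individual is excluded.
    Assumes k >= 1 and at least one other individual in the data.
    """
    x0, y0, _ = points[index]
    others = [p for i, p in enumerate(points) if i != index]
    # Sort every other individual by straight-line distance to the focal one.
    others.sort(key=lambda p: math.hypot(p[0] - x0, p[1] - y0))
    nearest = others[:k]
    return sum(1 for p in nearest if p[2]) / len(nearest)

# Hypothetical individuals: (x, y, belongs_to_group)
people = [(0, 0, True), (1, 0, True), (0, 2, False),
          (3, 0, False), (0, 4, True), (5, 5, False)]
share_k1 = knn_share(people, 0, 1)  # 1.0: the single nearest neighbour is in the group
share_k4 = knn_share(people, 0, 4)  # 0.5: two of the four nearest are in the group
```

The same individual gets a different context at each k-level, while the number of people in the context stays constant regardless of how densely the area is populated.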


Example of knn based statistics


– Proximity measures and spatial interpolation

– Accessibility and Spatial interaction

– In order to discuss the above a short detour to how data and map fit together is necessary

Two kinds of data – Raster and Vector

OBJECTS

Using functions such as near, buffer, join, union or similar, points, lines and polygons can be made to interact or intersect

Maps consist of layers

Matching data using GIS

• Using the X and Y coordinates of any observed set of incidences, the locations of incidence can be matched to:

– Underlying topography

– Features in proximity to locations

– The relationship to surrounding incidences

– …

• Matching can be conducted using several techniques:

Joining data spatially

• Matching spatially - selection

– Merging

And through attributes

• Matching with key variables - selection

– Merging

Measuring distance

Proximity measures are more than:

$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$

Though they are often the basis for our analyses.

Accessibility

• Potential accessibility (Hansen 1959) can be formulated as:

$$a_i = \sum_{j} \mathrm{Opportunities}_j \cdot f(d_{ij})$$

• And is commonly used as (unconstrained):

$$a_i = \sum_{j} D_j \exp(-\beta d_{ij})$$

• Also here, the spatial composition matters

Kilometres (decay parameter = 0.06931)

Distance: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33

Scenario 1: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1; a_i = 12.4165833

Scenario 2: 11 11 11; a_i = 12.2224009
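To illustrate why the spatial composition matters, here is a minimal sketch of the unconstrained accessibility measure with an exponential decay, using the decay parameter quoted above. The positions of the three clusters and the distance convention are assumptions made for illustration, so the numbers are not guaranteed to reproduce the values quoted above.

```python
import math

BETA = 0.06931  # decay parameter from the slide; roughly ln(2)/10

def accessibility(opportunities, distances, beta=BETA):
    """Unconstrained potential accessibility: sum of D_j * exp(-beta * d_ij)."""
    return sum(D * math.exp(-beta * d) for D, d in zip(opportunities, distances))

distances = list(range(1, 34))  # 1..33 km from the origin i

# Same total number of opportunities (33), two spatial compositions.
spread = [1] * 33                      # one opportunity at every kilometre
clustered = [0] * 33
for km in (6, 17, 28):                 # hypothetical cluster positions
    clustered[km - 1] = 11

acc_spread = accessibility(spread, distances)
acc_clustered = accessibility(clustered, distances)
print(round(acc_spread, 3), round(acc_clustered, 3))
```

With this decay parameter the weight of an opportunity halves roughly every 10 km, so moving the same opportunities closer to or further from the origin changes the accessibility value even though the total is unchanged.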

I have seen spatial interaction models for South America stating that Brazil is doing better than expected…

Integration of economics, geography and accessibility

Work with A Reggiani and G Galiazzo, CEUS, 2015

Accessibility and distance

Network distance to hospital – straight line distance to hospital

Interpolation techniques

Where’s the data?

Spatially coded data on European level

• Examples of sources of geocoded data

– Corine

– OpenStreetMap

– Population grid

– GADM

– Inspire-sites

– Surprise ;)

Corine

Here (?)

OpenStreetMap

Population grid: around 2 million squares are populated

Population Grid of EU & EquiPop

Population grid: 2011 population as a share of max(2006, 2011), k = 40,000 nearest neighbours

Obvious between-country comparison problems

GADM project

Inspire EU

Data may also be drawn from imagery

…and used in statistics

The surprise

The surprise #2

EquiPop


• EquiPop is a software program developed for the calculation of k-nearest neighbourhoods/contexts. The software is specifically designed to work with datasets that contain thousands or millions of observations, and it offers viable solutions to k-nearest-neighbour questions even where large areas and complex geographies are involved.


• The difference between conventional k-nearest models and EquiPop is the spatial arrangement of data.

Conventional model

1. Sort matrix on distance from i to j

2. Collect values from k-nearest neighbours

3. When k has been reached, move to i+1 and redo.

EquiPop

1. Rectify the data to fit a predefined grid

2. Spatial relations in gridded space are predictable

3. Collect values from k-nearest neighbours using rule.

4. When k has been reached, move to i+1 and redo.
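The gridded idea can be illustrated with a rough sketch. This is not EquiPop's actual algorithm, which is not described in detail here: it is one simple way to exploit a predefined grid, where neighbouring cells are found by index arithmetic instead of by sorting distances. The collection rule (whole square rings of cells around the focal cell) and all names are assumptions.

```python
def grid_knn_share(cells, focal, k, max_ring=1000):
    """Share of group members among roughly the k nearest people on a grid.

    cells: dict mapping (col, row) -> (population, group_count).
    Square rings of cells around the focal cell are added until the
    accumulated population reaches k. A whole ring is always included,
    so the final population may exceed k.
    """
    fx, fy = focal
    pop = grp = 0
    for ring in range(max_ring + 1):
        for dx in range(-ring, ring + 1):
            for dy in range(-ring, ring + 1):
                if max(abs(dx), abs(dy)) != ring:
                    continue  # keep only cells lying on the current ring
                p, g = cells.get((fx + dx, fy + dy), (0, 0))
                pop += p
                grp += g
        if pop >= k:
            break
    return grp / pop if pop else 0.0

# Hypothetical gridded data: (population, members of the studied group)
cells = {(0, 0): (2, 1), (1, 0): (3, 0), (0, 1): (1, 1), (2, 2): (4, 4)}
share_k2 = grid_knn_share(cells, (0, 0), 2)    # focal cell alone suffices
share_k5 = grid_knn_share(cells, (0, 0), 5)    # needs the first ring
share_k10 = grid_knn_share(cells, (0, 0), 10)  # needs the second ring
```

Because no distance matrix has to be sorted, the cost per focal cell depends on how many rings are needed to reach k and not on the total number of observations.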

(Figure: panels a. and b., grids of cell values)

EquiPop

Simple layout

• Get things in using file commands

• Choose what is to be included and the k-levels

• Start, batch, load and unzip

EquiPop can be used to create super-local patterns

TFR

Segregation

Measuring segregation

• The classic measure of segregation (probably the most widely used) is the index of dissimilarity:

$$D = \frac{1}{2}\sum_{i}\left|\frac{b_i}{B} - \frac{w_i}{W}\right|$$

(Massey and Denton 1988 is a widely read text using D.) Here $b_i$ and $w_i$ are the counts of the two groups in area $i$, and $B$ and $W$ are their totals.

• Sensitive to MAUP, etc. Consider:

– Population B = 9, W = 36

– Scenario a: 9 regions (cells), w are spread equally, b are located in the upper-right corner. D = 0.88889

– 3 regions (colours), same distribution. D = 0.66667

– Now pause and think: what will be the effect if we compare cities or regions over time? What about scale?
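The two D values can be checked with a short sketch, assuming the standard form of the index, D = ½ Σ |b_i/B − w_i/W|. The cell ordering and the way the nine cells are merged into three regions (row by row) are assumptions; any merge that keeps all b in one region gives the same result here.

```python
def dissimilarity(b_counts, w_counts):
    """Index of dissimilarity: 0.5 * sum over areas of |b_i/B - w_i/W|."""
    B = sum(b_counts)
    W = sum(w_counts)
    return 0.5 * sum(abs(b / B - w / W) for b, w in zip(b_counts, w_counts))

# Nine cells (3 x 3, listed row by row): w spread equally, all b in the upper-right cell.
b_cells = [0, 0, 9,
           0, 0, 0,
           0, 0, 0]
w_cells = [4] * 9

# The same population aggregated into three regions (assumed: one region per row).
b_regions = [9, 0, 0]
w_regions = [12, 12, 12]

d_cells = dissimilarity(b_cells, w_cells)
d_regions = dissimilarity(b_regions, w_regions)
print(round(d_cells, 5), round(d_regions, 5))  # 0.88889 0.66667
```

Nothing about the population changed between the two calls; only the areal units did, which is the MAUP in miniature.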

Spatial measures of isolation

If there were no sorting, the lines would have been flat!

Spatial measures of entropy

Entropy measures can also be employed, but if there are more than two groups, or if the populations are not equal in size, comparison becomes difficult (a population-weighted Shannon index, etc. may be employed, but the outcome becomes less intuitive in my opinion).

Max = ln(2) ≈ 0.693
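A minimal sketch of the entropy of a local population mix, assuming the usual Shannon form with natural logarithms (which is what gives a maximum of ln(2) for two groups):

```python
import math

def entropy(shares):
    """Shannon entropy (natural log) of group shares that sum to 1."""
    return -sum(p * math.log(p) for p in shares if p > 0)

h_even = entropy([0.5, 0.5])          # two equally sized groups: ln(2)
h_single = entropy([1.0, 0.0])        # one group only: 0
h_three = entropy([1 / 3, 1 / 3, 1 / 3])  # three equal groups: ln(3)
print(round(h_even, 3), round(h_three, 3))
```

The maximum depends on the number of groups (ln(2) for two, ln(3) for three), which is one reason why values are hard to compare across settings with different group structures.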

Over-time changes in Sweden

Figure: Visible minorities, SI (increasing segregation and increasing numbers). Series: visible minorities 2010, 2002, 1994. X-axis from 10 to 100,000 (log scale); y-axis from 0.00 to 0.30.

Figure: SI, poverty (EU definition). Series: poverty 2010, 2002, 1994. X-axis from 10 to 100,000 (log scale); y-axis from 0.00 to 0.25.

Figure: SI, lower education, among individuals 19-64 years of age. Series: lower education 2010, 2002, 1994. X-axis from 10 to 100,000 (log scale); y-axis from 0 to 0.35.

Figure: SI, labour market inactivity (among individuals 19-64 years of age). Series: inactivity 2010, 2002, 1994. X-axis from 10 to 100,000 (log scale); y-axis from 0 to 0.30.

Thanks
