Spatial Data: What is special about Spatial Data?
Briggs Henan University 2012
What is needed for spatial analysis?
1. Location information: a map
2. An attribute dataset: e.g. population, rainfall
3. Links between the locations and the attributes
4. Spatial proximity information
– Knowledge about relative spatial location
– Topological information
Topology -- knowledge about relative spatial positioning
Topography -- the form of the land surface, in particular, its elevation
Berry’s geographic matrix
[Figure: Berry’s geographic matrix. Rows are locations (areal unit 1, areal unit 2, ..., areal unit n; e.g. Henan, Shanxi). Columns are attributes or variables (Variable 1, Variable 2, ..., Variable P; e.g. Population, Income). Matrices are stacked through time (1990, 2000, 2010). Figure labels: geographic fact, geographic distribution, geographic associations.]
Berry, B.J.L. 1964. Approaches to regional analysis: A synthesis. Annals of the Association of American Geographers, 54, pp. 2-11
Example attribute table (- = no value shown)
Admin_Name      Admin_Type   Code_GB  GMI_ADMIN  Area_km2  Area_mi2  Area_prcnt_CH  Area_prcnt_All  Pop2008   PopDenKM2_03
Anhui           Province     340000   ANH        139400    53800     1.44           1.44            61350000  463.5
Beijing         City         110000   BJN        16808     6490      0.17           0.17            22000000  1309
Chongqing       City         -        CQG        82300     31800     0.85           0.85            31442300  379
Fujian          Province     350000   FUJ        121400    46900     1.26           1.26            36040000  289.2
Fujian, ROC     ROC          -        PNG        182.66    70.51     -              0.00            91261     -
Gansu           Province     620000   GAN        454000    175300    4.71           4.70            26281200  57.7
Guangdong       Province     440000   GND        177900    68700     1.84           1.84            95440000  467
Guangxi         Province_AR  450000   GNG        236700    91400     2.45           2.45            48160000  207
Guizhou         Province     520000   GUI        176100    68000     1.82           1.82            37927300  222
Hainan          Province     460000   HAI        33920     13100     0.35           0.35            8540000   241
Hebei           Province     130000   HEB        187700    72500     1.94           1.94            69888200  363
Heilongjiang    Province     230000   HLN        460000    177600    4.77           4.76            38253900  83
Henan           Province     410000   HEN        167000    64500     1.73           1.73            94290000  582
Hong Kong       SAR          -        HKG        1104      422       0.011          0.01            7003700   6380
Hubei           Province     420000   HUB        185900    71800     1.93           1.92            57110000  324
Hunan           Province     430000   HUN        211800    81800     2.19           2.19            63800000  316
Inner Mongolia  Province_AR  150000   NMN        1183000   456800    12.28          12.24           24137300  20.2
Jiangsu         Province     320000   JNS        102600    39600     1.06           1.06            76773000  724
Jiangxi         Province     360000   JNG        166900    64400     1.73           1.73            44000000  257
Jilin           Province     220000   JIL        187400    72400     1.94           1.94            27340000  145
Types of Spatial Data
• Continuous (surface) data
• Polygon (lattice) data
• Point data
• Network data
Spatial data type 1: Continuous (Surface Data)
• Spatially continuous data– attributes exist everywhere
• There are an infinite number locations
– But, attributes are usually only measured at a few locations
• There is a sample of point measurements
• e.g. precipitation, elevation
– A surface is used to represent continuous data
Spatial data type 2: Polygon Data
• Polygons completely covering the area*
– Attributes exist and are measured at each location
– Area can be:
• irregular (e.g. US state or China province boundaries)
• regular (e.g. remote sensing images in raster format)
*Polygons completely covering an area are called a lattice
Spatial data type 3: Point data
• Point pattern
– The locations are the focus
– In many cases, there is no attribute involved
Spatial data type 4: Network data
• Attributes may measure
– the network itself (the roads)
– objects on the network (cars)
• We often treat network objects as point data, which can cause serious errors
– Crimes occur at addresses on networks, but we often treat them as points
See: Yamada and Thill Local Indicators of network-constrained clusters in spatial point patterns. Geographical Analysis 39 (3) 2007 p. 268-292
Which will we study?
1. Point data (point pattern analysis: clustering and dispersion)
2. Polygon data* (polygon analysis: spatial autocorrelation and spatial regression)
3. Continuous data* (surface analysis: interpolation, trend surface analysis and kriging)
*in the fall semester
Converting from one type of data to another.
--very common in spatial analysis
Converting point to continuous data: interpolation
[Figure: map of scattered sample points]
Interpolation
• Finding attribute values at locations where there is no data, using locations with known data values
• Usually based on
– Value at known location
– Distance from known location
• Methods used
– Inverse distance weighting
– Kriging
[Figure: simple linear interpolation between known and unknown locations]
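Inverse distance weighting can be sketched in a few lines. This is only a minimal illustration: the sample locations and values are hypothetical, and the distance exponent of 2 is an assumed (though common) choice, not something the slides specify.

```python
import math

def idw(known, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    known: list of (x, y, value) tuples for locations with data.
    power: distance exponent; 2 is assumed here for illustration.
    """
    num = 0.0
    den = 0.0
    for kx, ky, value in known:
        d = math.hypot(x - kx, y - ky)
        if d == 0.0:
            return value  # query falls exactly on a known location
        w = 1.0 / d ** power  # nearer samples get larger weights
        num += w * value
        den += w
    return num / den

# Hypothetical rainfall samples: (x, y, value)
samples = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)]
estimate = idw(samples, 5.0, 0.0)
```

The estimate at (5, 0) is pulled mostly toward the two nearest samples and only slightly toward the third, more distant one.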
Converting point data to polygons using Thiessen polygons
[Figure: map of scattered points with their Thiessen polygons]
Thiessen or Proximity Polygons (also called Dirichlet or Voronoi Polygons)
• Polygons created from a point layer
• Each point has a polygon (and each polygon has one point)
• any location within the polygon is closer to the enclosed point than to any other point
• space is divided as ‘evenly’ as possible between the polygons
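The defining property (every location belongs to the polygon of its closest point) can be checked without drawing any polygons. The sketch below assumes three hypothetical points named P1 to P3 and simply reports which point's Thiessen polygon a query location falls in.

```python
import math

def nearest_point(points, x, y):
    """Return the ID of the point closest to (x, y).

    By definition, (x, y) lies inside that point's Thiessen polygon.
    """
    return min(
        points,
        key=lambda pid: math.hypot(points[pid][0] - x, points[pid][1] - y),
    )

# Hypothetical point layer: ID -> (x, y)
stations = {"P1": (2.0, 2.0), "P2": (8.0, 3.0), "P3": (5.0, 9.0)}
owner = nearest_point(stations, 4.0, 3.0)  # "P1"
```

Applying this test to every cell of a fine grid would produce a raster approximation of the Thiessen polygons.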
How to create Thiessen Polygons
1. Connect point to its nearest (closest) neighbor
2. Draw perpendicular line at midpoint
3. Repeat for other points
4. Thiessen polygons
Converting polygon to point data using Centroids
• Centroid: the balancing point for a polygon
• used to apply point pattern analysis to polygon data
• More about this later
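One common way to compute the balancing point is the area-weighted centroid formula. The sketch below assumes a simple (non self-intersecting) polygon given as a closed ring of coordinates; the L-shaped example polygon is hypothetical.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon.

    ring: list of (x, y) pairs with the first pair repeated at the end.
    """
    a2 = 0.0  # twice the signed area
    cx = 0.0
    cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        a2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * a2), cy / (3.0 * a2)

# Hypothetical L-shaped polygon
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4), (0, 0)]
centroid = polygon_centroid(l_shape)
```

For this L shape the centroid is about (1.36, 1.36), which lies outside the polygon itself. That is worth remembering when centroids stand in for polygons in point pattern analysis.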
Using a polygon to represent a set of points: Convex Hull
• The smallest convex polygon able to contain a set of points
– no concave angles pointing inward
• A rubber band wrapped around a set of points
• “Reverse” of the centroid
• Convex hull often used to create the boundary of a study area
– a “buffer” zone often added
– Used in point pattern analysis to solve the boundary problem
• Called a “guard zone”
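The rubber band idea can be sketched with Andrew's monotone chain algorithm, one of several standard ways to build a convex hull (the slides do not name a method). The point set is hypothetical.

```python
def cross(o, a, b):
    """Positive if the turn o -> a -> b is counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would create a clockwise or straight turn
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]

# Hypothetical points: four corners plus interior and edge points
pts = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3), (2, 0)]
hull = convex_hull(pts)  # [(0, 0), (4, 0), (4, 4), (0, 4)]
```

Interior points and points lying along a hull edge are discarded, so only the corners that the rubber band touches remain.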
Models for Spatial Data: Raster and Vector
two alternative methods for representing spatial data
Concept of Vector and Raster
[Figure: the real world (house, river, trees) shown in a raster representation (a 10 x 10 grid with rows and columns numbered 0 to 9, cells coded R, T and H) and in a vector representation (point, line, polygon)]
Comparing Raster and Vector Models
Raster Model
• area is covered by grid with (usually) equal-size, square cells
• attributes are recorded by giving each cell a single value based on the majority feature (attribute) in the cell, such as land use type or soil type
• Image data is a special case of raster data in which the “attribute” is a reflectance value from the electromagnetic spectrum
– cells in image data often called pixels (picture elements)
Vector Model
The fundamental concept of vector GIS is that all geographic features in the real world can be represented either as:
• points or dots (nodes): trees, poles, fire plugs, airports, cities
• lines (arcs): streams, streets, sewers
• areas (polygons): land parcels, cities, counties, forest, rock type
Because representation depends on shape, ArcGIS refers to files containing vector data as shapefiles
Raster model
Land use (or soil type): each cell holds one class code (figure labels: corn, wheat, fruit, clover)
    0 1 2 3 4 5 6 7 8 9
0   1 1 1 1 1 4 4 5 5 5
1   1 1 1 1 1 4 4 5 5 5
2   1 1 1 1 1 4 4 5 5 5
3   1 1 1 1 1 4 4 5 5 5
4   1 1 1 1 1 4 4 5 5 5
5   2 2 2 2 2 2 2 3 3 3
6   2 2 2 2 2 2 2 3 3 3
7   2 2 2 2 2 2 2 3 3 3
8   2 2 4 4 2 2 2 3 3 3
9   2 2 4 4 2 2 2 3 3 3
Image: each cell (pixel) has a value between 0 and 255 (8 bits), e.g. 186 or 21
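A raster like the land use grid on this slide is just a two-dimensional array of codes. The sketch below stores that grid as a list of rows and counts the cells in each class. Indexing as `land_use[row][col]` is an assumption about which axis is which.

```python
from collections import Counter

# The 10 x 10 land use grid from the slide, one inner list per row
land_use = (
    [[1, 1, 1, 1, 1, 4, 4, 5, 5, 5] for _ in range(5)]
    + [[2, 2, 2, 2, 2, 2, 2, 3, 3, 3] for _ in range(3)]
    + [[2, 2, 4, 4, 2, 2, 2, 3, 3, 3] for _ in range(2)]
)

# Look up the class code of a single cell
code_at_row8_col2 = land_use[8][2]  # 4

# Count how many cells carry each class code
counts = Counter(code for row in land_use for code in row)
```

Multiplying each count by the cell area would give the area of each land use class, which is a typical raster calculation.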
Vector Model
• point (node): 0-dimensions
– single x,y coordinate pair
– zero area
– tree, oil well, location for label
• line (arc): 1-dimension
– two connected x,y coordinates
– road, stream
– A network is simply 2 or more connected lines
• polygon: 2-dimensions
– four or more ordered and connected x,y coordinates
– first and last x,y pairs are the same
– encloses an area
– county, lake
Examples (x from 7 to 8, y from 1 to 2):
Point: 7,2 (x=7, y=2)
Line: 7,2 8,1
Polygon: 7,2 8,1 7,1 7,2
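The three vector primitives map directly onto coordinate lists. The sketch below uses the point, line and polygon coordinates from the slide. The helper names (`is_closed`, `length`, `area`) are illustrative and not taken from any GIS library.

```python
import math

point = (7, 2)                              # single x,y pair
line = [(7, 2), (8, 1)]                     # connected x,y pairs
polygon = [(7, 2), (8, 1), (7, 1), (7, 2)]  # first pair == last pair

def is_closed(ring):
    """A polygon ring needs at least four pairs, first equal to last."""
    return len(ring) >= 4 and ring[0] == ring[-1]

def length(coords):
    """Total length of a line through consecutive coordinates."""
    return sum(
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(coords, coords[1:])
    )

def area(ring):
    """Polygon area from the shoelace formula."""
    total = sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in zip(ring, ring[1:]))
    return abs(total) / 2.0

line_length = length(line)    # sqrt(2)
polygon_area = area(polygon)  # 0.5
```

A point has neither length nor area, a line has length only, and only the closed ring encloses an area, which matches the 0, 1 and 2 dimensions listed above.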
Using raster and vector models to represent surfaces
Representing Surfaces with raster and vector models: 3 ways
• Contour lines
– Lines of equal surface value
– Good for maps but not computers!
• Digital elevation model (raster)
– raster cells record surface value
• TIN (vector)
– Triangulated Irregular Network (TIN)
– triangle vertices (corners) record surface value
Contour Lines (isolines) for surface representation
Contour lines of constant elevation are also called isolines (iso = equal)
Advantages
• Easy to understand (for most people!)
– Circle = hill top (or basin)
– Downhill > = ridge
– Uphill < = valley
– Closer lines = steeper slope
Disadvantages
• Not good for computer representation
• Lines difficult to store in computer
Raster for surface representation
Each cell in the raster records the height (elevation) of the surface
[Figure: raster cells containing elevation values laid over a surface, compared with contour lines at 105, 110, 115 and 120]
Triangulated Irregular Network (TIN): Vector surface representation
• a set of non-overlapping triangles formed from irregularly spaced points
• preferably, points are located at “significant” locations
– bottom of valleys, tops of ridges
• Each corner of the triangle (vertex) has:
– x, y horizontal coordinates
– z vertical coordinate measuring elevation
Point #   X    Y    Z
1         10   30   160
2         25   30   150
3         30   25   140
4         15   20   130
etc.
[Figure: TIN with vertices numbered 1 to 5, showing a valley and a ridge]
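Because each triangle is a flat facet, the elevation anywhere inside it follows from its three vertices. The sketch below uses barycentric weights for that. It takes points 1, 2 and 4 from the table as one triangle, which is an assumption made for illustration: the slide does not say which points are joined.

```python
def tin_elevation(tri, x, y):
    """Linear interpolation of z inside one TIN triangle.

    tri: three (x, y, z) vertices. Returns None if (x, y) is outside.
    """
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    # Barycentric weights: each vertex's share of the query location
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -1e-12:
        return None  # query point is not inside this triangle
    return w1 * z1 + w2 * z2 + w3 * z3

# Points 1, 2 and 4 from the table (assumed to form one triangle)
triangle = [(10, 30, 160), (25, 30, 150), (15, 20, 130)]
z_mid = tin_elevation(triangle, 17.5, 30)  # halfway between points 1 and 2
```

At a vertex the function returns that vertex's own elevation, and along an edge it returns a straight-line blend of the two end points.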
Draft: How to Create a TIN surface:
from points to surfaces
Links together all spatial concepts: point, line, polygon, surface
Using raster and vector models to represent polygons (and points and lines)
Representing Polygons (and points and lines)
with raster and vector models
• Raster model not good
– not accurate
• Also a big challenge for the vector model
– but much more accurate
– the solution to this challenge resulted in the modern GIS system
Using Raster model for points, lines and polygons -- not good!
For points
• Point located at cell center, even if it's not
• Point “lost” if two points in one cell
For lines and polygons
• Line not accurate
• Polygon boundary not accurate
[Figure: the 10 x 10 land use raster from the Raster model slide, with a point marked X]
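Both point problems are easy to reproduce. The sketch below assumes a grid of 1 x 1 cells with its origin at (0, 0) and non-negative coordinates. The three points are hypothetical.

```python
def to_cell(x, y, cell_size=1.0):
    """Return the (row, col) of the raster cell containing (x, y)."""
    return int(y // cell_size), int(x // cell_size)

def cell_center(row, col, cell_size=1.0):
    """Return the (x, y) location a raster reports for a cell."""
    return (col + 0.5) * cell_size, (row + 0.5) * cell_size

# Hypothetical points; the first two fall in the same cell
points = [(2.1, 3.2), (2.9, 3.8), (7.5, 0.4)]

# A raster can only record that a cell is occupied
occupied = {to_cell(x, y) for x, y in points}

# Location recovered for the first point: the cell center, not (2.1, 3.2)
recovered = cell_center(*to_cell(2.1, 3.2))  # (2.5, 3.5)
```

Three input points yield only two occupied cells, and the first point comes back shifted to its cell center.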
Using vector model to represent points, lines and polygons: Node/Arc/Polygon Topology
The relationships between all spatial elements (points, lines, and polygons) are defined by four concepts:
• Node-Arc relationship
– specifies which points (nodes) are connected to form arcs (lines)
• Arc-Arc relationship
– specifies which arcs are connected to form networks
• Polygon-Arc relationship
– defines polygons (areas) by specifying which arcs form their boundary
• From-To relationship on all arcs
– Every arc has a direction from a node to a node
– This establishes a left side and a right side of an arc (e.g. a street)
– Also a polygon on the left and a polygon on the right for every side of the polygon
[Figure: an arc running from its “from” node to its “to” node, with its left and right sides labeled]
Node/Arc/Polygon and Attribute Data: example of computer implementation

Spatial Data

Node Table
Node ID   Easting   Northing
1         126.5     578.1
2         218.6     581.9
3         224.2     470.4
4         129.1     471.9

Arc Table
Arc ID   From N   To N   L Poly   R Poly
I        4        1               A34
II       1        2               A34
III      2        3      A35      A34
IV       3        4               A34

Polygon Table
Polygon ID   Arc List
A34          I, II, III, IV
A35          III, VI, VII, XI

Attribute Data

Node Feature Attribute Table
Node ID   Control   Crosswalk   ADA?
1         light     yes         yes
2         stop      no          no
3         yield     no          no
4         none      yes         no

Arc Feature Attribute Table
Arc ID   Length   Condition   Lanes   Name
I        106      good        4
II       92       poor        4       Birch
III      111      fair        2
IV       95       fair        2       Cherry

Polygon Feature Attribute Table
Polygon ID   Owner      Address
A34          J. Smith   500 Birch
A35          R. White   200 Main

[Figure: parcel A34 (Smith Estate) bounded by arcs I to IV and nodes 1 to 4, with Birch and Cherry streets and the neighboring parcel A35]
This is how a vector GIS system works!
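The value of storing left and right polygons on every arc is that adjacency becomes a table lookup. The sketch below encodes the Arc Table from the slide as a dictionary. Placing A34 on the right side of arcs I, II and IV is an assumption inferred from the node coordinates (the arcs run clockwise around the parcel), and `None` marks a side with no polygon listed.

```python
# Arc Table from the slide: direction plus polygon on each side
arcs = {
    "I":   {"from": 4, "to": 1, "left": None,  "right": "A34"},
    "II":  {"from": 1, "to": 2, "left": None,  "right": "A34"},
    "III": {"from": 2, "to": 3, "left": "A35", "right": "A34"},
    "IV":  {"from": 3, "to": 4, "left": None,  "right": "A34"},
}

def boundary(arcs, poly):
    """Arcs that have the polygon on their left or right side."""
    return [aid for aid, a in arcs.items() if poly in (a["left"], a["right"])]

def neighbors(arcs, poly):
    """Polygons that share at least one arc with the given polygon."""
    found = set()
    for a in arcs.values():
        if a["left"] == poly and a["right"] is not None:
            found.add(a["right"])
        if a["right"] == poly and a["left"] is not None:
            found.add(a["left"])
    return found

a34_boundary = boundary(arcs, "A34")    # ["I", "II", "III", "IV"]
a34_neighbors = neighbors(arcs, "A34")  # {"A35"}
```

No coordinates are touched in either function. The questions "what bounds this parcel?" and "who is next door?" are answered from the topology tables alone.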
This data structure grew out of work at the Harvard Laboratory for Computer Graphics, where Scott Morehouse developed it in the 1970s.
Jack Dangermond, who had worked at the Harvard lab and founded ESRI in Redlands, CA, in 1969, later hired Scott Morehouse,
and ESRI released its commercial GIS system, ARC/INFO, in 1982.
Modern GIS was born!
Other ways to represent polygons with vector model
2. Whole polygon structure
3. Points and Polygons structure
•Used in earlier GIS systems before node/arc/polygon system invented
•Still used today for some simpler spatial data (e.g. shapefiles)
•Discuss these if we have time!
Vector Data Structures: Whole Polygon
Whole Polygon (boundary structure): list coordinates of points in order as you ‘walk around’ the outside boundary of the polygon.
– all data stored in one file
– coordinates/borders for adjacent polygons stored twice
• may not be same, resulting in slivers (gaps), or overlap
– all lines are ‘double’ (except for those on the outside periphery)
– no topological information about polygons
• which are adjacent and have a common boundary?
– used by the first computer mapping program, SYMAP, in late 1960s
– used by SAS/GRAPH and many later business mapping programs
– still used by shapefiles
Topology -- knowledge about relative spatial positioning -- knowledge about shared geometry
Topography -- the form of the land surface, in particular, its elevation
Whole Polygon: illustration
[Figure: polygons A, B, C, D and E on a grid with x from 1 to 5 and y from 0 to 5]
Data File (polygon ID, x, y)
A 3 4
A 4 4
A 4 2
A 3 2
A 3 4
B 4 4
B 5 4
B 5 2
B 4 2
B 4 4
C 3 2
C 4 2
C 4 0
C 3 0
C 3 2
D 4 2
D 5 2
D 5 0
D 4 0
D 4 2
E 1 5
E 5 5
E 5 4
E 3 4
E 3 0
E 1 0
E 1 5
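The duplication in the whole polygon structure can be measured directly from the data file on this slide. The sketch below parses those rows and compares the number of stored coordinate rows with the number of distinct locations. The variable names are illustrative.

```python
from collections import defaultdict

# Data file from the slide: polygon ID, x, y
DATA = """
A 3 4
A 4 4
A 4 2
A 3 2
A 3 4
B 4 4
B 5 4
B 5 2
B 4 2
B 4 4
C 3 2
C 4 2
C 4 0
C 3 0
C 3 2
D 4 2
D 5 2
D 5 0
D 4 0
D 4 2
E 1 5
E 5 5
E 5 4
E 3 4
E 3 0
E 1 0
E 1 5
"""

rows = [
    (pid, int(x), int(y))
    for pid, x, y in (ln.split() for ln in DATA.splitlines() if ln.strip())
]

# Which polygons store each coordinate pair?
stored_by = defaultdict(set)
for pid, x, y in rows:
    stored_by[(x, y)].add(pid)

total_rows = len(rows)            # 27 coordinate rows in the file
unique_locations = len(stored_by) # 12 distinct locations
```

Here 27 rows describe only 12 distinct locations, and the location (4, 2) is stored separately by polygons A, B, C and D. If any one copy were digitized slightly differently, a sliver or overlap would appear.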
Vector Data Structures: Points & Polygons
Points and Polygons: list ID numbers of points in order as you ‘walk around’ the outside boundary
• a second file lists all points and their coordinates
– solves the duplicate coordinate/double border problem
– still no topological information
• Do not know which polygons have a common border
– first used by CALFORM, the second generation mapping package, from the Laboratory for Computer Graphics and Spatial Analysis at Harvard in early ’70s
Points and Polygons: illustration
[Figure: polygons A, B, C, D and E on a grid with x from 1 to 5 and y from 0 to 5, with points numbered 1 to 12]
Points File (point ID, x, y)
1 3 4
2 4 4
3 4 2
4 3 2
5 5 4
6 5 2
7 5 0
8 4 0
9 3 0
10 1 0
11 1 5
12 5 5
Polygons File (polygon ID, point IDs in order)
A 1, 2, 3, 4, 1
B 2, 5, 6, 3, 2
C 4, 3, 8, 9, 4
D 3, 6, 7, 8, 3
E 11, 12, 5, 1, 9, 10, 11
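With two files, each coordinate is stored once and polygons refer to it by ID. The sketch below loads the points file and polygons file from this slide, resolves each polygon to coordinates, and computes areas with the shoelace formula. The function names are illustrative.

```python
# Points file from the slide: point ID -> (x, y)
points = {
    1: (3, 4), 2: (4, 4), 3: (4, 2), 4: (3, 2),
    5: (5, 4), 6: (5, 2), 7: (5, 0), 8: (4, 0),
    9: (3, 0), 10: (1, 0), 11: (1, 5), 12: (5, 5),
}

# Polygons file from the slide: polygon ID -> ordered point IDs
polygons = {
    "A": [1, 2, 3, 4, 1],
    "B": [2, 5, 6, 3, 2],
    "C": [4, 3, 8, 9, 4],
    "D": [3, 6, 7, 8, 3],
    "E": [11, 12, 5, 1, 9, 10, 11],
}

def ring(poly_id):
    """Resolve a polygon's point IDs into coordinates."""
    return [points[p] for p in polygons[poly_id]]

def area(coords):
    """Polygon area from the shoelace formula."""
    total = sum(
        x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in zip(coords, coords[1:])
    )
    return abs(total) / 2.0

areas = {pid: area(ring(pid)) for pid in polygons}

# Point 3 is stored once but referenced by four polygons
users_of_point_3 = [pid for pid, ids in polygons.items() if 3 in ids]
```

Moving point 3 now moves the corner of A, B, C and D together, so slivers cannot arise. Finding which polygons share a border still requires searching the ID lists, which is the missing topology noted above.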
Hopefully, you now have a better understanding of
what is special about spatial data!
Monday, we will begin talking about Spatial Statistics