Overview of Statistical Analysis of Spatial Data
Geog 210C
Introduction to Spatial Data Analysis
Chris Funk
Lecture 1
C. Funk, Geog 210C, Spring 2011
Outline
- Course Overview
- Types of Spatial Data
- Why Spatial Statistics?
- Problems in Spatial Data Analysis
- Points to Remember
Class Logistics
- Prerequisites: Geog 210B, or an equivalent applied statistics course, and consent of instructor
- Four-unit course
- Weekly R-based lab assignments (60% of final grade)
- Final exam (20% of final grade)
- Class participation (10% of final grade)
Why do we care
Climate Hazard Group at UCSB
Greg Husak, Joel Michaelsen, Diego Pedreros, Pete Peterson, Mike Marshal, Laura Harrison, Park Williams, Greg Ederer, Teresa Everett, Amy McNally, Frank Davenport
[Map: Near Term Outlook (March 2011); legend: Generally Food Secure, Moderately Food Insecure, Highly Food Insecure, Extremely Food Insecure, Famine, No Data]
Famine Early Warning Systems Network
Why do we care - II
Why do we care - III: Decisions …
Example drawn from Gary Eilert’s presentation to the House Foreign Affairs Committee, Oct 2, 2009
How should we plan for El Niño in the Horn of Africa?
Outcome: 140 million dollars in aid shipped ~6 months earlier than usual
Source: USGS EWX
[Figure panels: Poor rains, March-May 2009; Poor rains, June-July 2009]
Plan for a bigger problem: rains have already been poor in most of 2009.
Why do we care - III: Decisions …
Understand that it’s not only El Niño; climate change is also present
The last 4 rainy seasons are the worst on record.
Main-season rainfall is decreasing, while second-season rainfall is increasing
Almost 20% drop in main-season rainfall since 1980
C. Funk/USGS
Why do we care - III: Decisions …
[Figure: Average 2000-2010 MAMJ SPI (map); Ethiopian and Kenyan MAMJ SPI, 1950-2010, smoothed with a running 10-year mean (x-axis: Year; y-axis: SPI; series: Ethiopia MAMJ SPI, Kenya MAMJ SPI)]
Semi-arid food insecure areas experiencing increased dryness
Outcome = Early and Effective Response
FEWSNET—reports below normal rains …. USAID and U.N. agencies are taking action to ensure that sufficient stocks are in place, with USAID recently committing an additional $70 million.
USAID Frontlines, Feb. 2010
Belousov–Zhabotinsky Reaction
A non-equilibrium thermodynamic reaction
Why Geostatistics?
We only have one planet, so let’s get to know her.
Statistics evolved in the 17th century as a tool to help guide nascent states.
The combination of statistics, computers, and the modern firehose of data might be able to guide us through the 21st century.
Increasing resource demand and shortages will require more sophisticated management.
The spatialness of data is important:
- Place matters
- Nearness matters
- Accuracy matters
- Uncertainty matters
- Theory matters
John Graunt’s book Natural and Political Observations Made upon the Bills of Mortality (1662) analyzed mortality rolls in early London to warn of bubonic plague.
Data are not passive!
Overview - Labs
Wk  Date  Lab
1   3/30  Descriptive Univariate Statistics
2   4/6   Statistical Sampling
3   4/13  Intensity Analysis of Spatial Point Patterns
4   4/20  Interaction Analysis of Spatial Point Patterns; Spatial Point Patterns & CSR
5   4/27  Quantifying Spatial Association (SA) in Scattered Data
6   5/04  Nonlinear Least Squares for SA Model Fitting
7   5/11  Spatial Prediction via Simple Kriging
8   5/18  Geostatistics in ArcMap (related zip archive with data)
9   5/25  Principal Component Analysis
Overview - Lectures
Lecture  Title
1a   Overview of Statistical Analysis of Spatial Data
1b   Univariate Sample Statistics
2a   Intensity Analysis of Spatial Point Patterns
2b   Univariate Random Variables
3a   Interaction Analysis of Spatial Point Patterns
3b   Descriptive Statistics (Bivariate & Multivariate)
4a   Point Patterns & Complete Spatial Randomness I
4b   Point Patterns & Complete Spatial Randomness II
5a   Empirical Semivariograms
5b   Elements of Spatial Stochastic Processes I
6a   Elements of Spatial Stochastic Processes II
6b   Modeling Semivariograms
7a   Spatial Interpolation Example
7b   Simple Kriging
8a   Not So Simple Kriging
8b   Explaining Covariance
9a   Principal Component Analysis I
9b   Principal Component Analysis II
10a  Conditional Simulation I
10b  Conditional Simulation II
Introduction & Objectives
Spatial data: geo-referenced attribute measurements; each measurement is associated with a location (point) or an entity (region or object) in geographical (or other) space
- Attribute measurement scale can be continuous or discrete, e.g., chemical concentration, soil types, disease occurrences
- Sample locations can have a regular or irregular spatial arrangement, i.e., data locations on a raster (regular lattice) or scattered in space; the domain informed by a measurement is called the sample unit or support, e.g., points, pixels, polygons
- Spatial data often have an additional temporal component: dynamic attribute evolution in space and time (spatiotemporal support)

Objectives of this handout
- To provide a brief overview of types of spatial data
- To highlight the role of spatial statistics in analyzing data of each type
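To make the definition of spatial data concrete, here is a minimal sketch of a single geo-referenced observation in Python. The class name `SpatialObservation` and its fields are hypothetical, chosen only to mirror the ingredients listed above (attribute value, location, support, optional time); the sample records are made up.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class SpatialObservation:
    """Hypothetical record for one geo-referenced measurement (illustrative only)."""
    value: Union[float, str]        # continuous (e.g. concentration) or discrete (e.g. soil type)
    location: Tuple[float, float]   # (x, y) coordinates; for areal supports, a representative point
    support: str = "point"          # sample unit: "point", "pixel", or "polygon"
    time: Optional[float] = None    # optional temporal component for spatiotemporal data


# Made-up records covering the cases named above.
samples = [
    SpatialObservation(3.2, (0.0, 0.0)),                                 # continuous value, point support
    SpatialObservation("clay", (10.0, 5.0), support="polygon"),          # discrete value, polygon support
    SpatialObservation(18.5, (2.0, 3.0), support="pixel", time=2011.0),  # spatiotemporal pixel
]

supports = sorted({s.support for s in samples})
print(supports)  # ['pixel', 'point', 'polygon']
```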
Stages in Spatial Data Analysis
Exploratory analysis
- Explore spatial data using cartographic (or other visual) representations
- Statistical analysis for detecting possible sub-populations, outliers, trends, and relationships with neighboring values or other spatial variables

Modeling or confirmatory analysis
- Establish parametric or non-parametric model(s) characterizing the attribute's spatial distribution
- Estimate model parameters from data; evaluate their statistical significance
- Predict attribute values at other locations and/or future time instants

Notes
- Any processing of spatial data, e.g., filtering or interpolation, affects any inference made from them
- Boundaries between the above stages are not always clear-cut
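The outlier-detection step of exploratory analysis often starts with a simple, non-spatial screen. The sketch below uses Tukey's 1.5 × IQR fences, a common rule of thumb chosen here for illustration rather than a method prescribed in these notes; the gauge values are made up.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up rainfall totals (mm) at eight gauges; one value looks suspicious.
rain = [10, 11, 12, 11, 10, 12, 11, 40]
print(iqr_outliers(rain))  # [40]
```

A flagged value is only a candidate: it may be a recording error or a genuine local extreme, and a spatial check would also compare it with values at neighboring locations.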
Attributes Varying Continuously in Space
Characteristics
- Also known (unfortunately) as geostatistical data, e.g., temperature, rainfall, elevation, population density
- Measurements of nominal scale, e.g., land cover types, or interval/ratio scale, e.g., sea floor depth
- Often, sparse samples are available only at a fixed set of locations
Area or Lattice Data
Characteristics
- Attributes take values only at a fixed set of areas or zones, e.g., administrative districts, pixels of satellite images
- Typically, all possible locations have been sampled; there are no attribute values between sampling units (unless there are missing values)
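Inference on lattice data usually relies on relationships between adjacent areas, which first requires a definition of "adjacent". A minimal sketch, assuming a regular grid and rook contiguity (cells sharing an edge); this is one common choice, and irregular zones such as districts would need a polygon-based contiguity instead. The function name is hypothetical.

```python
def rook_neighbors(row, col, n_rows, n_cols):
    """Return (row, col) indices of grid cells sharing an edge with (row, col)."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    # Keep only candidates that fall inside the lattice.
    return [(r, c) for r, c in candidates if 0 <= r < n_rows and 0 <= c < n_cols]


# A 3 x 3 lattice, e.g. nine pixels of a satellite image.
print(len(rook_neighbors(1, 1, 3, 3)))  # interior cell: 4
print(len(rook_neighbors(0, 1, 3, 3)))  # edge cell: 3
print(len(rook_neighbors(0, 0, 3, 3)))  # corner cell: 2
```

Cells on the boundary of the lattice have fewer neighbors than interior cells, so any statistic built from neighbor relationships is computed from less information there.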
Point Pattern Data
Characteristics
- Series of point locations with recorded "events", e.g., locations of trees, disease or crime incidents
- Point locations correspond to all possible events (mapped point pattern), or to a subset (sampled point pattern)
- Attribute values are also possible at the same locations, e.g., tree diameter, magnitude of earthquakes (marked point pattern)
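A minimal sketch of a marked point pattern and its simplest summary, the average intensity (number of events per unit area). It assumes a rectangular study region; the `Event` type and the tree data are hypothetical and made up for illustration.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    x: float
    y: float
    mark: float  # attribute recorded at the event, e.g. tree diameter (cm)


def intensity(events: List[Event], width: float, height: float) -> float:
    """Average intensity: events per unit area of a rectangular study region."""
    return len(events) / (width * height)


# Made-up mapped pattern: four trees in a 10 m x 10 m plot, marked by diameter.
trees = [
    Event(1.0, 2.0, 30.5),
    Event(4.0, 7.5, 12.0),
    Event(8.2, 3.3, 45.1),
    Event(6.0, 9.0, 20.0),
]
print(intensity(trees, 10.0, 10.0))  # 0.04 events per square metre
```

For a sampled point pattern the same count would only cover the surveyed sub-area, so the area in the denominator must be the sampled area rather than the whole region.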
Spatial Interaction or Network Data
Characteristics
- Attributes relate to pairs of points or areas: flows from origins to destinations, e.g., patients "flow" from residences to hospitals
- Less tangible flows, e.g., information, could be defined

Analysis objectives
- Modeling of flow patterns = finding relationships between observed flows and explanatory variables, e.g., number of trips from origins to destinations as a function of income
- Classical analysis methods focus on patterns of aggregate interaction rather than on individuals themselves; more recent work focuses on understanding individual preferences and choice modeling
- Spatial location/allocation problems, and more generally spatial optimization problems, typically involve network data
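Network data are indexed by pairs of locations rather than by single locations. A minimal sketch of tallying individual trip records into an origin-destination flow table; the zone and hospital names and the trips are made up, and a flow model relating these counts to explanatory variables such as income is not shown.

```python
from collections import Counter

# Made-up trip records: (origin residence zone, destination hospital).
trips = [("A", "H1"), ("A", "H1"), ("A", "H2"), ("B", "H2"), ("B", "H2"), ("C", "H1")]

# Origin-destination flow table, stored sparsely as counts per (origin, destination) pair.
flows = Counter(trips)

# Aggregate interaction: total outflow from each origin.
outflow = Counter(origin for origin, _ in trips)

print(flows[("A", "H1")])  # 2
print(outflow["A"])        # 3
```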
Univariate Statistics and Spatial Pattern?
[Figure: two 1D attribute profiles with the same histogram]

Shortcomings of univariate statistics
- Univariate statistics, e.g., average, variance, histogram, do not suffice to describe spatial pattern; the spatial arrangement of attribute values matters too.

Spatial auto-correlation: an aspect of spatial pattern
- Attribute values measured at "nearby" supports tend to be more "similar" than those measured at "distant" supports; Tobler's 1st law(?) of Geography
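A small numeric version of the two-profile comparison: both made-up profiles contain exactly the same values (so the same histogram, mean, and variance), but in a different order along a regularly spaced transect. Lag-1 autocorrelation is used here as one simple measure of spatial arrangement; it separates the two profiles even though univariate statistics cannot.

```python
import statistics


def lag1_autocorrelation(z):
    """Lag-1 autocorrelation of a regularly spaced 1D profile."""
    m = statistics.mean(z)
    num = sum((z[i] - m) * (z[i + 1] - m) for i in range(len(z) - 1))
    den = sum((v - m) ** 2 for v in z)
    return num / den


smooth = [1, 2, 3, 4, 5, 6, 7, 8]  # neighboring values are similar
rough = [1, 8, 2, 7, 3, 6, 4, 5]   # same values, rearranged along the profile

assert sorted(smooth) == sorted(rough)  # identical histograms
print(round(lag1_autocorrelation(smooth), 3))  # 0.625
print(round(lag1_autocorrelation(rough), 3))   # -0.815
```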
Role of Spatial Statistics in Spatial Data Analysis
Spatially continuous data
- Model attribute spatial variation over the study area from sampled point values
- Predict attribute values at non-sampled locations (accounting for covariates)
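Predicting at non-sampled locations is what the kriging labs address. As a simpler stand-in, the sketch below uses inverse distance weighting (IDW), a deterministic interpolator swapped in here for illustration only: unlike kriging, it ignores the spatial correlation structure and covariates and gives no prediction uncertainty. The power parameter of 2 and the gauge data are assumptions.

```python
import math


def idw_predict(samples, target, power=2.0):
    """Inverse distance weighted prediction at `target` from (x, y, value) samples."""
    num = den = 0.0
    for x, y, value in samples:
        d = math.dist((x, y), target)
        if d == 0.0:
            return value  # target coincides with a sample: return the observed value
        w = 1.0 / d ** power  # closer samples get larger weights
        num += w * value
        den += w
    return num / den


# Made-up gauge readings (x, y, value).
gauges = [(0.0, 0.0, 1.0), (1.0, 0.0, 3.0), (0.0, 5.0, 10.0)]
print(idw_predict(gauges[:2], (0.5, 0.0)))  # 2.0: midway between two equally distant samples
```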
Area (lattice) data
- Detect and model spatial patterns or trends in area values; no prediction at non-sampled locations, unless smoothing of existing values or imputation of missing values is required
- Use covariates or relationships with adjacent attribute values for inference, e.g., disease rates in light of socioeconomic variables
Point patterns
- Detect clustering or regularity, as opposed to complete randomness, of event locations in space and/or time
- If clustering is detected, investigate possible relations between clusters and nearby "sources" or pertinent covariates
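One classical way to check for clustering or regularity relative to complete spatial randomness (CSR) is the Clark-Evans nearest-neighbor ratio, sketched below. It is swapped in as an illustration and is not necessarily the statistic used in the course; the sketch applies no edge correction and no significance test, and both point sets are made up.

```python
import math


def clark_evans_ratio(points, area):
    """R = observed mean nearest-neighbor distance / mean expected under CSR.

    R < 1 suggests clustering, R near 1 CSR, R > 1 regularity (no edge correction).
    """
    n = len(points)
    nn = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nn) / n
    density = n / area                         # events per unit area
    expected = 1.0 / (2.0 * math.sqrt(density))  # CSR expectation for mean NN distance
    return observed / expected


# Made-up patterns in the unit square.
regular = [(0.125 + 0.25 * i, 0.125 + 0.25 * j) for i in range(4) for j in range(4)]
clustered = [(0.1, 0.1), (0.11, 0.1), (0.1, 0.11), (0.9, 0.9), (0.91, 0.9), (0.9, 0.91)]

print(clark_evans_ratio(regular, 1.0))    # 2.0: regular grid
print(clark_evans_ratio(clustered, 1.0))  # well below 1: two tight clusters
```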
Spatial Versus Non-Spatial Statistics
Classical statistics
- Samples are assumed to be realizations of independent and identically distributed (iid) random variables
- Most hypothesis testing procedures call for samples from iid random variables
- This creates problems with inference and hypothesis testing in a spatial setting
Spatial statistics
- Multivariate statistics in a spatial/temporal context: each observation is viewed as a realization from a different random variable, but these random variables are auto-correlated in space and/or time
- Each sample is not an independent piece of information, precisely because it is redundant with other samples (the corresponding random variables are auto-correlated)
- Auto- and cross-correlation (in space and/or time) are explicitly accounted for when establishing confidence intervals for hypothesis testing
One can always choose to analyze spatial data with non-spatial statistics; problems arise when confidence intervals need to be reported …
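A rough way to see why confidence intervals are the sticking point: assume (as a simplification, not a model from these notes) a regularly spaced 1D transect following a first-order autoregressive process with lag-1 correlation `rho`. A standard large-sample approximation for the sample mean gives an effective sample size n_eff = n(1 - rho)/(1 + rho), so the same 100 samples carry less information, and the interval widens, as rho grows. Real 2D spatial data would need a fitted covariance model instead.

```python
import math


def effective_sample_size(n, rho):
    """Approximate effective sample size for the mean of an AR(1) series.

    Large-n approximation: n_eff = n * (1 - rho) / (1 + rho).
    """
    return n * (1.0 - rho) / (1.0 + rho)


def ci_half_width(sd, n_eff, z=1.96):
    """Half-width of an approximate 95% confidence interval for the mean."""
    return z * sd / math.sqrt(n_eff)


n, sd = 100, 1.0
for rho in (0.0, 0.5, 0.8):
    n_eff = effective_sample_size(n, rho)
    # Treating the data as iid (rho = 0) would report the narrowest interval.
    print(rho, round(n_eff, 1), round(ci_half_width(sd, n_eff), 3))
```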
Software for Statistical Analysis of Spatial Data
GIS-based
- ESRI's Spatial Analyst, Geostatistical Analyst …
- Opt for "close" or "loose" coupling with specialized external packages when specific functionalities are missing from a GIS

Statistical packages
- Extremely versatile in modeling; recent improvements in visualization
- R and SpaceStat/GeoDa are the most popular in Geography

Image processing packages
- Mature technology, lots of new developments
- IDL and Matlab are the most popular in Remote Sensing and Electrical Engineering

Access to source code written in a straightforward programming language is critical for research development in an academic environment …
Some Issues Specific to Spatial Data Analysis
A first look
Differences from time series analysis:
1. Irregular sampling
2. Lack of clear indexing; no notion of past-present-future
3. Auto- and cross-correlation in multiple directions

Multi-source data are associated with different spatial/temporal resolutions.
Data are often reported as aggregates over arbitrarily defined zones/areas; statistics of aggregates are not the same as those of individuals:
1. Modifiable Area Unit Problem (MAUP)
2. Ecological Fallacy or Inference Problem (EIP)

Edge/boundary effects: samples near the edges of a study region have fewer neighbors than samples in the interior; near-edge samples might bear the effects of different spatial processes.
Spatial process models typically distinguish between first- and second-order effects, i.e., between environmental controls and interactions (the distinction between the two is not always clear-cut).
Modifiable Area-Unit Problem: Aggregation Effect
[Figure: two spatial variables and their univariate/bivariate statistics]
Statistics and relationships between spatial attributes depend on aggregation extent
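A toy version of the aggregation effect: two made-up variables on eight cells of a 1D profile are averaged into blocks of two adjacent cells. The correlation between them rises from about 0.905 to 1.0 and the variance of `x` drops from 5.25 to 5.0, purely because of the aggregation extent. The helper names are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def aggregate(values, block):
    """Average consecutive cells of a 1D profile into blocks of `block` cells."""
    return [sum(values[i:i + block]) / block for i in range(0, len(values), block)]


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]

for block in (1, 2):
    xa, ya = aggregate(x, block), aggregate(y, block)
    print(block, round(pearson(xa, ya), 3), statistics.pvariance(xa))
# block 1: r = 0.905, var(x) = 5.25; block 2: r = 1.0, var(x) = 5.0
```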
Modifiable Area-Unit Problem: Zonation Effect
[Figure: upscaling spatial variables using two different aggregation schemes]
For a given aggregation extent, statistics and relationships between spatial attributes depend on which individual values are aggregated and how
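A toy version of the zonation effect: the same eight made-up cells are grouped into four zones of two cells each under two hypothetical zoning schemes (zones need not be contiguous in this toy). The aggregation extent is identical, yet the zone-level correlation is 1.0 under scheme A and about 0.882 under scheme B.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def zone_means(values, zones):
    """Average the cells listed in each zone (a zone is a tuple of cell indices)."""
    return [sum(values[i] for i in zone) / len(zone) for zone in zones]


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]

# Two hypothetical zonations of the same cells, both with four 2-cell zones.
scheme_a = [(0, 1), (2, 3), (4, 5), (6, 7)]
scheme_b = [(0, 2), (1, 3), (4, 6), (5, 7)]

for name, zones in (("A", scheme_a), ("B", scheme_b)):
    print(name, round(pearson(zone_means(x, zones), zone_means(y, zones)), 3))
# A 1.0, B 0.882
```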
Ecological Inference Problem I
[Figure: downscaling spatial variables]
Statistics and relationships between spatial variables at a finer spatial resolution differ from those derived at the original coarse resolution
Averages do not apply to individuals, since individuals are not homogeneous
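A made-up illustration of why averages do not apply to individuals: within each of two hypothetical districts, the individual-level relationship between `x` and `y` is perfectly negative, yet the district averages line up along a positive slope. An analyst who only sees the district means would infer the wrong individual-level relationship.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def slope(x, y):
    """Ordinary least squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


# Made-up individual-level data in two districts: (x values, y values).
districts = {
    "north": ([1, 2, 3], [3, 2, 1]),
    "south": ([4, 5, 6], [6, 5, 4]),
}

for name, (x, y) in districts.items():
    print(name, pearson(x, y))  # -1.0 within each district

# District averages: the aggregate data an analyst might actually receive.
mean_x = [sum(x) / len(x) for x, _ in districts.values()]
mean_y = [sum(y) / len(y) for _, y in districts.values()]
print(slope(mean_x, mean_y))  # 1.0: positive relationship across districts
```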
Ecological Inference Problem II
Under-determined inverse problem
Multiple combinations of fine-resolution attribute values can lead to the same aggregate values at a coarser resolution (equifinality)
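The under-determination can be shown by brute force: enumerate every fine-resolution configuration consistent with one coarse value. In this made-up setup, one coarse cell with mean 2 covers two fine cells that may each take an integer value from 0 to 4, and five different fine configurations reproduce the same aggregate. The function name and value range are assumptions for illustration.

```python
from itertools import product


def fine_configurations(coarse_mean, n_cells, levels):
    """All fine-resolution configurations (values from `levels`) whose mean is coarse_mean."""
    return [
        combo
        for combo in product(levels, repeat=n_cells)
        if sum(combo) == coarse_mean * n_cells
    ]


configs = fine_configurations(2, 2, range(5))
print(configs)  # [(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)]
```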
First- Versus Second-Order Effects
First-order effects
- Spatial pattern explained by environmental (or extrinsic) factors, e.g., attribute value y(x) is high at location x due to another attribute value y’(x) at the same location x, or another attribute value y’(x’) at a nearby location x’

Second-order effects
- Spatial pattern explained by interaction (or intrinsic) factors, e.g., attribute value y(x) is low at location x due to another (same-attribute) value y(x’) at a nearby location x’, provided both locations x and x’ lie in the same "environment"
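One common way to formalize this distinction (an assumption here, since the slide states it only in words) is to split the attribute into a deterministic mean function and a correlated zero-mean residual. First-order effects live in the mean; second-order effects live in the residual covariance, or equivalently in the semivariogram, which under second-order stationarity depends only on the separation vector h:

```latex
Y(\mathbf{x}) = \mu(\mathbf{x}) + \varepsilon(\mathbf{x}), \qquad \mathbb{E}[\varepsilon(\mathbf{x})] = 0

\text{first-order effect:}\quad \mu(\mathbf{x}) = \mathbb{E}[Y(\mathbf{x})]

\text{second-order effect:}\quad
\operatorname{Cov}\big(\varepsilon(\mathbf{x}),\, \varepsilon(\mathbf{x} + \mathbf{h})\big) = C(\mathbf{h}),
\qquad
\gamma(\mathbf{h}) = \tfrac{1}{2}\,\mathbb{E}\big[(\varepsilon(\mathbf{x} + \mathbf{h}) - \varepsilon(\mathbf{x}))^{2}\big] = C(\mathbf{0}) - C(\mathbf{h})
```

In practice the split is not unique: a smooth trend absorbed into μ(x) in one model can appear as long-range correlation in ε(x) in another.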
Recap I
Spatial data
- Set of geo-referenced measurements with attribute values and coordinates (topology & context are also important)
- Data types:
  1. spatial point patterns <= events
  2. data continuously varying in space <= fields
  3. area or lattice data <= objects
  4. spatial interaction data <= flows

Spatial data analysis objectives
- Exploratory analysis: looking for patterns/relationships
- Confirmatory analysis: establishing spatial process models from spatial patterns + model parameter estimation
Recap II
Spatial statistics
- Statistical framework for analysis and modeling of spatial data: accounts for spatial auto-correlation and scale effects; allows assessing uncertainty in spatial analysis results
- Multivariate statistics tailored to the analysis of spatial data

Issues to be aware of
- Any spatial analysis result is tied to a particular observation scale, i.e., to the particular sample support(s); the Modifiable Area Unit Problem (MAUP)
- Spatial process models typically distinguish between:
  - First-order effects or environmental controls
  - Second-order effects or interactions (spatial auto-correlation)
- This dichotomy does not apply to actual data, only to data-generating models …