Overview of Statistical Analysis of Spatial Data Geog 210C Introduction to Spatial Data Analysis Chris Funk Lecture 1


Overview of Statistical Analysis of Spatial Data
Geog 210C

Introduction to Spatial Data Analysis

Chris Funk

Lecture 1

C. Funk Geog 210C Spring 2011

Outline

  • Course Overview
  • Types of Spatial Data
  • Why Spatial Statistics?
  • Problems in Spatial Data Analysis
  • Points to Remember


Class Logistics

  • Prerequisites: Geog 210B, or equivalent applied statistics course & consent of instructor
  • Four-unit course:
      • Weekly R-based lab assignments (60% of final grade)
      • Final exam (20% of final grade)
      • Class participation (10% of final grade)

  • Why do we care
  • Types of spatial data
  • Why spatial statistics?
  • Problems in Spatial Data Analysis
  • Points to Remember


Climate Hazard Group at UCSB

Greg Husak, Joel Michaelsen, Diego Pedreros, Pete Peterson, Mike Marshal, Laura Harrison, Park Williams, Greg Ederer, Teresa Everett, Amy McNally, Frank Davenport

[Map legend: Generally Food Secure, Moderately Food Insecure, Highly Food Insecure, Extremely Food Insecure, Famine, No Data]

Near Term Outlook (March 2011)

Famine Early Warning Systems Network


Why do we care-III Decisions …

Example drawn from Gary Eilert’s presentation to the House Foreign Affairs committee, Oct 2, 2009

How should we plan for El Niño in the Horn of Africa?
Outcome: 140 million dollars in aid shipped ~6 months earlier than usual

Source: USGS EWX

[Map panels: Poor rains, March-May 2009; Poor rains, June-July 2009]
Plan for a bigger problem. Rains have already been poor in most of 2009.


Why do we care-III Decisions …

Understand that it’s not only El Niño; climate change is also present

Last 4 rainy seasons are worst ever.

Main season rainfall decreasing, while second season is increasing

Almost 20% drop in main season rainfall since 1980

C. Funk/USGS


Why do we care-III Decisions …

[Figure: map of average 2000-2010 MAMJ SPI]

[Figure: Ethiopian and Kenyan MAMJ SPI, 1950-2010. x-axis: Year; y-axis: SPI smoothed with running 10-year mean (ticks at -0.6, 0, 0.6). Series: Ethiopia MAMJ SPI, Kenya MAMJ SPI]

Semi-arid food insecure areas experiencing increased dryness


Outcome = Early and Effective Response

FEWSNET—reports below normal rains …. USAID and U.N. agencies are taking action to ensure that sufficient stocks are in place, with USAID recently committing an additional $70 million.

USAID Frontlines, Feb. 2010


Belousov–Zhabotinsky Reaction

Non-equilibrium thermodynamical reaction


[Figure panels: A, B, C]


Why Geostatistics?

  • We only have one planet, so let’s get to know her
  • Statistics evolved in the 17th century as a tool to help guide nascent states
  • The combination of statistics, computers, and the modern firehose of data might be able to guide us through the 21st century
  • Increasing resource demand and shortages will require more sophisticated management
  • The spatialness of data is important
      • Place matters
      • Nearness matters
      • Accuracy matters
      • Uncertainty matters
      • Theory matters

John Graunt’s book Natural and Political Observations Made upon the Bills of Mortality (1662) analyzed mortality rolls in early London to warn of bubonic plague

Data are not passive!


Overview-Labs

Wk  Lab Date  Labs
1   3/30      Descriptive Univariate Statistics
2   4/6       Statistical Sampling
3   4/13      Intensity Analysis of Spatial Point Patterns
4   4/20      Interaction Analysis of Spatial Point Patterns; Spatial Point Patterns & CSR
5   4/27      Quantifying Spatial Association (SA) in Scattered Data
6   5/04      Nonlinear Least Squares for SA Model Fitting
7   5/11      Spatial Prediction via Simple Kriging
8   5/18      Geostatistics in ArcMap (related zip archive with data)
9   5/25      Principal Component Analysis


Overview-Lectures

Week  Lectures
1a    Overview of Statistical Analysis of Spatial Data
1b    Univariate Sample Statistics
2a    Intensity Analysis of Spatial Point Patterns
2b    Univariate Random Variables
3a    Interaction Analysis of Spatial Point Patterns
3b    Descriptive Statistics (Bivariate & Multivariate)
4a    Point Patterns & Complete Spatial Randomness I
4b    Point Patterns & Complete Spatial Randomness II
5a    Empirical Semivariograms
5b    Elements of Spatial Stochastic Processes I
6a    Elements of Spatial Stochastic Processes II
6b    Modeling Semivariograms
7a    Spatial Interpolation Example
7b    Simple Kriging
8a    Not So Simple Kriging
8b    Explaining Covariance
9a    Principal Component Analysis I
9b    Principal Component Analysis II
10a   Conditional Simulation I
10b   Conditional Simulation II


Introduction & Objectives

Spatial data:
  • Geo-referenced attribute measurements; each measurement is associated with a location (point) or an entity (region or object) in geographical (or other) space
  • Attribute measurement scale can be continuous or discrete, e.g., chemical concentration, soil types, disease occurrences
  • Sample locations can have a regular or irregular spatial arrangement, i.e., data locations on a raster (regular lattice) or scattered in space; the domain informed by a measurement is called the sample unit or support, e.g., points, pixels, polygons
  • Spatial data often have an additional temporal component: dynamic attribute evolution in space and time, spatiotemporal support

Objectives of this handout
  • To provide a brief overview of types of spatial data
  • To highlight the role of spatial statistics in analyzing data of each type


Stages in Spatial Data Analysis

Exploratory analysis
  • Explore spatial data using cartographic (or other visual) representations
  • Statistical analysis for detecting possible sub-populations, outliers, trends, relationships with neighboring values or other spatial variables

Modeling or confirmatory analysis
  • Establish parametric or non-parametric model(s) characterizing attribute spatial distribution
  • Estimate model parameters from data; evaluate their statistical significance; predict attribute values at other locations and/or future time instants

Notes
  • Any processing of spatial data, e.g., filtering or interpolation, affects any inference made from them
  • Boundaries between the above stages are not always clear cut


Attributes Varying Continuously in Space

Characteristics
  • Also known (unfortunately) as geostatistical data, e.g., temperature, rainfall, elevation, population density
  • Measurements of nominal scale, e.g., land cover types, or interval/ratio scale, e.g., sea floor depth
  • Often, sparse samples are available only at a fixed set of locations


Area or Lattice Data

Characteristics
  • Attributes take values only at a fixed set of areas or zones, e.g., administrative districts, pixels of satellite images
  • Typically, all possible locations have been sampled; no attribute values between sampling units (unless there are missing values)


Point Pattern Data

Characteristics
  • Series of point locations with recorded “events”, e.g., locations of trees, disease or crime incidents
  • Point locations correspond to all possible events (mapped point pattern), or to a subset (sampled point pattern)
  • Attribute values also possible at same locations, e.g., tree diameter, magnitude of earthquakes (marked point pattern)


Spatial Interaction or Network Data

Characteristics
  • Attributes relate to pairs of points or areas: flows from origins to destinations, e.g., patients “flow” from residences to hospitals
  • Less tangible flows, e.g., information, could be defined

Analysis objectives
  • Modeling of flow patterns = finding relationships between observed flows and explanatory variables, e.g., number of trips from origins to destinations as function of income
  • Classical analysis methods focus on patterns of aggregate interaction, rather than individuals themselves; more recent focus is placed on understanding individual preferences and choice modeling
  • Spatial location/allocation problems, and more generally spatial optimization problems, typically involve network data


Univariate Statistics and Spatial Pattern?

Two 1D attribute profiles with the same histogram:

Shortcomings of univariate statistics
  • Univariate statistics, e.g., average, variance, histogram, do not suffice to describe spatial pattern; the spatial arrangement of attribute values matters too.

Spatial auto-correlation: an aspect of spatial pattern
  • Attribute values measured at “nearby” supports tend to be more “similar” than those measured at “distant” supports; Tobler's 1st law(?) of Geography
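
A minimal Python sketch of this point (an illustration added here, not part of the original slides; the course labs use R). Two hypothetical 1D profiles hold exactly the same 50 values, so their histograms are identical, yet a lag-1 autocorrelation coefficient separates the smooth profile from the shuffled one. The function name and both profiles are assumptions made for the example.

```python
import random

def lag1_autocorrelation(values):
    """Correlation between each value and its right-hand neighbor."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    cov = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return cov / var

# Profile A: a smooth ramp. Profile B: the same values in shuffled order.
smooth = [float(i) for i in range(50)]
shuffled = smooth[:]
random.Random(0).shuffle(shuffled)

same_histogram = sorted(smooth) == sorted(shuffled)  # identical value sets
r_smooth = lag1_autocorrelation(smooth)
r_shuffled = lag1_autocorrelation(shuffled)
print(same_histogram, round(r_smooth, 3), round(r_shuffled, 3))
```

The mean, variance and histogram cannot tell the two profiles apart; only a statistic that uses the ordering of values along the profile can.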


Role of Spatial Statistics in Spatial Data Analysis

Spatially continuous data
  • Model attribute spatial variation over study area from sampled point values
  • Predict attribute values at non-sampled locations (accounting for covariates)

Area (lattice) data
  • Detect and model spatial patterns or trends in area values; no prediction at non-sampled locations, unless smoothing of existing values or imputation of missing values is required
  • Use covariates or relationships with adjacent attribute values for inference, e.g., disease rates in light of socioeconomic variables

Point patterns
  • Detect clustering or regularity, as opposed to complete randomness, of event locations in space and/or time
  • If clustering is detected, investigate possible relations between clusters and nearby “sources” or pertinent covariates
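
One common way to screen a point pattern for clustering or regularity is the Clark-Evans nearest-neighbor ratio: the observed mean nearest-neighbor distance divided by its expectation under complete spatial randomness, 1 / (2 * sqrt(intensity)). The sketch below is an illustration only, not a method taken from these slides; the two patterns are hypothetical and no edge correction is applied.

```python
import math

def mean_nearest_neighbor_distance(points):
    """Average distance from each point to its closest other point."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    return total / len(points)

def clark_evans_ratio(points, area):
    """Observed mean NN distance divided by its expectation under CSR."""
    intensity = len(points) / area           # events per unit area
    expected = 0.5 / math.sqrt(intensity)    # CSR expectation, no edge correction
    return mean_nearest_neighbor_distance(points) / expected

# Hypothetical patterns of 25 events in a unit-square study area.
regular = [(0.1 + 0.2 * i, 0.1 + 0.2 * j) for i in range(5) for j in range(5)]
clustered = [(0.48 + 0.01 * i, 0.48 + 0.01 * j) for i in range(5) for j in range(5)]

r_regular = clark_evans_ratio(regular, area=1.0)
r_clustered = clark_evans_ratio(clustered, area=1.0)
print(round(r_regular, 2), round(r_clustered, 2))
```

Ratios well below 1 suggest clustering, ratios well above 1 suggest regularity, and values near 1 are consistent with complete spatial randomness.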


Spatial Versus Non-Spatial Statistics

Classical statistics
  • Samples assumed to be realizations of independent and identically distributed (iid) random variables
  • Most hypothesis testing procedures call for samples from iid random variables
  • Problems with inference and hypothesis testing in a spatial setting

Spatial statistics
  • Multivariate statistics in a spatial/temporal context: each observation is viewed as a realization from a different random variable, but such random variables are auto-correlated in space and/or time
  • Each sample is not an independent piece of information, precisely because it is redundant with other samples (due to the corresponding random variables being auto-correlated)
  • Auto- and cross-correlation (in space and/or time) is explicitly accounted for to establish confidence intervals for hypothesis testing

One can always choose to analyze spatial data with non-spatial statistics; problems arise when confidence intervals need to be reported …
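
A rough numerical sketch of why confidence intervals are the sticking point. It assumes the simplest textbook case, a 1D series with first-order (AR(1)) autocorrelation rho, for which the effective number of independent samples is approximately n(1 - rho)/(1 + rho). The sample size, spread and correlation below are hypothetical, and real spatial data call for a model of the full correlation structure.

```python
import math

def effective_sample_size(n, rho):
    """Approximate count of independent samples among n AR(1)-correlated ones."""
    return n * (1.0 - rho) / (1.0 + rho)

def ci_half_width(sd, n, z=1.96):
    """Half-width of a normal-theory 95% confidence interval for the mean."""
    return z * sd / math.sqrt(n)

n, sd, rho = 100, 2.0, 0.6   # hypothetical sample size, spread, lag-1 correlation

n_eff = effective_sample_size(n, rho)
naive = ci_half_width(sd, n)         # treats all samples as iid
adjusted = ci_half_width(sd, n_eff)  # accounts for redundancy between samples
print(n_eff, round(naive, 3), round(adjusted, 3))
```

With these assumed numbers the interval that ignores autocorrelation is half as wide as the adjusted one, which is the kind of overconfidence the slide warns about.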


Software for Statistical Analysis of Spatial Data

GIS-based
  • ESRI's Spatial Analyst, Geostatistical Analyst …
  • Opt for “close” or “loose” coupling with specialized external packages when specific functionalities are missing from a GIS

Statistical packages
  • Extremely versatile in modeling; recent improvements in visualization
  • R and SpaceStat/GeoDa most popular in Geography

Image processing packages
  • Mature technology, lots of new developments
  • IDL and Matlab most popular in Remote Sensing and Electrical Engineering

Access to source code written in a straight-forward programming language is critical for research development in an academic environment …


Some Issues Specific to Spatial Data Analysis

A first look
  • Differences from time series analysis:
      1. irregular sampling
      2. lack of clear indexing; no notion of past-present-future
      3. auto- and cross-correlation in multiple directions
  • Multi-source data associated with different spatial/temporal resolutions
  • Data often reported as aggregates over arbitrarily defined zones/areas; statistics of aggregates are not the same as those of individuals:
      1. Modifiable Areal Unit Problem (MAUP)
      2. Ecological Fallacy or Inference Problem (EIP)
  • Edge/boundary effects: samples near the edges of a study region have fewer neighbors than samples in the interior; near-edge samples might bear the effects of different spatial processes
  • Spatial process models typically distinguish between first- and second-order effects, i.e., between environmental controls and interactions (distinction between the two not always clear-cut)
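
The edge effect in the list above is easy to see on a toy lattice. The sketch below is illustrative only: a hypothetical 5 x 5 grid of sample sites with unit spacing, counting how many other sites fall within distance 1 of a corner, an edge and an interior site.

```python
def neighbor_count(site, sites, radius=1.0):
    """Number of other sites within `radius` of `site`."""
    x0, y0 = site
    return sum(
        1
        for (x, y) in sites
        if (x, y) != site and (x - x0) ** 2 + (y - y0) ** 2 <= radius ** 2
    )

# Hypothetical 5 x 5 lattice of sample sites with unit spacing.
sites = [(x, y) for x in range(5) for y in range(5)]

corner = neighbor_count((0, 0), sites)
edge = neighbor_count((0, 2), sites)
interior = neighbor_count((2, 2), sites)
print(corner, edge, interior)
```

Any statistic built from neighboring values therefore rests on less information near the boundary than in the interior, unless the analysis corrects for it.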


Modifiable Areal Unit Problem: Aggregation Effect

Two spatial variables and their univariate/bivariate statistics

Statistics and relationships between spatial attributes depend on aggregation extent
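
A small sketch of the aggregation effect, using made-up data rather than the variables shown on the original slide. Two hypothetical attributes share a common gradient along a 16-cell transect but carry opposite cell-to-cell fluctuations; averaging over larger blocks removes the fluctuations and changes their correlation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def aggregate(values, block):
    """Average consecutive, non-overlapping blocks of cells."""
    return [sum(values[i:i + block]) / block for i in range(0, len(values), block)]

# Hypothetical attributes: shared trend, opposite alternating fluctuations.
cells = range(16)
x = [i + 5 * (-1) ** i for i in cells]
y = [i - 5 * (-1) ** i for i in cells]

# Correlation at the original resolution and after averaging 2 or 4 cells.
r_by_block = {b: pearson(aggregate(x, b), aggregate(y, b)) for b in (1, 2, 4)}
for block, r in r_by_block.items():
    print(block, round(r, 3))
```

At the cell level the two attributes are nearly uncorrelated, while at either aggregated level they are perfectly correlated, so the reported relationship depends on the aggregation extent.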


Modifiable Areal Unit Problem: Zonation Effect

Upscaling spatial variables using two different aggregation schemes

For a given aggregation extent, statistics and relationships between spatial attributes depend on which individual values are aggregated and how
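
The zonation effect can be sketched with a hypothetical 4 x 4 raster (not the example on the original slide) that is aggregated into four zones of four cells each in two different ways: horizontal strips and vertical strips.

```python
from statistics import mean, pvariance

# Hypothetical raster: strong west-east gradient, weak north-south gradient.
grid = [[10 * col + row for col in range(4)] for row in range(4)]

# Scheme A: four horizontal strips (one zone per row).
row_zones = [mean(row) for row in grid]
# Scheme B: four vertical strips (one zone per column).
col_zones = [mean(grid[row][col] for row in range(4)) for col in range(4)]

print(row_zones, pvariance(row_zones))
print(col_zones, pvariance(col_zones))
```

Both schemes use zones of the same size and preserve the overall mean, but the variance of the zone values differs by two orders of magnitude depending on which cells are grouped together.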


Ecological Inference Problem I

Downscaling spatial variables

Statistics and relationships between spatial variables at a finer spatial resolution are different than those derived at the original coarse resolution

Averages do not apply to individuals, since they are not homogeneous


Ecological Inference Problem II

Under-determined inverse problem

Multiple combinations of fine spatial resolution attribute values can lead to the same aggregate values at a coarser resolution (equi-finality)
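
To make the under-determination concrete, the sketch below enumerates every way four fine-resolution pixels could be filled so that they average to one observed coarse value. The setup (integer values from 0 to 4, coarse mean of 2) and the function name are assumptions chosen to keep the enumeration small.

```python
from itertools import product

def fine_scale_configurations(coarse_mean, n_cells, max_value):
    """All integer fine-scale value tuples whose average equals coarse_mean."""
    target_sum = coarse_mean * n_cells
    return [
        combo
        for combo in product(range(max_value + 1), repeat=n_cells)
        if sum(combo) == target_sum
    ]

# Hypothetical coarse pixel with mean 2, made of 4 fine pixels valued 0..4.
configs = fine_scale_configurations(coarse_mean=2, n_cells=4, max_value=4)
print(len(configs))
print(configs[0], configs[-1])
```

Even in this tiny case there are 85 distinct fine-scale configurations, from perfectly uniform (2, 2, 2, 2) to strongly contrasted (0, 0, 4, 4), all consistent with the same aggregate. Downscaling therefore needs extra information or assumptions to choose among them.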


First- Versus Second-Order Effects

First-order effects
  • Spatial pattern explained by environmental (or extrinsic) factors, e.g., attribute value y(x) is high at location x due to another attribute value y’(x) at the same location x, or another attribute value y’(x’) at a nearby location x’

Second-order effects
  • Spatial pattern explained by interaction (or intrinsic) factors, e.g., attribute value y(x) is low at location x due to another (same-attribute) value y(x’) at a nearby location x’, provided both locations x and x’ lie in the same “environment”
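
A simulated 1D sketch of the two mechanisms, under assumed models that are not from the slides: a linear external gradient plus independent noise for the first-order case, and an AR(1) neighbor interaction for the second-order case. The slope, interaction strength and seed are arbitrary choices.

```python
import random

def lag1_autocorrelation(values):
    """Correlation between each value and its right-hand neighbor."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    cov = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return cov / var

def remove_linear_trend(values):
    """Residuals of an ordinary least-squares line fitted against position."""
    n = len(values)
    xs = range(n)
    mx, my = (n - 1) / 2, sum(values) / n
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, values))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return [v - (my + slope * (x - mx)) for x, v in zip(xs, values)]

rng = random.Random(42)
noise = [rng.gauss(0, 1) for _ in range(500)]

# First-order effect: an external gradient drives the attribute; noise is independent.
first_order = [0.05 * i + e for i, e in enumerate(noise)]

# Second-order effect: each value depends on its neighbor (AR(1) interaction).
second_order = [noise[0]]
for e in noise[1:]:
    second_order.append(0.8 * second_order[-1] + e)

for name, series in (("first-order", first_order), ("second-order", second_order)):
    raw = lag1_autocorrelation(series)
    detrended = lag1_autocorrelation(remove_linear_trend(series))
    print(name, round(raw, 2), round(detrended, 2))
```

Both raw series look strongly autocorrelated. Only after removing the assumed trend does the first-order series lose its autocorrelation while the second-order series keeps it, so the distinction rests on the model chosen and is not visible in the data alone.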


Recap I

Spatial data
  • Set of geo-referenced measurements with attribute values and coordinates (topology & context also important)
  • Data types:
      1. spatial point patterns <= events
      2. data continuously varying in space <= fields
      3. area or lattice data <= objects
      4. spatial interaction data <= flows

Spatial data analysis objectives
  • Exploratory analysis: looking for patterns/relationships
  • Confirmatory analysis: establishing spatial process models from spatial patterns + model parameter estimation


Recap II

Spatial statistics
  • Statistical framework for analysis and modeling of spatial data: accounts for spatial auto-correlation and scale effects; allows assessing uncertainty in spatial analysis results
  • Multivariate statistics tailored to the analysis of spatial data

Issues to be aware of
  • Any spatial analysis result is tied to a particular observation scale, i.e., to the particular sample support(s); the Modifiable Areal Unit Problem (MAUP)
  • Spatial process models typically distinguish between:
      • First-order effects or environmental controls
      • Second-order effects or interactions (spatial auto-correlation)

This dichotomy applies only to data-generating models, not to actual data …