EXPLORING SPATIAL CORRELATION IN RIVERS by Joshua French


Introduction

A city is required to extend its sewage pipelines farther into its bay to meet EPA requirements.

How far should the pipelines be extended?

The city does not want to spend more money extending the pipelines than it needs to, so it must find a way to predict waste levels at different sites in the bay.

Ordinarily, we might interpolate the data using a linear model, and we would usually assume that the observations are independent.

For spatial data, however, we intuitively know that response values at points close together should be more similar than values at points separated by a great distance.

We can use the correlation between sampling sites to make better predictions with our model.


The Road Ahead

- Methods
  - Introduction to the Variogram
  - Exploratory Analysis
  - Sample Variogram
  - Modeling the Variogram

- Analysis
  - 3 types of results

- Conclusions
  - Future Work


Introduction to the Variogram

Spatial data is often viewed as a stochastic process.

For each point $x$, a specific property $Z(x)$ is viewed as a random variable with mean $\mu$, variance $\sigma^2$, higher-order moments, and a cumulative distribution function.

Each individual $Z(x_i)$ is assumed to have its own distribution, and the set $\{Z(x_1), Z(x_2), \ldots\}$ is a stochastic process.

The data values in a given data set are simply a realization of the stochastic process.

For a spatial process, second-order stationarity is often assumed.

Second-order stationarity implies that the mean is the same everywhere, i.e., $E[Z(x_j)] = \mu$ for all points $x_j$.

It also implies that $\mathrm{Cov}(Z(x_j), Z(x_k))$ is a function only of the distance between $x_j$ and $x_k$.

Thus,

$$\mathrm{Cov}(Z(x_j), Z(x_k)) = \mathrm{Cov}(Z(x), Z(x+h)) = \mathrm{Cov}(h),$$

where $h$ measures the distance between the two points.

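Looking at the variance of differences, and noting that $Z(x)$ and $Z(x+h)$ have the same mean so their difference has mean zero:

$$\mathrm{Var}[Z(x) - Z(x+h)] = E\big[(Z(x) - Z(x+h))^2\big] = 2\gamma(h).$$

Expanding the variance under second-order stationarity gives $\mathrm{Var}[Z(x) - Z(x+h)] = 2\,\mathrm{Cov}(0) - 2\,\mathrm{Cov}(h)$, so $\gamma(h) = \mathrm{Cov}(0) - \mathrm{Cov}(h)$.

$\gamma(h)$ is known as the semi-variogram.

The plot of $\gamma(h)$ against $h$ is known as the variogram.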
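As a small numerical illustration of $\gamma(h) = \mathrm{Cov}(0) - \mathrm{Cov}(h)$, the Python sketch below assumes an exponential covariance function $\mathrm{Cov}(h) = \sigma^2 e^{-|h|/a}$ with illustrative values $\sigma^2 = 1$ and $a = 10$. These values and the helper names `exp_cov` and `semivariogram_from_cov` are hypothetical, not taken from the Ohio River analysis.

```python
import math

def exp_cov(h, sigma2=1.0, a=10.0):
    """Illustrative (assumed) exponential covariance: sigma2 * exp(-|h| / a)."""
    return sigma2 * math.exp(-abs(h) / a)

def semivariogram_from_cov(cov, h):
    """Semivariance under second-order stationarity: gamma(h) = Cov(0) - Cov(h)."""
    return cov(0.0) - cov(h)

for h in [0.0, 5.0, 10.0, 50.0]:
    print(h, round(semivariogram_from_cov(exp_cov, h), 4))
```

The semivariance is 0 at $h = 0$ and rises toward $\mathrm{Cov}(0) = \sigma^2$ as the covariance decays; the `abs` in `exp_cov` makes $\gamma$ an even function of the lag.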

Things to know about variograms:

1. $\gamma(h) = \gamma(-h)$. Because the variogram is an even function, usually only positive lag distances are shown.

2. Nugget effect: by definition, $\gamma(0) = 0$. In practice, however, sample variograms often appear to approach a positive value as the lag distance approaches 0. This discontinuity at the origin is called the “nugget effect”.

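3. Variograms tend to increase monotonically with lag distance.

4. Sill: the maximum value of the variogram, where it levels off. Since $\gamma(h) = \mathrm{Cov}(0) - \mathrm{Cov}(h)$, the variogram levels off at $\mathrm{Cov}(0)$, the variance of $Z(x)$, once $\mathrm{Cov}(h)$ reaches 0.

5. Range: the lag distance at which the sill is reached. Observations farther apart than this distance are not correlated.

The following figure shows these features.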

[Figure: Variogram Example. Variance against lag distance, with the nugget, sill, and range marked.]
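To make the nugget, sill, and range concrete, the following sketch evaluates a spherical semivariogram with a nugget, using one common parameterization (nugget $c_0$, sill $c_0 + c_1$, range $a$). The helper name `spherical_variogram` and the parameter values are illustrative assumptions.

```python
def spherical_variogram(h, nugget, sill, a):
    """Spherical semivariogram with a nugget effect (one common parameterization).

    gamma(0) = 0 by definition; for 0 < h < a the curve rises from the nugget
    toward the sill, and for h >= a it stays at the sill.
    """
    h = abs(h)  # gamma is an even function of the lag
    if h == 0:
        return 0.0
    if h >= a:
        return sill
    partial_sill = sill - nugget
    r = h / a
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

# Illustrative values: nugget 0.3, sill 1.2, range 3.
for h in [0.0, 1e-9, 1.5, 3.0, 4.0]:
    print(h, round(spherical_variogram(h, 0.3, 1.2, 3.0), 4))
```

Just above lag 0 the curve jumps to the nugget (0.3 here) even though $\gamma(0) = 0$; it reaches the sill (1.2) exactly at the range $a = 3$ and stays flat beyond it.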


Exploratory Analysis

The data studied was collected along the longitudinal profile of the Ohio River.

Instead of worrying about the river network with streams, tributaries, and other factors, we simply look at the Ohio River as a one-dimensional object.

[Figure: The Ohio River.]

[Figure: Longitudinal Profile of the Ohio River Sampling Sites. Latitude (NAD27) against longitude (NAD27), with Pittsburgh, PA; Cincinnati, OH; Louisville, KY; and Cairo, IL marked.]


Before we model variograms, we should explore the data.

We need to make sure that the data analyzed satisfies second-order stationarity.

If there is an obvious trend in the data, we should remove it and analyze the residuals.

If the variance increases or decreases along the river, then we should transform the variable to correct this.
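A minimal sketch of trend removal, assuming a linear trend in river location $x$ fitted by ordinary least squares. The summary of results reports trends of this linear form (for example 38.1082 + .0330x for percent sand), but the fitting method, the helper name `remove_linear_trend`, and the data values here are assumptions for illustration. The returned residuals are what the variogram analysis would then use.

```python
def remove_linear_trend(x, z):
    """Fit z ~ b0 + b1 * x by ordinary least squares; return (b0, b1, residuals)."""
    n = len(x)
    if n != len(z) or n < 2:
        raise ValueError("x and z must have the same length (at least 2)")
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct locations")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    b1 = sxz / sxx                 # slope of the trend
    b0 = z_bar - b1 * x_bar        # intercept of the trend
    residuals = [zi - (b0 + b1 * xi) for xi, zi in zip(x, z)]
    return b0, b1, residuals

# Hypothetical locations along the river and responses with an upward drift.
x = [0, 100, 200, 300, 400]
z = [40, 41, 47, 48, 54]
b0, b1, resid = remove_linear_trend(x, z)
print(b0, b1, resid)
```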

It is fairly easy to check the stationarity of this data set with a scatter plot of the response against RMI.

[Figure: Scatter plot of the square root of percent invertivore against RMI.]

If the data contains outliers, we should analyze it both with and without the outliers.

If the skewness coefficient G1 is greater than 1, then we should transform the data to approximate normality if possible.
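The slides do not define G1. This sketch assumes it is the adjusted Fisher-Pearson skewness coefficient, $G_1 = \frac{\sqrt{n(n-1)}}{n-2}\,\frac{m_3}{m_2^{3/2}}$, where $m_k$ is the $k$-th sample central moment (the quantity `scipy.stats.skew(x, bias=False)` returns). It compares a made-up right-skewed sample of counts before and after a log transformation; `skewness_G1` is a hypothetical name.

```python
import math

def skewness_G1(values):
    """Adjusted Fisher-Pearson sample skewness G1 (assumed definition of G1)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # 2nd sample central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # 3rd sample central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

counts = [2, 3, 5, 7, 10, 14, 20, 30, 50, 90]   # hypothetical right-skewed counts
print(round(skewness_G1(counts), 2))
print(round(skewness_G1([math.log(c) for c in counts]), 2))
```

For these values the raw counts have G1 well above 1, while the log-transformed values are close to symmetric.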

The Sample Variogram

One of the previous definitions of semivariance is:

$$\gamma(h) = \frac{1}{2} E\big[(Z(x) - Z(x+h))^2\big].$$

The natural estimator is:

$$\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{j=1}^{N(h)} \big[z(x_j) - z(x_j + h)\big]^2,$$

where $N(h)$ is the number of pairs of observations separated by lag $h$.
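A minimal sketch of the estimator $\hat{\gamma}(h)$ for sites along a one-dimensional river, where the lag between two sites is the absolute difference of their positions. Because irregularly spaced sites rarely share an exact lag, the sketch groups pairs into bins of width `bin_width` (an assumption; the slides do not say how lags were grouped) and reports, for each bin, the mean lag, $\hat{\gamma}$, and $N(h)$. `sample_variogram` and its parameters are hypothetical names, and the data are toy values.

```python
from collections import defaultdict

def sample_variogram(x, z, bin_width, max_lag):
    """Method-of-moments estimate of the semivariogram for 1-D locations.

    x: positions along the river; z: observed values at those positions.
    Returns (mean lag in bin, gamma_hat, N(h)) for each non-empty lag bin,
    where gamma_hat = (sum of squared differences) / (2 * N(h)).
    """
    if len(x) != len(z):
        raise ValueError("x and z must have the same length")
    sq_diff = defaultdict(float)
    lag_sum = defaultdict(float)
    count = defaultdict(int)
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):          # each unordered pair counted once
            h = abs(x[i] - x[j])
            if h == 0 or h > max_lag:      # skip co-located sites and long lags
                continue
            k = int(h // bin_width)        # index of the lag bin
            sq_diff[k] += (z[i] - z[j]) ** 2
            lag_sum[k] += h
            count[k] += 1
    return [
        (lag_sum[k] / count[k], sq_diff[k] / (2 * count[k]), count[k])
        for k in sorted(count)
    ]

x = [0.0, 1.0, 2.0, 3.0, 4.0]
z = [1.0, 2.0, 4.0, 4.0, 6.0]
for lag, gamma_hat, n_pairs in sample_variogram(x, z, bin_width=1.0, max_lag=2.0):
    print(lag, gamma_hat, n_pairs)
```

With unit spacing, the four pairs one unit apart give $\hat{\gamma} = 9/8$ and the three pairs two units apart give $\hat{\gamma} = 17/6$.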

[Figure: Sample Variogram Example. Variance against lag distance.]

Modeling the Variogram

Our goal is to estimate the true variogram of the process that generated the data.

Four variogram models were used to model the sample variogram: the spherical, Gaussian, exponential, and Matern models.
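The sketch below writes each of the four models as nugget + partial sill × shape, with each shape rising from 0 toward 1, and fits them by weighted least squares. Several choices are assumptions rather than details from the slides: the parameterizations (conventions for the range parameter $a$ differ between sources), fixing the Matérn smoothness at $\nu = 3/2$ so it has a closed form (general $\nu$ needs a modified Bessel function such as `scipy.special.kv`), the grid search over $a$, and weighting by $N(h)$. For a fixed $a$ the model is linear in the nugget and partial sill, so those two are solved exactly for each candidate $a$. All function names are hypothetical, and the usage data are synthetic.

```python
import math

# Unit-sill shapes f(h; a): 0 at h = 0 and rising toward 1 (assumed parameterizations).
def spherical(h, a):
    r = min(h / a, 1.0)            # flat at 1 beyond the range
    return 1.5 * r - 0.5 * r ** 3

def exponential(h, a):
    return 1.0 - math.exp(-h / a)

def gaussian(h, a):
    return 1.0 - math.exp(-(h / a) ** 2)

def matern_3_2(h, a):
    """Matern shape with smoothness fixed at nu = 3/2 (closed form)."""
    s = math.sqrt(3.0) * h / a
    return 1.0 - (1.0 + s) * math.exp(-s)

def fit_variogram(lags, gammas, weights, shape, ranges):
    """Weighted least-squares fit of gamma(h) = nugget + psill * shape(h, a).

    For each candidate a, (nugget, psill) solve a 2x2 weighted normal-equation
    system; the a with the smallest weighted SSE is kept.
    Returns (sse, nugget, sill, a), or None if no admissible fit exists.
    """
    best = None
    for a in ranges:
        f = [shape(h, a) for h in lags]
        W = sum(weights)
        Sf = sum(w * fi for w, fi in zip(weights, f))
        Sy = sum(w * y for w, y in zip(weights, gammas))
        Sff = sum(w * fi * fi for w, fi in zip(weights, f))
        Sfy = sum(w * fi * y for w, fi, y in zip(weights, f, gammas))
        det = W * Sff - Sf * Sf
        if det == 0:
            continue               # shape is constant over the lags: not identifiable
        psill = (W * Sfy - Sf * Sy) / det
        nugget = (Sy - psill * Sf) / W
        if nugget < 0 or psill < 0:
            continue               # keep only non-negative nugget and partial sill
        sse = sum(w * (y - nugget - psill * fi) ** 2
                  for w, fi, y in zip(weights, f, gammas))
        if best is None or sse < best[0]:
            best = (sse, nugget, nugget + psill, a)
    return best

# Synthetic sample variogram (not ORSANCO data): exponential, nugget 0.5, sill 2.0, a = 20.
lags = [5.0 * i for i in range(1, 21)]
gammas = [0.5 + 1.5 * exponential(h, 20.0) for h in lags]
weights = [1.0] * len(lags)        # in practice, e.g. N(h) for each lag bin
ranges = [float(a) for a in range(5, 61)]
for name, shape in [("spherical", spherical), ("exponential", exponential),
                    ("gaussian", gaussian), ("matern_3_2", matern_3_2)]:
    print(name, fit_variogram(lags, gammas, weights, shape, ranges))
```

Because the synthetic semivariances were generated from the exponential model, the exponential fit recovers nugget 0.5, sill 2.0, and $a = 20$; on real data one would compare the weighted SSE across the four models.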

[Figure: Variogram Models. Exponential, spherical, Gaussian, and Matern models plotted as variance against lag distance.]


Analysis

The data analyzed is a set of particle size and biological variables for the Ohio River.

The data was collected by the Ohio River Valley Sanitation Commission, better known as ORSANCO.

There were between 190 and 235 unique sampling sites, depending on the variable.

[Figure: ORSANCO data collection.]


The results of the analysis fell into three main groups:

- Able to fit the sample variogram well

- Not able to fit the sample variogram well

- Analysis not reasonable


Good Results: Number of Individuals at a site

A log transformation corrects the skewness, but a number of outliers remain, so we analyze the data both with and without the outliers.

[Figure: log(Num Individuals) sample variogram with outliers. Variance against lag distance (mi).]

[Figure: log(Num Individuals) sample variogram without outliers. Variance against lag distance (mi).]


We were not able to model the sample variogram perfectly, but we were able to detect some amount of spatial correlation in the data, especially when the outliers were removed.

We were able to obtain reasonable estimates of the nugget, sill, and range.


Poor Results: Percent Sand

After exploratory spatial analysis and removal of a trend, we fit the sample variogram of the percent sand residuals.

[Figure: Sample variogram of percent sand residuals. Variance against lag distance (mi).]

The sample variogram does not really increase monotonically with distance.

Our variogram models cannot fit it very well.

Though we can obtain estimates of the nugget, sill, and range, the estimates cannot be trusted.


No results: Percent Hardpan

This variable was so badly skewed that analysis was not reasonable.

The skewness coefficient is 12.38, extremely high and far above the threshold of 1 used to decide whether to transform.

[Figure: QQ plot of percent hardpan. Percent hardpan against quantiles of the standard normal.]

[Figure: Scatter plot of percent hardpan against RMI.]

The data is nearly all zeros!

There is also an erroneous data value: a percentage greater than 100%, which is impossible.

Data analysis does not seem reasonable. Our data does not meet the conditions necessary to use the spatial methods discussed.
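Checks like the ones that ruled out percent hardpan are easy to automate. This sketch (hypothetical helper `check_percent_variable`, made-up values) reports the fraction of zeros and any values outside the valid 0 to 100 range for a percentage.

```python
def check_percent_variable(values):
    """Summarize data problems for a variable measured as a percentage."""
    n = len(values)
    if n == 0:
        raise ValueError("no observations")
    zeros = sum(1 for v in values if v == 0)
    invalid = [v for v in values if v < 0 or v > 100]   # impossible percentages
    return {"n": n, "fraction_zero": zeros / n, "invalid_values": invalid}

sample = [0, 0, 0, 0, 0, 0, 0, 0, 5, 250]   # hypothetical values
print(check_percent_variable(sample))
```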


Conclusions

Able to fit sample variogram reasonably well

– percent gravel, number of individuals, number of species

Not able to fit sample variogram well

– percent sand, percent detritivore, percent simple lithophilic individuals, percent invertivore

No results – remaining variables

Summary of Results

| Response | Transformation | Trend Removed | Model | Nugget | Sill | Range (miles) |
|---|---|---|---|---|---|---|
| Percent Gravel | | | Exponential | 286.09 | 335.53 | 72.9 |
| Percent Sand | | 38.1082+.0330x | Gaussian | 520.88 | 658.32 | 71.67 |
| Percent Cobble | | | | | | |
| Percent Hardpan | | | | | | |
| Percent Fines | | | | | | |
| Percent Boulder | | | | | | |
| Number of Individuals | Natural Log | | Gaussian | 0.29 | 0.39 | 44.19 |
| Number of Individuals | Natural Log (no outliers) | | Exponential | 0.2 | 0.27 | 37.69 |
| Number of Native Species | | 17.7849-.0042x | Gaussian | 10.1 | 11.87 | 39.93 |
| Percent Tolerant Individuals | | | | | | |
| Percent Lithophilic Individuals | Square Root | 15.5364-.0023x | Matern | 0.92 | 2.76 | 44.02 |
| Percent Nonnative Individuals | | | | | | |
| Percent Detritivore | Square Root | | Exponential | 1.09 | 1.57 | 24.08 |
| Percent Detritivore | Square Root (no outliers) | | Exponential | 0.94 | 1.4 | 19.17 |
| Percent Invertivore | Square Root | 6.5207-.0039x | Exponential | 1.4 | 2.97 | 13.43 |
| Percent Piscivore | | | | | | |


Future Work

Things to consider in future analysis:

- Water flows in only one direction, so a point downstream cannot affect a point upstream

- Natural features such as tributaries may impact spatial correlation

- Manmade features such as dams may impact spatial correlation


Concluding Thought

Before you criticize someone, you should walk a mile in their shoes. That way, when you criticize them, you’re a mile away and you have their shoes.

- Jack Handey