EXPLORING SPATIAL CORRELATION IN RIVERS by Joshua French

Introduction

A city is required to extend its sewage pipelines farther into its bay to meet EPA requirements.

How far should the pipelines be extended?

The city doesn’t want to spend any more money than it needs to extend the pipelines. It needs to find a way to make predictions for the waste levels at different sites in the bay.

With the passage of the Clean Water Act in the 1970s, spatial analysis of aquatic data has become even more important.

Section 305(b) requires state governments to provide “a description of the water quality of all navigable waters in such State. . .”

It is not physically or financially possible to make measurements at all sites. Some sort of spatial interpolation will need to be used.

Ordinarily we might fit some sort of linear model to the data to make predictions, and in doing so we usually assume that the observations are independent.

For spatial data, however, we intuitively know that two sampling sites close together will probably be similar.

We would expect that two sites in close proximity would be more similar than two sites separated by a great distance.

We can use the correlation between sampling sites to make better predictions with our model.

The Ohio River

The Road Ahead

- Methods
  - Introduction to the Variogram
  - Exploratory Analysis
  - Sample Variogram
  - Modeling the Variogram

- Analysis
  - 3 types of results

- Conclusions
- Future Work

Introduction to the Variogram

Spatial data is often viewed as a stochastic process.

For each point x, a specific property Z(x) is viewed as a random variable with mean µ, variance σ², higher-order moments, and a cumulative distribution function.

Each individual $Z(x_i)$ is assumed to have its own distribution, and the set $\{Z(x_1), Z(x_2), \ldots\}$ is a stochastic process.

The data values in a given data set are simply a realization of the stochastic process.

We want to measure the relationship between different points. Define the covariance of $Z(x_j)$ and $Z(x_k)$ to be:

$$\operatorname{Cov}(Z(x_j), Z(x_k)) = E[\{Z(x_j) - \mu(x_j)\}\{Z(x_k) - \mu(x_k)\}]$$

where $\mu(x_j)$ and $\mu(x_k)$ are the means of Z at the respective locations.

However, we have a problem. We don’t know the means at each point because we only have one realization.

To solve this, we must assume some sort of stationarity: certain features of the distribution are identical everywhere.

We will work with data that satisfies second-order stationarity.

Second-order stationarity means that the mean is the same everywhere, i.e. $E[Z(x_j)] = \mu$ for all points $x_j$.

It also implies that $\operatorname{Cov}(Z(x_j), Z(x_k))$ depends only on the separation between $x_j$ and $x_k$, not on where the two points are located.

Thus,

$$\operatorname{Cov}(Z(x_j), Z(x_k)) = \operatorname{Cov}(Z(x), Z(x+h)) = \operatorname{Cov}(h)$$

where h measures the distance between two points.

We can then derive that

$$\operatorname{Cov}(Z(x), Z(x+h)) = E[(Z(x) - \mu)(Z(x+h) - \mu)] = E[Z(x)Z(x+h)] - \mu^2$$
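The last equality above follows by expanding the product and using the constant mean, E[Z(x)] = E[Z(x+h)] = µ, that second-order stationarity provides. Written out step by step:

```latex
\begin{align*}
\operatorname{Cov}(Z(x), Z(x+h))
  &= E[(Z(x)-\mu)(Z(x+h)-\mu)] \\
  &= E[Z(x)Z(x+h)] - \mu E[Z(x)] - \mu E[Z(x+h)] + \mu^2 \\
  &= E[Z(x)Z(x+h)] - \mu^2 - \mu^2 + \mu^2 \\
  &= E[Z(x)Z(x+h)] - \mu^2 .
\end{align*}
```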

Sometimes it is clear that our data is not second-order stationary.

Georges Matheron solved this problem in 1965 by establishing his “intrinsic hypothesis”.

For small distances h, Matheron held that

$$E[Z(x) - Z(x+h)] = 0$$

Looking at the variance of differences, this leads to

$$\operatorname{Var}[Z(x) - Z(x+h)] = E[(Z(x) - Z(x+h))^2] = 2\gamma(h)$$

Intrinsic stationarity is useful because analysis may be conducted even if second-order stationarity is violated. Unfortunately, the covariance function need not exist under intrinsic stationarity alone.

For this reason, we will work with data that is second-order stationary. If the original data violates second-order stationarity, we will first remove a trend or transform the variable so that the data we analyze is second-order stationary.

Note that second-order stationarity implies intrinsic stationarity, so the variogram equation is still defined.

Under second-order stationarity, γ(h)=Cov(0)-Cov(h).

γ(h) is known as the semi-variogram. In practice however, it is usually referred to as the variogram.
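The identity γ(h) = Cov(0) − Cov(h) can be checked directly from the definition 2γ(h) = Var[Z(x) − Z(x+h)], using the fact that Var[Z(x)] = Var[Z(x+h)] = Cov(0) under second-order stationarity:

```latex
\begin{align*}
2\gamma(h) &= \operatorname{Var}[Z(x) - Z(x+h)] \\
  &= \operatorname{Var}[Z(x)] + \operatorname{Var}[Z(x+h)]
     - 2\operatorname{Cov}(Z(x), Z(x+h)) \\
  &= 2\operatorname{Cov}(0) - 2\operatorname{Cov}(h),
\end{align*}
```

so γ(h) = Cov(0) − Cov(h). If Cov(h) decays to zero at large lags, γ(h) levels off at Cov(0) = σ².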

Things to know about variograms:

1. γ(h)= γ(-h). Because it is an even function, usually only positive lag distances are shown.

2. Nugget effect - by definition, γ(0)= 0. In practice however, sample variograms often have a positive value at lag 0. This is called the “nugget effect”.

3. Variograms tend to increase monotonically with lag distance.

4. Sill: the maximum value of the variogram, where it levels off

5. Range: the lag distance at which the sill is reached

The following figure shows these features.

[Figure: Variogram Example. Variance versus lag distance, with the nugget, sill, and range marked.]
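To make the nugget, sill, and range concrete, here is a minimal Python sketch of a spherical variogram, a model that reaches its sill exactly at the range. The function name, the choice to treat `sill` as the total sill (nugget included), and the parameter values are illustrative assumptions; they are not read off the figure.

```python
import numpy as np

def spherical_variogram(h, nugget, sill, range_):
    """Spherical variogram; `sill` is the total sill (assumed convention)."""
    h = np.abs(np.asarray(h, dtype=float))      # gamma(h) = gamma(-h)
    s = np.minimum(h / range_, 1.0)             # flat beyond the range
    gamma = nugget + (sill - nugget) * (1.5 * s - 0.5 * s**3)
    return np.where(h == 0.0, 0.0, gamma)       # gamma(0) = 0 by definition

# Illustrative parameters only.
lags = np.array([0.0, 0.5, 1.0, 2.0, 3.0, 5.0])
values = spherical_variogram(lags, nugget=0.25, sill=1.0, range_=3.0)
```

Just above lag 0 the value jumps to the nugget (0.25), it then rises monotonically, and from lag 3 (the range) onward it stays at the sill (1.0).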

Exploratory Analysis

Before we model variograms, we should explore the data.

We need to make sure that the data analyzed satisfies second-order stationarity.

We need to check for outliers.

We need to make sure that the data is not too badly skewed (a skewness coefficient G1 > 1 indicates a problem).

We can look at the river data as a one-dimensional linear system. It is fairly easy to check for stationarity using a scatter plot.

[Figure: scatter plot of Square Root of Percent Invertivore versus RMI.]

If there is an obvious trend in the data, we should remove it and analyze the residuals.

If the variance increases or decreases with lag distance, then we should transform the variable to correct this.
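Removing a trend and keeping the residuals can be sketched as follows. This assumes a straight-line trend in position along the river, fit by ordinary least squares; the function name and the data values are made up for illustration.

```python
import numpy as np

def remove_linear_trend(location, values):
    """Fit values ~ a + b * location by least squares; return (a, b, residuals)."""
    location = np.asarray(location, dtype=float)
    values = np.asarray(values, dtype=float)
    b, a = np.polyfit(location, values, deg=1)   # coefficients, highest power first
    residuals = values - (a + b * location)
    return a, b, residuals

# Made-up example: a response that drifts upward along the river.
rmi = np.array([0.0, 100.0, 200.0, 300.0, 400.0])
y = np.array([38.0, 42.0, 44.0, 49.0, 51.0])
a, b, resid = remove_linear_trend(rmi, y)
```

The residuals sum to zero and are uncorrelated with location, so the variogram analysis can proceed on `resid` in place of `y`.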

To check for outliers, we may use a typical boxplot.

If the data contains outliers, we should do analysis both with and without outliers present.

[Figure: boxplot of the data.]
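A boxplot flags points that lie far outside the quartiles. The sketch below assumes the common 1.5 × IQR rule; the exact whisker convention differs between software packages, and the data values are invented.

```python
import numpy as np

def boxplot_outliers(values, k=1.5):
    """Boolean mask of points beyond k * IQR from the quartiles (assumed rule)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Invented data with one suspicious value.
data = np.array([4.8, 5.1, 5.3, 5.0, 4.9, 5.2, 8.4])
mask = boxplot_outliers(data)
with_outliers = data            # analyze once with every point
without_outliers = data[~mask]  # and once with flagged points removed
```

Keeping both arrays mirrors the advice above: run the analysis on each and compare the results.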

If G1>1, then we should transform the data to approximate normality if possible. To check approximate normality, the standard qqplot can be used.

[Figure: normal Q-Q plot. Observed values versus quantiles of the standard normal.]
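The skewness check can be sketched in a few lines. The deck does not give its formula for G1, so this assumes the usual adjusted Fisher-Pearson sample skewness; the counts are invented for illustration.

```python
import numpy as np

def skewness_g1(values):
    """Adjusted Fisher-Pearson sample skewness (assumed definition of G1)."""
    x = np.asarray(values, dtype=float)
    n = x.size
    d = x - x.mean()
    m2 = np.mean(d**2)                 # second central moment
    m3 = np.mean(d**3)                 # third central moment
    g1 = m3 / m2**1.5
    return np.sqrt(n * (n - 1)) / (n - 2) * g1

# Invented right-skewed counts.
counts = np.array([20.0, 35.0, 50.0, 70.0, 100.0, 150.0, 300.0, 900.0])
raw_skew = skewness_g1(counts)          # well above 1
log_skew = skewness_g1(np.log(counts))  # below 1 after the log transform
```

For these made-up counts the raw skewness exceeds 1 while the log-transformed values fall below that threshold, which is the pattern a transformation is meant to produce.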

The Sample Variogram

One of the previous definitions of semivariance is:

$$\gamma(h) = \frac{1}{2} E\left[ (Z(x) - Z(x+h))^2 \right]$$

The logical estimator is:

$$\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{j=1}^{N(h)} \left[ z(x_j) - z(x_j + h) \right]^2$$

where N(h) is the number of pairs of observations associated with that lag.
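The estimator can be sketched in Python for one-dimensional locations such as river miles. The binning rule (each pair is assigned to the nearest multiple of `lag_width`) and all names are assumptions made for this example; real sites are rarely equally spaced, so some tolerance of this kind is needed.

```python
import numpy as np

def sample_variogram(locations, values, lag_width, n_lags):
    """gamma_hat(h) = sum of squared differences / (2 N(h)) for each lag bin."""
    x = np.asarray(locations, dtype=float)
    z = np.asarray(values, dtype=float)
    i, j = np.triu_indices(x.size, k=1)            # every pair counted once
    dist = np.abs(x[i] - x[j])
    sq_diff = (z[i] - z[j]) ** 2
    lag_index = np.rint(dist / lag_width).astype(int)  # nearest lag multiple
    lags = lag_width * np.arange(1, n_lags + 1)
    gamma_hat = np.full(n_lags, np.nan)            # NaN where a bin has no pairs
    n_pairs = np.zeros(n_lags, dtype=int)
    for k in range(1, n_lags + 1):
        in_bin = lag_index == k
        n_pairs[k - 1] = in_bin.sum()
        if n_pairs[k - 1] > 0:
            gamma_hat[k - 1] = sq_diff[in_bin].sum() / (2.0 * n_pairs[k - 1])
    return lags, gamma_hat, n_pairs

# Toy data: five equally spaced sites.
rmi = [0.0, 1.0, 2.0, 3.0, 4.0]
z = [1.0, 3.0, 2.0, 5.0, 4.0]
lags, gamma_hat, n_pairs = sample_variogram(rmi, z, lag_width=1.0, n_lags=3)
# n_pairs   -> [4, 3, 2]
# gamma_hat -> [1.875, 1.5, 4.25]
```

Note how N(h) shrinks as the lag grows; estimates at long lags rest on few pairs and are correspondingly noisy.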

[Figure: Sample Variogram Example. Variance versus lag distance.]

Modeling the Variogram

Our goal is to estimate the true variogram of the data.

There were four variogram models used to model the sample variogram: the spherical, Gaussian, exponential, and Matern models.
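As a rough illustration, two of these models can be written down in a few lines. The deck does not state the formulas or the parametrization used by its software, so the forms below are one common convention and should be read as an assumption: `sill` is the total sill and `range_` is a scale parameter. The Matern model is omitted because it needs a modified Bessel function.

```python
import numpy as np

def exponential_variogram(h, nugget, sill, range_):
    """Exponential model: approaches the sill asymptotically (assumed form)."""
    h = np.asarray(h, dtype=float)               # lags assumed non-negative
    gamma = nugget + (sill - nugget) * (1.0 - np.exp(-h / range_))
    return np.where(h > 0, gamma, 0.0)           # gamma(0) = 0 by definition

def gaussian_variogram(h, nugget, sill, range_):
    """Gaussian model: parabolic near the origin (assumed form)."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + (sill - nugget) * (1.0 - np.exp(-(h / range_) ** 2))
    return np.where(h > 0, gamma, 0.0)

# Illustrative values only.
g_exp = exponential_variogram(30.0, nugget=0.2, sill=0.5, range_=10.0)
g_gau = gaussian_variogram(np.sqrt(3.0) * 10.0, nugget=0.2, sill=0.5, range_=10.0)
```

Under this convention neither model ever reaches its sill exactly. Both example calls land at about 95% of the way from nugget to sill, at 3 × `range_` for the exponential and √3 × `range_` for the Gaussian, which is why some software reports a "practical range" instead.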

[Figure: Variogram Models. Variance versus lag distance for the exponential, spherical, Gaussian, and Matern models.]

The algorithm used to fit the spherical model uses least squares.

The algorithm used to fit the exponential, Gaussian, and Matern models is maximum likelihood.

The spherical model is fit to get an estimate of the sill, nugget, and range.

These estimates will be used to fit the other three models.
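A least-squares fit of the spherical model can be sketched as below. The deck does not describe the algorithm its software used, so this is an assumed, unweighted version: for a fixed range the model is linear in the nugget and partial sill, so each candidate range is solved by linear least squares and the best one is kept. All names and the synthetic data are illustrative.

```python
import numpy as np

def spherical_basis(h, range_):
    """Shape of the spherical model, rising from 0 to 1 at the range."""
    s = np.minimum(np.asarray(h, dtype=float) / range_, 1.0)
    return 1.5 * s - 0.5 * s**3

def fit_spherical_least_squares(lags, gamma_hat, candidate_ranges):
    """Return (nugget, sill, range) minimizing the sum of squared errors."""
    lags = np.asarray(lags, dtype=float)
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    best = None
    for r in candidate_ranges:
        # Fixed range: gamma = nugget + partial_sill * basis is linear.
        design = np.column_stack([np.ones_like(lags), spherical_basis(lags, r)])
        coef, *_ = np.linalg.lstsq(design, gamma_hat, rcond=None)
        sse = float(np.sum((design @ coef - gamma_hat) ** 2))
        if best is None or sse < best[0]:
            best = (sse, coef[0], coef[0] + coef[1], float(r))
    _, nugget, sill, range_ = best
    return nugget, sill, range_

# Synthetic check: data generated from a known spherical model.
lags = np.arange(5.0, 105.0, 5.0)
gamma = 0.2 + 0.3 * spherical_basis(lags, 40.0)
nugget, sill, range_ = fit_spherical_least_squares(
    lags, gamma, candidate_ranges=np.arange(10.0, 101.0, 10.0))
```

This sketch does not constrain the nugget to be non-negative; a production fit would add that constraint and would usually weight each lag by its number of pairs.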

The “best model” will be the model that minimizes the AICC statistic.
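Model selection by AICC can be sketched as follows, assuming the statistic is the standard small-sample corrected AIC, AICc = −2 log L + 2k + 2k(k+1)/(n − k − 1); the deck does not define it. The log-likelihoods and parameter counts below are hypothetical and are not results from the analysis.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Corrected Akaike information criterion (assumed definition)."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc requires n_obs > n_params + 1")
    aic = -2.0 * log_likelihood + 2.0 * n_params
    return aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)

def best_model(fits, n_obs):
    """fits maps model name -> (maximized log-likelihood, number of parameters)."""
    scores = {name: aicc(ll, k, n_obs) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get), scores

# Hypothetical values, for illustration only.
fits = {
    "exponential": (-120.4, 3),
    "gaussian": (-123.9, 3),
    "matern": (-120.1, 4),
}
name, scores = best_model(fits, n_obs=200)
```

In this made-up case the Matern model has the highest likelihood, but its extra parameter is penalized and the exponential model attains the lowest AICc.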

Analysis

The data analyzed is a set of particle size and biological variables for the Ohio River.

The data was collected by The Ohio River Valley Sanitation Commission. This is better known as ORSANCO.

ORSANCO data collection

There were between 190 and 235 unique sampling sites, depending on the variable.

Some sites had more than one observation. In these situations, the average value for the site was used for analysis.
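Collapsing repeated observations to one value per site is a simple group-and-average step. A minimal sketch, with hypothetical site labels and values:

```python
from collections import defaultdict

def average_by_site(observations):
    """observations: iterable of (site_id, value); returns {site_id: mean value}."""
    totals = defaultdict(lambda: [0.0, 0])   # running [sum, count] per site
    for site, value in observations:
        totals[site][0] += value
        totals[site][1] += 1
    return {site: total / count for site, (total, count) in totals.items()}

# Hypothetical records: one site sampled twice, another once.
obs = [("RMI 10.5", 40.0), ("RMI 10.5", 50.0), ("RMI 22.0", 31.0)]
site_means = average_by_site(obs)
# site_means -> {"RMI 10.5": 45.0, "RMI 22.0": 31.0}
```

Averaging first ensures each location contributes a single value, so no pair of observations has lag distance zero.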

[Figure: Ohio River Sampling Sites. Latitude versus longitude (NAD27), with Pittsburgh, PA; Cincinnati, OH; Louisville, KY; and Cairo, IL marked.]

There were two main types of data: particle size data and biological levels.

The particle size data measured percent gravel, percent sand, percent fines, percent hardpan, percent boulder, and percent cobble.

The biological data measured:

- Number of individuals at a site
- Number of species at a site
- Percent tolerant fish
- Percent simple lithophilic fish (fish that lay eggs on rocks)
- Percent non-native fish
- Percent detritivore fish (fish that eat mostly decomposed plants or animals)
- Percent invertivore fish (fish that eat mostly invertebrate animals)
- Percent piscivore fish (fish that eat mostly other fish)

The results of the analysis fell into three main groups:

- Variogram models fit the sample variogram well

- Variogram models did not fit the sample variogram well

- Analysis not reasonable

Good Results: Number of Individuals at a site

The skewness coefficient of the data is 8.16, which is much too high.

The data is transformed using the natural logarithm.

The new skewness coefficient is reduced to 0.56. Not perfect, but much less skewed.

[Figure: Check Normality of log(Num Individuals). Normal Q-Q plot of log(Number of Individuals) versus quantiles of the standard normal.]

[Figure: Check Second-Order Stationarity of log(Num Individuals). Scatter plot of log(Number of Individuals) versus RMI.]

[Figure: Check for outliers of log(Num Individuals). Boxplot.]

There are a number of outliers for the transformed variable.

We should do the analysis both with and without the outliers present.

[Figure: log(Num Individuals) Sample Variogram with outliers. Variance versus lag distance (mi).]

[Figure: Check normality of log(Num Individuals) without outliers. Normal Q-Q plot versus quantiles of the standard normal.]

[Figure: log(Num Individuals) Sample Variogram without outliers. Variance versus lag distance (mi).]

We were not able to model the sample variogram perfectly, but we were able to detect some amount of spatial correlation in the data, especially when the outliers were removed.

For the transformed variable without outliers, the exponential model estimated the nugget to be .20, the sill to be .2709, and the range to be 37.7 miles.

Poor Results: Percent Sand

The skewness coefficient is only 0.18, so skewness is not a major factor.

Check second-order stationarity using a scatter plot.

[Figure: Check Stationarity of Percent Sand. Scatter plot of Percent Sand versus RMI.]

There appears to be a trend in the data.

After removing the trend, the data appears to be second-order stationary.

The residuals are also approximately normal.

[Figure: Check stationarity of percent sand residuals. Scatter plot of Percent Sand Residuals versus RMI.]

[Figure: Check normality of percent sand residuals. Normal Q-Q plot of the residuals versus quantiles of the standard normal.]

[Figure: Sample Variogram of percent sand residuals. Variance versus lag distance (mi).]

The sample variogram does not really increase monotonically with distance.

Our variogram models cannot fit this very well.

Though we can obtain estimates of the nugget, sill, and range, the estimates cannot be trusted.

No results: Percent Hardpan

This variable was so badly skewed that analysis was not reasonable.

The skewness coefficient is 12.38. This is extremely high.

[Figure: QQplot of Percent Hardpan. Percent Hardpan versus quantiles of the standard normal.]

[Figure: Scatter plot of Percent Hardpan. Percent Hardpan versus RMI.]

The data is nearly all zeros!

There is also an erroneous data value. A percentage cannot be greater than 100%.

Data analysis does not seem reasonable. Our data does not meet the conditions necessary to use the spatial methods discussed.

Conclusions

Able to fit sample variogram reasonably well – percent gravel, number of individuals, number of species

Not able to fit sample variogram well – percent sand, percent detritivore, percent simple lithophilic individuals, percent invertivore

No results – remaining variables

Summary of Results

Response                         Transformation             Trend Removed      Model        Nugget  Sill    Range (miles)
Percent Gravel                   -                          -                  Exponential  286.09  335.53  72.9
Percent Sand                     -                          38.1082 + 0.0330x  Gaussian     520.88  658.32  71.67
Percent Cobble                   -                          -                  -            -       -       -
Percent Hardpan                  -                          -                  -            -       -       -
Percent Fines                    -                          -                  -            -       -       -
Percent Boulder                  -                          -                  -            -       -       -
Number of Individuals            Natural Log                -                  Gaussian     0.29    0.39    44.19
Number of Individuals            Natural Log (no outliers)  -                  Exponential  0.2     0.27    37.69
Number of Native Species         -                          17.7849 - 0.0042x  Gaussian     10.1    11.87   39.93
Percent Tolerant Individuals     -                          -                  -            -       -       -
Percent Lithophilic Individuals  Square Root                15.5364 - 0.0023x  Matern       0.92    2.76    44.02
Percent Nonnative Individuals    -                          -                  -            -       -       -
Percent Detritivore              Square Root                -                  Exponential  1.09    1.57    24.08
Percent Detritivore              Square Root (no outliers)  -                  Exponential  0.94    1.4     19.17
Percent Invertivore              Square Root                6.5207 - 0.0039x   Exponential  1.4     2.97    13.43
Percent Piscivore                -                          -                  -            -       -       -

Future Work

Data set involving three streams in Norfolk, Virginia. Each stream has 25 observations. Collected by researchers at Old Dominion University.

Difficulties to overcome:

- What is the best way to measure distance between points?
- Few observations
- Overlapping points after coordinate conversion

Problem: What is the best way to measure distance between points?

There is some aspect of two-dimensionality to the data, but it is still really a one-dimensional problem.
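One possible way to treat the problem as one-dimensional is to measure distance along the stream instead of in a straight line. The sketch below is only an illustration of that idea, not a method proposed in the deck: it assumes a hypothetical stream centerline given as a polyline with distinct vertices, projects each site onto it, and differences the along-channel positions.

```python
import numpy as np

def along_stream_position(centerline, point):
    """Distance from the start of the polyline to the projection of `point` onto it."""
    centerline = np.asarray(centerline, dtype=float)
    p = np.asarray(point, dtype=float)
    best_dist, best_pos, travelled = np.inf, 0.0, 0.0
    for a, b in zip(centerline[:-1], centerline[1:]):
        seg = b - a
        seg_len = np.linalg.norm(seg)            # assumes seg_len > 0
        t = np.clip(np.dot(p - a, seg) / seg_len**2, 0.0, 1.0)
        d = np.linalg.norm(p - (a + t * seg))    # distance to this segment
        if d < best_dist:
            best_dist, best_pos = d, travelled + t * seg_len
        travelled += seg_len
    return best_pos

def stream_distance(centerline, p1, p2):
    """Along-channel distance between two sites."""
    return abs(along_stream_position(centerline, p1)
               - along_stream_position(centerline, p2))

# Hypothetical L-shaped channel: the two sites are sqrt(2) apart in a
# straight line but 2 units apart along the stream.
channel = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
d_stream = stream_distance(channel, (0.0, 0.0), (1.0, 1.0))
```

The gap between the two distances grows as the channel bends, which is the situation where the choice of metric matters most.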

Paradise Creek Region of Interest

[Figure: Paradise Creek Sampling Sites. UTM Y versus UTM X.]

Problem: 25 observations per stream is considered the minimum number of points to create a variogram

- the sample variogram will be very rough

- our variogram model estimates will probably be bad

To correct this, we will explore the possibility of combining the data from the three streams

Problem: Overlapping points after conversion

- Original data in longitude/latitude coordinates

- Convert to UTM coordinates so that Euclidean distance makes sense

- Converted UTM coordinates often result in overlapping sites (and even fewer unique sampling sites)

[Figure: Stream Sampling Sites (Lat/Long). Latitude versus longitude (NAD27).]

[Figure: Stream Sampling Sites (UTM). UTM Y versus UTM X.]

Acknowledgments

- My committee: Dr. Urquhart, Dr. Wang, and Dr. Theobald

- Dr. Davis and Dr. Reich for answering my spatial questions and letting me use their S-Plus spatial library

Concluding Thought

Before you criticize someone, you should walk a mile in their shoes. That way, when you criticize them, you’re a mile away and you have their shoes.

- Jack Handey