POINT PATTERN ANALYSIS w0279429@nscc - …geomantiacogs.com/.../Asgn2_Point_PatternAnalysis... · FISHNET generation/ modification Python # Determine extents of new Fishnet ... #

DISCLAIMER

This project is a student project and was completed for educational purposes. The poster should not be reproduced or distributed in any format.

POINT PATTERN ANALYSIS

Venessa Bennett
[email protected]

Base ocean topographic information presented in Figure 1 was accessed through ArcGIS Online services. The Nova Scotia digital elevation map was sourced from http://novascotia.ca/natr/MEB/download/dp055dds.asp

QUADRAT METHOD: PYTHON SCRIPT

A custom Python script was written in PythonWin 2.7.5 (build 219) and dynamically linked to ArcGIS through the addition of a custom toolbox called Point Pattern Analysis. The script uses the ArcPy site package, which supports Python scripting of geographic analysis, data processing and data management functions, to perform the quadrat analysis procedure. A schematic diagram of the key components of the script is presented in Figure 2. Sample outputs from the correctly operating script are provided in Figures 3-5.

FIGURE 2: QUADRAT METHOD - PYTHON SCRIPT GENERALIZED WORKFLOW

Each stage names the operation, then the tool used and the logic it applies.

(1) INITIAL SETUP: ArcGIS - script link
ArcGIS: create new tool (Point Pattern Analysis); set parameters.
Python: initiate Python script (Quadrat Method); link to new tool in ArcGIS.

(2) BASIC DATA OPERATIONS + INITIAL STATISTIC GENERATION
Python: display point and study area polygon datasets; calculate number of points in dataset; calculate area of AOI polygon.

(3) QUADRAT SIZE SPECIFICATIONS
Python: apply formula to determine the optimal number of quadrats for the input data.

(4) FISHNET GENERATION/MODIFICATION
Python: determine extents of new fishnet (origin, orientation and opposite corner coordinates; Describe and Extent syntax); define/calculate cell/quadrat size; modify fishnet location (optional); clip fishnet to AOI.

(5) QUADRAT STATISTICS
Python: intersect quadrats with points; calculate number of points per quadrat; calculate number of quadrats with no points; calculate lambda (no. of points / no. of quadrats).

(6) VARIANCE & T-STATISTIC
Python: calculate numerator for quadrats with no data; calculate numerator for quadrats with data (search cursor; loop); calculate final variance; calculate t-statistic.

(7) POINT PATTERN CLASSIFICATION
Python: conditional statements to define random, regular and clustered data based on specified t-statistic values at given significance levels.

(8) SUMMARY OUTPUT INTERFACE
Python: Tkinter graphics to visualize summary results.
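The quadrat-size and fishnet stages of the workflow could be sketched roughly as below. The poster does not show the formula the script applies, so this sketch assumes the commonly cited rule of thumb that sets the quadrat area to 2A/n (A = study area, n = number of points). The function names are hypothetical, and plain Python is used in place of ArcPy so that the example runs without ArcGIS.

```python
import math


def optimal_quadrat_side(study_area, n_points):
    """Side length of a square quadrat under the ASSUMED 2A/n rule of thumb."""
    if study_area <= 0 or n_points <= 0:
        raise ValueError("study_area and n_points must be positive")
    quadrat_area = 2.0 * study_area / n_points
    return math.sqrt(quadrat_area)


def fishnet_parameters(xmin, ymin, xmax, ymax, cell_size):
    """Derive fishnet corners and grid dimensions from an extent (hypothetical helper)."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    # Round up so the grid always covers the full extent.
    n_cols = int(math.ceil((xmax - xmin) / float(cell_size)))
    n_rows = int(math.ceil((ymax - ymin) / float(cell_size)))
    return {
        "origin": (xmin, ymin),                 # lower-left corner
        "y_axis": (xmin, ymin + cell_size),     # point defining grid orientation
        "opposite_corner": (xmin + n_cols * cell_size,
                            ymin + n_rows * cell_size),
        "rows": n_rows,
        "cols": n_cols,
    }


# Illustrative values only: a 10 x 10 unit study area holding 50 points.
side = optimal_quadrat_side(100.0, 50)
grid = fishnet_parameters(0.0, 0.0, 10.0, 10.0, side)
print(side, grid["rows"], grid["cols"])
```

In the actual tool, values like these would presumably be passed to the ArcGIS fishnet tool, with the extent read through the Describe and Extent syntax noted in the workflow; those calls are omitted here.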

ARCGIS RESULTS SUMMARY

Figure 3: ArcGIS 10.2.2 results summary output

Tkinter RESULTS SUMMARY

Figure 4: Tkinter results summary output

Figure 5: User-interface of Quadrat Method script within ArcGIS 10.2.2

ARCGIS USER - INTERFACE


REFERENCES

Dramowicz, K. (2005). Analyzing Patterns in Business Point Data. Directions Magazine: http://www.directionsmag.com/entry/analyzing-patterns-in-business-point-data/123508

Mitchell, A. (2009). The Esri Guide to GIS Analysis. Volume 2: Spatial Measurements and Statistics. ESRI Press, 252 pp.

*1: http://resources.arcgis.com/en/help/main/10.1/index.html#//005p00000008000000

POINT PATTERN ANALYSIS: QUADRAT METHOD

PHYSICIANS

Figure 14: Physicians point data

A cubic model (Table 5) provided the best fit to the quadrat method results for the Physicians point data (Fig. 14). The best-fit curve on the scatterplot in Figure 15 intersects the origin, so a random distribution will only occur for very small quadrat sizes. For all quadrat sizes > 5 km, the t-statistic is greater than 1.96 and the pattern is therefore clustered. The curve has a concave shape and does not appear to have reached its maximum t-statistic value.

Table 5: Curve estimation results

Figure 15: Physicians scatterplot

y = -3.13E3 + 3.24E3*x - 7.32*x*x + 5.77E-3*x*x*x

POINT PATTERN ANALYSIS: AVERAGE NEAREST NEIGHBOUR METHOD

The Average Nearest Neighbour method calculates a nearest neighbour index based on the average distance from each feature to its nearest neighbouring feature *1. The main steps involved include (Dramowicz, 2005):

(1) Calculation of the distance from each point to all other points.

(2) Identification of the nearest neighbour (minimum distance).

(3) Determination of the average minimum distance for the data.

(4) Calculation of a mean distance for a theoretical random pattern using the same number of points and the same areal extent.

(5) Comparison of expected and observed values.

(6) Calculation of the z-score.

A point pattern is random when the observed and expected distances are similar (z-score between -1.96 and +1.96). When the observed distance is less than the expected distance, the point pattern is clustered (z-score < -1.96). Finally, when the observed distance is greater than the expected distance, the point pattern is regular (z-score > 1.96). The results of the Average Nearest Neighbour tool are sensitive to the area in which the point data lie: small changes in area can result in significant changes in the output z-scores. The method is therefore most effective when a fixed study area of known size is used in the calculations. The tool parameters used in this study are illustrated in Figure 16. The area of Nova Scotia is provided in square metres. A plot of expected vs. observed distances for each of the 5 datasets is provided in Figure 17. The graphical results of the Average Nearest Neighbour tool for each point dataset are given in Figures 18-22.
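The six steps of the Average Nearest Neighbour calculation can be sketched in plain Python as follows. This is a minimal illustration, not the ArcGIS tool itself: the function names are hypothetical, and the constants for the expected distance (0.5) and standard error (0.26136) follow the formulation usually documented for this statistic, which the poster does not list and which should be treated as an assumption here.

```python
import math


def average_nearest_neighbour(points, area):
    """Return (observed, expected, index, z) for a list of (x, y) points."""
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive area")

    # Steps 1-2: distance from each point to all others, keep the minimum.
    nearest = []
    for i, (xi, yi) in enumerate(points):
        d_min = min(math.hypot(xi - xj, yi - yj)
                    for j, (xj, yj) in enumerate(points) if j != i)
        nearest.append(d_min)

    # Step 3: average minimum (observed) distance.
    observed = sum(nearest) / float(n)

    # Step 4: expected mean distance for a random pattern (ASSUMED formula).
    expected = 0.5 / math.sqrt(n / float(area))

    # Steps 5-6: compare observed with expected and convert to a z-score.
    std_err = 0.26136 / math.sqrt(n * n / float(area))
    z = (observed - expected) / std_err
    return observed, expected, observed / expected, z


def classify_z(z, critical=1.96):
    """Classify a pattern at the 5% significance level."""
    if z < -critical:
        return "clustered"
    if z > critical:
        return "regular"
    return "random"


# Illustrative input only: a regular 5 x 5 lattice in a 25-unit study area.
lattice = [(float(x), float(y)) for x in range(5) for y in range(5)]
obs, exp, index, z = average_nearest_neighbour(lattice, 25.0)
print(classify_z(z))
```

Because the study area appears in both the expected distance and the standard error, changing `area` while keeping the points fixed shifts the z-score, which is the sensitivity described above.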

COMPARISON

Figures 23 and 24 are bar charts of point data category vs. t-statistic (Quadrat Method) and z-score (Average Nearest Neighbour), respectively. For the quadrat method data, the t-statistic from the optimal quadrat size is used. Both methods classify the point patterns for banks, dentists, drugstores and physicians as clustered. However, the two techniques yield different results for hospitals: the quadrat method indicates a random distribution, whereas the Average Nearest Neighbour tool characterizes the point data as dispersed (regular).

Figure 16: Average Nearest Neighbour parameters

Figure 17: Observed vs. expected bar chart

Figure 18: Banks

Figure 19: Dentists

Figure 20: Drugstores

Figure 21: Hospitals

Figure 22: Physicians

Figure 23: Quadrat Method

Figure 24: Average Nearest Neighbour


POINT PATTERN ANALYSIS: QUADRAT METHOD

This section presents results for 5 different census point datasets (banks, dentists, drugstores, hospitals, physicians). The data were mapped and re-projected to NAD83 UTM zone 20, and quadrat method analysis was carried out for five discrete quadrat sizes (5 km, optimal, 60 km, 80 km and 100 km). The results were tabulated and used to examine point pattern behaviour in graphical format. SPSS Statistics was used to generate scatterplots of the resultant t-statistic vs. quadrat size. Finally, a best-fit curve was plotted through the points to visualize how the point pattern changes with increasing quadrat size. Results are presented for each point dataset below.

BANKS

A cubic model (Table 1) provided the best fit to the quadrat method results for the Banks point data (Fig. 6). The best-fit curve on the scatterplot in Figure 7 indicates that at small quadrat sizes (< 5 km), the t-statistic approaches zero and the pattern is random. However, for all quadrat sizes > 5 km, the t-statistic is greater than 1.96 and the pattern is clustered. The shape of the cubic curve indicates that between the optimal size and ~80 km the t-statistic increases at a greater rate (i.e. steeper curve gradient) than for quadrat sizes > 80 km, where the gradient of the curve flattens out but remains positive.

Figure 6: Banks point data

Figure 7: Banks scatterplot

y = 10.79 + 13.1*x + 0.51*x*x - 3.36E-3*x*x*x

Table 1
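As a quick check on the gradient behaviour described for the Banks curve, the reported cubic can be differentiated and evaluated at the larger quadrat sizes. The sketch below assumes that x is the quadrat size in km and y the t-statistic, which the poster does not state explicitly; the function names are hypothetical.

```python
# Coefficients of the Banks best-fit cubic as reported on the poster:
# y = 10.79 + 13.1*x + 0.51*x^2 - 3.36E-3*x^3
B0, B1, B2, B3 = 10.79, 13.1, 0.51, -3.36e-3


def banks_cubic(x):
    """Fitted t-statistic for quadrat size x (ASSUMED to be in km)."""
    return B0 + B1 * x + B2 * x ** 2 + B3 * x ** 3


def banks_gradient(x):
    """First derivative of the fitted cubic with respect to x."""
    return B1 + 2 * B2 * x + 3 * B3 * x ** 2


# Compare the slope of the curve at the three largest quadrat sizes.
for size_km in (60, 80, 100):
    print(size_km, round(banks_gradient(size_km), 2))
```

Under these assumptions the gradient at 100 km is smaller than at 80 km yet still positive, which matches the flattening described in the text.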

DENTISTS

A cubic model (Table 2) provided the best fit to the quadrat method results for the Dentists point data (Fig. 8). The best-fit curve on the scatterplot in Figure 9 indicates that at small quadrat sizes (< 5 km), the t-statistic approaches zero (random distribution). However, for all quadrat sizes > 5 km, the t-statistic is greater than 1.96 and, as for the banks dataset, the point pattern is clustered.

Figure 8: Dentists point data

Figure 9: Dentists scatterplot

y = 1.69E2 + 36.99*x + 1.79*x*x - 0.01*x*x*x

Table 2


DRUGSTORES

A power model (Table 3) provided the best fit to the quadrat method results for the Drugstores point data (Fig. 10). The best-fit curve on the scatterplot in Figure 11 indicates that at small quadrat sizes (< 5 km), the t-statistic approaches zero (random distribution). However, for all quadrat sizes > 5 km, the t-statistic is greater than 1.96 and the point pattern is clustered. At large quadrat sizes, the t-statistic appears to peak at ~80 km and then decrease as quadrat size increases to 100 km.

Figure 10: Drugstores point data

Figure 11: Drugstores scatterplot

y = 1.45 * x**1.50

Table 3

HOSPITALS

A cubic model (Table 4) provided the best fit to the quadrat method results for the Hospitals point data (Fig. 12). The best-fit curve is the most anomalous of the five datasets, with a quasi-sinusoidal shape to the cubic model (Fig. 13). The curve is a best-fit estimate that attempts to account for the significant decrease in the t-statistic at the 100 km quadrat size. As with the drugstores dataset, the t-statistic appears to peak at 80 km before declining to lower values. The sinusoidal shape of the best-fit curve is a consequence of these two data points. Additional data at large quadrat sizes (> 80 km and < 100 km) would help to resolve the true shape of the best-fit curve. At quadrat sizes < 60 km, the hospital point pattern has a random distribution. At quadrat sizes > 60 km, the distribution is clustered.

Figure 12: Hospitals point data

Figure 13: Hospitals scatterplot

y = 3 - 0.64*x + 0.02*x*x - 9.87E-5*x*x*x

Table 4

Quadrat analysis involves sampling of input points based on a GIS-generated overlay (fishnet) that subdivides a study area into polygons of equal size (quadrats; Mitchell, 2009). The number of points per quadrat and the frequency of counts are both calculated, and from these values the variance of the input points and ultimately the t-statistic can be determined. When a distribution is random (Poisson distribution), the mean and the variance of the points are equal (Dramowicz, 2005). When the point pattern is clustered, the variance is greater than the mean (Dramowicz, 2005), and when a distribution is regular (uniform), the variance is smaller than the mean. The t-statistic provides a way to classify the spatial arrangement of test point patterns. At the 5% significance level, if the t-statistic is > 1.96, the point pattern is classified as clustered. If the t-statistic is < -1.96, the pattern is regular, and for all values lying between these end-member values, the point pattern is deemed random.
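The calculation described above can be sketched in plain Python. The poster does not reproduce the script's exact formulas, so this example assumes the standard variance-to-mean ratio (VMR) test, with the sample variance taken over m - 1 and a standard error of sqrt(2 / (m - 1)) for m quadrats. All function names are hypothetical and the input counts are illustrative only.

```python
import math


def count_points_per_quadrat(points, xmin, ymin, cell_size, n_rows, n_cols):
    """Count points falling in each cell of a square grid (row-major order)."""
    counts = [0] * (n_rows * n_cols)
    for x, y in points:
        col = int((x - xmin) // cell_size)
        row = int((y - ymin) // cell_size)
        # Points outside the grid (including its upper/right edge) are skipped.
        if 0 <= row < n_rows and 0 <= col < n_cols:
            counts[row * n_cols + col] += 1
    return counts


def quadrat_t_statistic(counts):
    """Return (lambda, variance, vmr, t) under the ASSUMED VMR test."""
    m = len(counts)
    if m < 2:
        raise ValueError("need at least two quadrats")
    lam = sum(counts) / float(m)          # mean points per quadrat
    if lam == 0:
        raise ValueError("no points in any quadrat")
    variance = sum((c - lam) ** 2 for c in counts) / float(m - 1)
    vmr = variance / lam                  # equals 1 for a Poisson pattern
    t = (vmr - 1.0) / math.sqrt(2.0 / (m - 1))
    return lam, variance, vmr, t


def classify_t(t, critical=1.96):
    """Classify a pattern at the 5% significance level."""
    if t > critical:
        return "clustered"
    if t < -critical:
        return "regular"
    return "random"


# Illustrative counts: all ten points fall in a single quadrat out of ten.
print(classify_t(quadrat_t_statistic([10] + [0] * 9)[3]))
```

Empty quadrats contribute (0 - lambda)^2 to the variance numerator, which is why the workflow counts quadrats with no points separately from those containing data.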


INTRODUCTION

Point pattern analysis refers to the evaluation of the spatial arrangement of point datasets, typically in two dimensions. The purpose of the analysis is to determine (i) whether the dataset tends to exhibit a systematic pattern over an area (as opposed to a random spatial arrangement) and (ii) over what scale the pattern manifests. Point spatial distributions can be divided into three groups *1 (Fig. 1):

RANDOM - any point is equally likely to occur at any location; the position of any point is not controlled by the position of other points

UNIFORM (REGULAR) - every point is as far as possible from its neighbouring points

CLUSTERED - several points are concentrated spatially; large areas are devoid of data

Figure 1: Point pattern distribution types (random, regular, clustered)

source: http://gispopsci.org/wp-content/uploads/2013/02/RandUnifClust.png

In this study, QUADRAT ANALYSIS and AVERAGE NEAREST NEIGHBOUR point pattern analysis methods are used to characterize how the density of a point pattern varies for five point datasets in the province of Nova Scotia. At its simplest, point pattern analysis compares a dataset with a theoretical randomly distributed dataset of the same number of points and the same areal extent (Dramowicz, 2005). The purpose of this study is to:

1. Use the Quadrat Method script to conduct point pattern analysis on a series of 5 datasets. A core component of the analysis was the creation of a custom point pattern analysis tool that references a Python script and uses the ArcPy package built into the ArcGIS environment. The Python script (Quadrat Method) is housed in a custom toolbox (Point Pattern Analysis).

2. Apply the operational tool to analyse the spatial distribution of points for the 5 project datasets provided (banks, dentists, drugstores, hospitals and physicians). Several quadrat sizes are compared graphically and a best-fit curve is defined to illustrate the variation in the t-statistic with increasing quadrat size.

3. Use the Average Nearest Neighbour tool to analyse the spatial distribution of the same datasets used for quadrat analysis and carry out a brief comparison of the two point pattern analysis methods.

The Quadrat Method Python script was written in PythonWin32, which was dynamically linked to the ArcGIS 10.2.2 interface. Subsequent graphical analysis of the quadrat method results was completed in SPSS Statistics version 22. All census data were transformed from the WGS84 datum to the NAD83 datum and re-projected to UTM zone 20.


NOTE - The Nova Scotia polygon shapefile used in the Point Pattern Analysis includes islands off mainland Nova Scotia (e.g. Sable Island).
