

DISCLAIMER

This project is a student project and was completed for educational purposes. The poster should not be reproduced or distributed in any format.

POINT PATTERN ANALYSIS

Venessa Bennett [email protected]

Base ocean topographic information presented in Figure 1 is accessed through ArcGIS online services. Nova Scotia Digital Elevation map sourced through http://novascotia.ca/natr/MEB/download/dp055dds.asp

QUADRAT METHOD: PYTHON SCRIPT

A custom Python script was written in PythonWin 2.7.5 (build 219) and dynamically linked to ArcGIS through a custom toolbox called Point Pattern Analysis. The ArcPy site package was used to write the script that performs the quadrat analysis procedure. ArcPy supports Python scripting for geographic analysis and for other data processing and management functions. A schematic diagram of the key components of the script is given in Figure 2. Sample outputs from the working script are provided in Figures 3-5.

QUADRAT METHOD - PYTHON SCRIPT GENERALIZED WORKFLOW

Each stage lists the tool used, followed by the logic/operations carried out.

1. INITIAL SETUP: ArcGIS - Script Link
   ArcGIS
   # Create new tool (Point Pattern Analysis)
   # Set parameters
   Python
   # Initiate Python script (Quadrat Method)
   # Link to new tool in ArcGIS

2. BASIC DATA OPERATIONS + INITIAL STATISTIC GENERATION
   Python
   # Display point and study area polygon datasets
   # Calculate number of points in dataset
   # Calculate area of AOI polygon

3. QUADRAT SIZE SPECIFICATIONS
   Python
   # Apply formula to determine OPTIMAL number of quadrats for input data

4. FISHNET GENERATION / MODIFICATION
   Python
   # Determine extents of new fishnet (origin, orientation and opposite corner coordinates; Describe and Extent syntax)
   # Define/calculate cell/quadrat size
   # Modify fishnet location (optional)
   # Clip fishnet to AOI

5. QUADRAT STATISTICS
   Python
   # Intersect quadrats with points
   # Calculate number of points per quadrat
   # Calculate number of quadrats with NO POINTS
   # Calculate lambda (no. of points / no. of quadrats)

6. VARIANCE & T-STATISTIC
   Python
   # Calculate variance numerator for quadrats with NO DATA
   # Calculate variance numerator for quadrats WITH data (search cursor; loop)
   # Calculate final variance
   # Calculate t-statistic

7. POINT PATTERN CLASSIFICATION
   Python
   # Conditional statements to define random, regular and clustered data based on specified t-statistic values at given significance levels

8. SUMMARY OUTPUT INTERFACE
   Python
   # Tkinter graphics to visualize summary results

FIGURE 2
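The quadrat statistics stages of the workflow (stages 3, 5 and 6) can be illustrated without ArcGIS. The following is a minimal sketch in plain Python, not the poster's script: a regular grid stands in for the fishnet, clipping to the AOI is not modelled, and all function names are hypothetical. Two formulas are assumptions, because the poster does not state them: the "optimal" quadrat area is taken as 2 * area / number of points (a common rule of thumb), and the t-statistic is taken as (variance/mean - 1) / sqrt(2 / (m - 1)) for m quadrats.

```python
import math
from collections import Counter


def optimal_quadrat_size(area, n_points):
    """Side length of a square quadrat with area 2 * area / n_points.

    ASSUMPTION: the poster does not give its "optimal" formula; the 2A/n
    rule of thumb is used here only for illustration.
    """
    return math.sqrt(2.0 * area / n_points)


def count_points_per_quadrat(points, origin, cell_size, n_cols, n_rows):
    """Count points in each cell of a regular grid (stand-in for the fishnet)."""
    x0, y0 = origin
    counts = Counter()
    for x, y in points:
        col = int((x - x0) // cell_size)
        row = int((y - y0) // cell_size)
        if 0 <= col < n_cols and 0 <= row < n_rows:
            counts[(row, col)] += 1
    # Empty quadrats are listed explicitly so they contribute to the variance.
    return [counts.get((r, c), 0) for r in range(n_rows) for c in range(n_cols)]


def quadrat_statistics(counts):
    """Return lambda, variance, variance/mean ratio and t-statistic.

    ASSUMPTION: sample variance with m - 1 in the denominator and
    t = (VMR - 1) / sqrt(2 / (m - 1)); the poster's exact form is not shown.
    """
    m = len(counts)
    lam = sum(counts) / float(m)  # mean number of points per quadrat
    variance = sum((k - lam) ** 2 for k in counts) / (m - 1)
    vmr = variance / lam
    t_stat = (vmr - 1.0) / math.sqrt(2.0 / (m - 1))
    return {"lambda": lam, "variance": variance, "vmr": vmr, "t": t_stat}


if __name__ == "__main__":
    pts = [(1, 1), (2, 2), (3, 3), (4, 4)]  # four points in one corner
    cell_counts = count_points_per_quadrat(pts, (0, 0), 10.0, 2, 2)
    print(cell_counts)
    print(quadrat_statistics(cell_counts))
```

In this toy case all four points fall in a single quadrat, so the variance exceeds the mean and the t-statistic is positive, which is the signature of clustering described for the quadrat method.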

Figure 3: ArcGIS 10.2.2 results summary output

Figure 4: Tkinter results summary output

Figure 5: User interface of the Quadrat Method script within ArcGIS 10.2.2


REFERENCES

Dramowicz, K. (2005). Analyzing Patterns in Business Point Data. Directions Magazine: http://www.directionsmag.com/entry/analyzing-patterns-in-business-point-data/123508

Mitchell, A. (2009). The Esri Guide to GIS Analysis. Volume 2: Spatial Measurements and Statistics. ESRI Press, 252 pp.

*1: http://resources.arcgis.com/en/help/main/10.1/index.html#//005p00000008000000

POINT PATTERN ANALYSIS: QUADRAT METHOD

PHYSICIANS

Figure 14: Physicians point data

A cubic model (Table 5) provided the best fit to the quadrat method results for the Physicians point data (Fig. 14). The best-fit curve on the scatter plot in Figure 15 intersects the origin, so a random distribution will only occur for very small quadrat sizes. For all quadrat sizes > 5 km, the t-statistic is greater than 1.96 and the pattern is therefore clustered. The curve has a concave shape and does not appear to have reached its maximum t-statistic value.
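The 1.96 threshold quoted above corresponds to the conditional classification step of the script workflow. A minimal sketch of that logic follows; the function name is hypothetical, and the symmetric lower bound of -1.96 for a regular pattern is assumed from the usual two-sided test at the 5% significance level.

```python
def classify_t_statistic(t_stat, critical=1.96):
    """Classify a point pattern from its quadrat-method t-statistic.

    ASSUMPTION: two-sided rule at the 5% level; values beyond +critical are
    clustered, values beyond -critical are regular, anything else is random.
    """
    if t_stat > critical:
        return "clustered"
    if t_stat < -critical:
        return "regular"
    return "random"


if __name__ == "__main__":
    for value in (3.7, 0.4, -2.5):
        print(value, classify_t_statistic(value))
```

Passing a different `critical` value would adapt the same rule to another significance level.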

Table 5: Curve estimation results

Figure 15: Physicians scatterplot

y = -3.13E3 + 3.24E3*x - 7.32*x*x + 5.77E-3*x*x*x

POINT PATTERN ANALYSIS: AVERAGE NEAREST NEIGHBOUR METHOD

The Average Nearest Neighbour method calculates a nearest neighbour index based on the average distance from each feature to its nearest neighbouring feature *1. The main steps involved include (Dramowicz, 2005):

(1) Calculation of distance from any point to all points.

(2) Identification of the nearest neighbour (minimum distance).

(3) Determination of the average minimum distance for the data.

(4) Calculation of a mean distance for a theoretical random pattern using the same number of points and same areal extent.

(5) Comparison of expected and observed values.

(6) Calculation of z-score.

A point pattern is random when the observed and expected distances are similar (z-score between -1.96 and +1.96). When the observed distance is less than the expected distance, the point pattern is clustered (z-score < -1.96). Finally, when the observed distance is greater than the expected distance, the point pattern is regular (z-score > 1.96). The results of the Average Nearest Neighbour tool are sensitive to the area in which the point data lie: small changes in area can result in significant changes in the output z-scores. The method is therefore most effective when a fixed study area of known area is used in the calculations. The tool parameters used for the Average Nearest Neighbour tool in this study are illustrated in Figure 16. The area of Nova Scotia is provided in square metres. A plot of expected vs. observed distances for each of the 5 datasets is provided in Figure 17. The graphical results of the Average Nearest Neighbour tool for each point dataset are given in Figures 18-22.
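The steps of the Average Nearest Neighbour method can be sketched in plain Python. This is a minimal illustration, not the ArcGIS tool itself: the function names are hypothetical, the distance search is brute force, and the expected distance and standard error use the constants commonly documented for this statistic (0.5 and 0.26136), which are assumed here to match the tool used in this study. Coordinates and area must be in consistent units (e.g. metres and square metres).

```python
import math


def average_nearest_neighbour(points, area):
    """Return (observed, expected, index, z) for a list of (x, y) points.

    Hypothetical helper; area is the fixed study area in squared coordinate units.
    """
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive area")

    # Steps 1-2: distance from each point to all others, keep the minimum.
    nearest = []
    for i, (xi, yi) in enumerate(points):
        d_min = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        nearest.append(d_min)

    # Step 3: average minimum (observed) distance.
    observed = sum(nearest) / n

    # Step 4: mean distance expected for a random pattern with the same
    # number of points and the same area (assumed formula).
    expected = 0.5 / math.sqrt(n / area)

    # Steps 5-6: compare observed with expected and compute the z-score.
    std_error = 0.26136 / math.sqrt(n * n / area)
    z = (observed - expected) / std_error
    return observed, expected, observed / expected, z


def classify_ann(z, critical=1.96):
    """Classify a pattern from its z-score at the 5% significance level."""
    if z < -critical:
        return "clustered"
    if z > critical:
        return "regular"
    return "random"


# Illustrative input only: a 4 x 4 lattice with unit spacing in a 16-unit area.
grid = [(x, y) for x in range(4) for y in range(4)]
observed, expected, index, z = average_nearest_neighbour(grid, area=16.0)
print(classify_ann(z))
```

Because the area enters both the expected distance and the standard error, changing the `area` argument while keeping the same points shifts the z-score, which is the sensitivity noted in the text.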

COMPARISON

Figures 23 and 24 are bar charts that display point data category vs. t-statistic (Quadrat Method) and z-score (Average Nearest Neighbour), respectively. For the quadrat method data, the t-statistic from the optimal quadrat size is used. Both methods classify the point patterns for banks, dentists, drugstores and physicians as clustered. However, the two techniques yield different results for hospitals: the quadrat method indicates a random distribution, whereas the Average Nearest Neighbour tool characterizes the point data as dispersed (regular).

Figure 16: Average Nearest Neighbour parameters

Figure 17: Observed vs. expected bar chart

Figure 18: Banks

Figure 19: Dentists

Figure 20: Drugstores

Figure 21: Hospitals

Figure 22: Physicians

Figure 23: Quadrat Method

Figure 24: Average Nearest Neighbour


POINT PATTERN ANALYSIS: QUADRAT METHOD

This section presents results for 5 census point datasets (banks, dentists, drugstores, hospitals, physicians). The data were mapped and re-projected to NAD83 UTM zone 20, and quadrat method analysis was carried out for five discrete quadrat sizes (5 km, optimal, 60 km, 80 km and 100 km). The results were tabulated and used to examine point pattern behaviour in graphical format. SPSS Statistics was used to generate scatterplots of the resultant t-statistic vs. quadrat size. A best-fit curve was then plotted through the points to visualize how point patterns change with increasing quadrat size. Results are presented for each point dataset below.
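The "optimal" quadrat size is not defined in this section. As a hedged illustration, the sketch below uses a widely cited rule of thumb in which the quadrat area is twice the study area divided by the number of points (2A/n). This is an assumption and may differ from the formula used in the Quadrat Method script; the function name and the input values are hypothetical.

```python
import math


def optimal_quadrat_side(study_area, n_points):
    """Side length of a square quadrat under the assumed 2A/n rule of thumb.

    study_area is in squared map units (e.g. km^2); the result is in map units.
    """
    if study_area <= 0 or n_points <= 0:
        raise ValueError("study_area and n_points must be positive")
    quadrat_area = 2.0 * study_area / n_points
    return math.sqrt(quadrat_area)


# Placeholder values, not taken from the study datasets.
side = optimal_quadrat_side(study_area=100.0, n_points=50)
print(side)
```

Under this rule, datasets with fewer points receive larger optimal quadrats, so the optimal size differs between the five datasets even though the study area is fixed.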

BANKS

A cubic model (Table 1) provided the best fit to the Banks point data quadrat method analysis results (Fig. 6). The best-fit curve on the scatter plot in Figure 7 indicates that at small quadrat sizes (< 5 km), the t-statistic approaches zero, indicating a random distribution. However, for all quadrat sizes > 5 km, the t-statistic is greater than 1.96 and the pattern is therefore clustered. The shape of the cubic curve indicates that between the optimal size and ~80 km the t-statistic increases at a greater rate (i.e. steeper curve gradient) than for quadrat sizes > 80 km, where the gradient of the curve flattens out (but remains positive).

Figure 6: Banks point data

Table 1: Curve estimation results

Figure 7: Banks scatterplot

y = 10.79 + 13.1*x + 0.51*x^2 - 3.36E-3*x^3

DENTISTS

A cubic model (Table 2) provided the best fit to the Dentists point data quadrat method analysis results (Fig. 8). The best-fit curve on the scatter plot in Figure 9 indicates that at small quadrat sizes (< 5 km), the t-statistic approaches zero (random distribution). However, for all quadrat sizes > 5 km, the t-statistic is greater than 1.96 and, similar to the banks dataset, the point pattern is clustered.

Figure 8: Dentists point data

Table 2: Curve estimation results

Figure 9: Dentists scatterplot

y = 1.69E2 + 36.99*x + 1.79*x^2 - 0.01*x^3

DRUGSTORES

A power model (Table 3) provided the best fit to the Drugstores point data quadrat method analysis results (Fig. 10). The best-fit curve on the scatter plot in Figure 11 indicates that at small quadrat sizes (< 5 km), the t-statistic approaches zero (random distribution). However, for all quadrat sizes > 5 km, the t-statistic is greater than 1.96 and the point pattern is clustered. At large quadrat sizes, the t-statistic appears to peak at ~80 km and then decrease as quadrat size increases to 100 km.

Figure 10: Drugstores point data

Table 3: Curve estimation results

Figure 11: Drugstores scatterplot

y = 1.45 * x^1.50
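A fitted curve can be used to estimate where the t-statistic crosses the 1.96 threshold. The sketch below evaluates the power model reported for the drugstores data (coefficient 1.45, exponent 1.50), assuming that x is quadrat size in km and y is the t-statistic; the axis units are an assumption, and the function names are hypothetical.

```python
def power_model(x, a=1.45, b=1.50):
    """Evaluate y = a * x**b with the drugstores coefficients as defaults."""
    return a * x ** b


def crossing_point(threshold=1.96, a=1.45, b=1.50):
    """Solve a * x**b = threshold for x (valid for a > 0 and b > 0)."""
    return (threshold / a) ** (1.0 / b)


# Fixed quadrat sizes tested in the study (the optimal size is omitted
# because its value is not given in this section).
for size_km in [5, 60, 80, 100]:
    print(size_km, round(power_model(size_km), 1))

print("model crosses 1.96 at x =", round(crossing_point(), 2))
```

A power curve with a positive coefficient and exponent increases monotonically, so this model cannot reproduce the apparent peak near 80 km described in the text; it only summarizes the overall upward trend.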

HOSPITALS

A cubic model (Table 4) provided the best fit to the Hospitals point data quadrat method analysis results (Fig. 12). The best-fit curve is the most anomalous of the five datasets, with a quasi-sinusoidal shape to the cubic model (Fig. 13). The curve represents a best-fit estimate that attempts to account for the significant decrease in the t-statistic at the 100 km quadrat size. Similar to the drugstores point dataset, the t-statistic appears to peak at 80 km before declining to lower values. The sinusoidal shape of the best-fit curve is a consequence of these two data points. Additional data at large quadrat sizes (> 80 km and < 100 km) would help to resolve the true shape of the best-fit curve. At quadrat sizes < 60 km, the hospital point pattern has a random distribution. At quadrat sizes > 60 km, the distribution is clustered.

Figure 12: Hospitals point data

Table 4: Curve estimation results

Figure 13: Hospitals scatterplot

y = 3 - 0.64*x + 0.02*x^2 - 9.87E-5*x^3

Quadrat analysis involves sampling of input points based on a GIS-generated overlay (fishnet) that subdivides a study area into polygons of equal size (quadrats; Mitchell, 2009). The number of points per quadrat and the frequency of counts are both calculated, and from these values the variance of the input points and ultimately the t-statistic can be determined. When a distribution is random (Poisson distribution), the mean and the variance of the points per quadrat are equal (Dramowicz, 2005). When the point pattern is clustered, the variance is greater than the mean (Dramowicz, 2005), and when a distribution is regular (uniform), the variance is smaller than the mean. The t-statistic provides a way to classify the spatial arrangement of test point patterns. At the 5% significance level, if the t-statistic is > 1.96, the point pattern is classified as clustered. If the t-statistic is < -1.96, the pattern is regular, and for all values lying between these end-member values the point pattern is deemed random.
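The variance-to-mean comparison can be sketched in plain Python once the number of points in each quadrat is known. This is an illustrative sketch rather than the poster's ArcPy script: the function names are hypothetical, and the t-statistic is computed with one commonly used form, t = (VMR - 1) / sqrt(2 / (m - 1)), where VMR is the variance-to-mean ratio and m is the number of quadrats. That formula is an assumption about what the script computes.

```python
import math


def quadrat_statistics(counts):
    """Return (mean, variance, vmr, t) for a list of per-quadrat point counts.

    Quadrats with no points must be included in the list as zeros.
    """
    m = len(counts)
    if m < 2:
        raise ValueError("need at least two quadrats")
    mean = sum(counts) / m  # lambda: number of points / number of quadrats
    if mean == 0:
        raise ValueError("no points fall in any quadrat")
    variance = sum((c - mean) ** 2 for c in counts) / (m - 1)
    vmr = variance / mean  # equals 1 for an ideal Poisson (random) pattern
    t = (vmr - 1.0) / math.sqrt(2.0 / (m - 1))  # assumed test statistic
    return mean, variance, vmr, t


def classify_quadrat(t, critical=1.96):
    """Classify a pattern from the t-statistic at the 5% significance level."""
    if t > critical:
        return "clustered"
    if t < -critical:
        return "regular"
    return "random"


# Illustrative counts only: every quadrat holds exactly one point.
even_counts = [1] * 20
print(classify_quadrat(quadrat_statistics(even_counts)[3]))

# Illustrative counts only: all 20 points fall in a single quadrat.
lumped_counts = [20] + [0] * 19
print(classify_quadrat(quadrat_statistics(lumped_counts)[3]))
```

Note that the sign convention is the reverse of the Average Nearest Neighbour z-score: here large positive values indicate clustering.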


INTRODUCTION

RANDOM REGULAR CLUSTERED

source: http://gispopsci.org/wp-content/uploads/2013/02/RandUnifClust.png

Figure 1: Point pattern distribution types

Point pattern analysis refers to the evaluation of the spatial arrangement of point datasets, typically in two dimensions. The purpose of the analysis method is to determine (i) whether the dataset tends to exhibit a systematic pattern over an area (as opposed to a random spatial arrangement) and (ii) over what scale the pattern manifests. Point spatial distributions can be divided into three groups *1 (Fig. 1):

RANDOM – any point is equally likely to occur at any location; the position of any point is not controlled by the position of other points

UNIFORM – every point is a maximum distance from neighbouring points

CLUSTERED – several points are concentrated spatially; large areas are devoid of data

In this study, QUADRAT ANALYSIS and AVERAGE NEAREST NEIGHBOUR point pattern analysis methods are used to characterize how the density of a point pattern varies for five point datasets in the province of Nova Scotia. At its simplest, point pattern analysis allows the similarity of a dataset to be compared with a theoretical randomly distributed dataset of the same number of points and the same areal extent (Dramowicz, 2005). The purpose of this study is to:

1. Use the Quadrat Method script to conduct point pattern analysis on a series of 5 datasets. A core component of the analysis was the creation of a custom point pattern analysis tool that references a Python script and uses the ArcPy package built into the ArcGIS environment. The Python script (Quadrat Method) is housed in a custom toolbox (Point Pattern Analysis).

2. Apply the operational tool to analyse the spatial distribution of points for the 5 project datasets provided (banks, dentists, drugstores, hospitals and physicians). Several quadrat sizes are compared graphically and a best-fit curve is defined illustrating the variation in the t-statistic with increasing quadrat size.

3. Use the Average Nearest Neighbour tool to analyse the spatial distribution of the same datasets used for quadrat analysis and carry out a brief comparison of the two point pattern analysis methods.

The quadrat method Python script was written in PythonWin32 software, which was dynamically linked to the ArcGIS 10.2.2 interface. Subsequent graphical analysis of the quadrat method results was completed in SPSS Statistics version 22. All census data were transformed from the WGS84 datum to the NAD83 datum and re-projected to UTM zone 20.


NOTE - The Nova Scotia polygon shapefile used in the Point Pattern Analysis includes islands off mainland Nova Scotia (e.g. Sable Island).