Method comparison on graph-based models for
species occurrence prediction
- Methods and first results -
Jörn Vorwald
BTU Cottbus
Overview
1. Motivation
2. Basics
a) Graph Theory
b) Model Classification
3. Methods
a) Field Ecology
b) GIS
c) Statistics
4. Results
5. Outlook
Overview – Motivation - Basics – Methods – Results - Outlook
[Figure: point map (symbols only)]
[Figure: plot over the model pairs dl_ga, dl_kr, dl_nn, dl_vo, dl_fo, ga_kr, ga_nn, ga_vo, ga_fo, kr_nn, kr_vo, kr_fo, nn_vo, nn_fo, vo_fo; y-axis from 0.0 to 1.0]
$$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{T_i^2}{n_i} - 3(n+1)$$
Motivation
• Atlas project for grasshoppers and bush crickets in Brandenburg, started 1996
• First 63 sampling sites in 1997, 61 in SPN and Cottbus, 2 outside; damselflies and dragonflies added for investigation
• Next 60 sites in 1998, all in SPN and Cottbus
• Completion in 1999
• Last 35 sites in 2000 in SPN with target of local aggregation
• In 1999 first idea beyond atlases: information theory based approach answering the question 'How much information is enough when you cannot get complete information?'
• In 2003 second idea: compare graph-based models for prediction of species occurrences
From Atlas Project To Modelling
[Figure: maps of the sampling sites (point symbols only)]
• Brandenburg
• 299 TK-25 (topographic map sheets 1:25,000)
• SPN/CB
• 27 TK-25
• 22 selected
• contain 88 TK-10
• 65 selected, 158 sites
• 50 for buffering, 106 sites
Basics – Graph Theory
• What is a graph?
• What are special graphs?
• What is adjacency in graphs?
• What are weighted edges?
• What kinds of graphs are common in ecological modelling?
• What kinds of graphs are used in my approach?
Graph Theory - Graphs
• A graph is a system of points and lines connecting these points (Bodendiek & Lang 1995).
• More precisely, a graph consists of a set of points and a set of lines connecting points. The set of lines may be empty. Usually the points are called vertices and the lines are called edges.
Graph Theory – Special Graphs
• Complete graphs
• Wheels
• Stars
• Cycles
• Trees
• Platonic graphs
• Petersen graph
Graph Theory - Adjacency
• When an edge connects two vertices, the vertices are called 'incident' to the edge, or, equivalently, the edge is incident to each vertex. Two vertices connected by an edge are called 'adjacent'.
[Figure: example graphs with vertices v1 to v4, x1 to x3 and z1 to z5, and edges e1 to e8]
Graph Theory – Edge-weighting
• Each edge can be weighted by adding a special attribute (the weight).
• Some important problems of graph theory and computer science are related to weighted graphs (e.g. optimisation problems, travelling salesman problem).
[Figure: graph with vertices z1 to z5 and weighted edges e1 to e8]
e1 = 6, e2 = 4, e3 = 8, e4 = 3, e5 = 5, e6 = 5, e7 = 6, e8 = 7
Σe = 44
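The edge weights listed on the slide can be checked with a few lines of Python. This is only a minimal sketch: the dictionary `edge_weights` and the helper `total_weight` are illustrative names, and the end points of the edges are not recoverable from the slide, so only the weights are stored.

```python
# Edge weights as listed on the slide (edge name -> weight).
# Which vertices each edge connects is not recoverable, so it is not modelled.
edge_weights = {
    "e1": 6, "e2": 4, "e3": 8, "e4": 3,
    "e5": 5, "e6": 5, "e7": 6, "e8": 7,
}


def total_weight(weights):
    """Return the sum of all edge weights of a weighted graph."""
    return sum(weights.values())


print(total_weight(edge_weights))  # 44, the value of Σe on the slide
```

Total weights of this kind are the quantity that optimisation problems on weighted graphs, such as the minimum spanning tree, try to minimise.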
Graphs In Ecological Modelling
• Graph based models are rare in ecology.
• Two kinds of graphs were found in the literature review:
• Voronoi (Dirichlet, Thiessen) tessellation (e.g. Byers 1992, Mercier & Baujard 1997, Okabe et al. 2000)
• Gabriel graph (Gabriel & Sokal 1969)
• In graphs, usually 'only' adjacency can be used for modelling.
• Polygon methods can introduce more realistic assumptions about abiotic and biotic factors influencing sampling sites or target organisms.
Graphs In This Approach
• Delaunay triangulation (dl)
• Gabriel graph (ga)
• Minimum spanning tree by Kruskal algorithm (kr)
• Nearest neighbours (nn)
• Voronoi tessellation (vo)
Delaunay Triangulation
• In a Delaunay triangulation, triangles of three vertices and three edges are established that partition the complete surface of interest, i.e. the area between the vertices.
• Algorithm ('divide and conquer'):
• A triangle of edges is drawn between three points.
• The Delaunay constraint is checked:
• No fourth point lies within the circumcircle of the triangle.
• (additionally: the sum of two angles is greater than 30°.)
• A second triangle is drawn.
• The Delaunay constraint is checked again …
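The Delaunay constraint described above (no fourth point inside the circumcircle of a triangle) can be tested with a standard determinant. The following is a minimal sketch, not the Java program used in the project; the function names `orientation` and `in_circumcircle` are illustrative, and the additional angle criterion from the slide is not included.

```python
def orientation(a, b, c):
    """Twice the signed area of triangle abc (> 0 if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, p):
    """Return True if point p lies strictly inside the circumcircle of abc."""
    # Shift all points so that p is the origin.
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    # 3x3 in-circle determinant, expanded along the column of squared lengths.
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The sign convention assumes a counter-clockwise triangle.
    if orientation(a, b, c) < 0:
        det = -det
    return det > 0


triangle = ((0, 0), (4, 0), (0, 4))
print(in_circumcircle(*triangle, (2, 2)))  # True: violates the constraint
print(in_circumcircle(*triangle, (5, 5)))  # False: triangle may be kept
```

A triangle satisfies the constraint when `in_circumcircle` is False for every other point of the set.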
Gabriel Graph
• A Gabriel graph is constructed similarly to a Delaunay triangulation.
• In practice, edges may be rejected from the graph due to external conditions.
• Algorithm:
• Draw an edge between the two points with minimal distance (nearest neighbours).
• Check the constraint: no third point may lie within the circle that has the edge as its diameter.
• Draw an edge between one of the first points and a third point.
• Check the constraint again. If the new edge violates the constraint, it is rejected as a member of the graph.
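The Gabriel constraint can be sketched as a brute-force test over all point pairs. This is an illustrative O(n³) version rather than the incremental procedure outlined above, and `gabriel_edges` is a hypothetical helper name. A point r lies inside the circle with diameter pq exactly when the angle at r is obtuse, i.e. when the dot product (r - p)·(r - q) is negative.

```python
from itertools import combinations


def gabriel_edges(points):
    """Return all index pairs (i, j) that satisfy the Gabriel constraint."""
    edges = []
    for i, j in combinations(range(len(points)), 2):
        p, q = points[i], points[j]
        blocked = False
        for k, r in enumerate(points):
            if k in (i, j):
                continue
            # r is inside the circle with diameter pq if (r - p).(r - q) < 0.
            dot = (r[0] - p[0]) * (r[0] - q[0]) + (r[1] - p[1]) * (r[1] - q[1])
            if dot < 0:
                blocked = True
                break
        if not blocked:
            edges.append((i, j))
    return edges


# The point (2, 1) lies inside the circle over the edge (0,0)-(4,0),
# so that edge is rejected while the two shorter edges are kept.
print(gabriel_edges([(0, 0), (4, 0), (2, 1)]))  # [(0, 2), (1, 2)]
```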
Minimum Spanning Tree
• A minimum spanning tree is a tree that contains all vertices and for which the sum of the lengths of all edges is minimal, i.e. no other spanning tree has a smaller sum.
• Algorithm (Kruskal):
• Choose an edge with minimal distance (nearest neighbours). If more than one exists, choose one at random.
• Choose a second edge with minimal or next larger distance.
• Choose a third edge under the same condition.
• Check the constraint: the edges must not form a cycle. If they do, reject the last chosen edge.
• Choose a new edge. Check the constraint again.
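The Kruskal steps above can be sketched as follows. The cycle check is done with a small union-find structure, which is a common implementation choice and an assumption here, not necessarily what the project's Java program does; `kruskal_mst` is an illustrative name, and ties are broken by point index instead of at random.

```python
import math
from itertools import combinations


def kruskal_mst(points):
    """Return the minimum spanning tree as a list of (i, j, distance)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root of the component (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All edges of the complete graph, sorted by ascending length.
    candidates = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    tree = []
    for dist, i, j in candidates:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:          # the edge does not close a cycle
            parent[root_i] = root_j   # merge the two components
            tree.append((i, j, dist))
    return tree


tree = kruskal_mst([(0, 0), (1, 0), (5, 0), (5, 1)])
print([(i, j) for i, j, _ in tree])  # [(0, 1), (2, 3), (1, 2)]
print(sum(d for _, _, d in tree))    # 6.0
```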
Nearest Neighbours
• A nearest neighbour graph is, in general, a set of disconnected subgraphs in which each vertex is connected to the vertex at minimum distance. (Nevertheless, a vertex may be connected to two vertices.)
• Algorithm:
• Calculate the distances within a complete graph. Sort the distances in ascending order.
• Start with the minimum distance and draw an edge.
• Check the constraint: all vertices must be included.
• Continue with the next larger distance and draw a new edge.
• Check the constraint again.
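A nearest neighbour graph can also be built directly from its definition: every vertex gets an edge to its closest other vertex. The sketch below uses this direct definition instead of the sorted-distance procedure above; `nearest_neighbour_edges` is an illustrative name and ties are broken by the lowest index.

```python
import math


def nearest_neighbour_edges(points):
    """Connect every point to its nearest other point (undirected edges)."""
    edges = set()
    for i, p in enumerate(points):
        others = (k for k in range(len(points)) if k != i)
        j = min(others, key=lambda k: math.dist(p, points[k]))
        edges.add((min(i, j), max(i, j)))  # store each edge only once
    return sorted(edges)


# Two well separated pairs give two disconnected subgraphs.
print(nearest_neighbour_edges([(0, 0), (1, 0), (5, 0), (5, 1)]))
# [(0, 1), (2, 3)]

# Vertex 1 is the nearest neighbour of both 0 and 2, so it gets two edges.
print(nearest_neighbour_edges([(0, 0), (1, 0), (3, 0)]))
# [(0, 1), (1, 2)]
```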
Voronoi Diagram
• A Voronoi diagram is the dual graph of a Delaunay triangulation, i.e. each edge within a Voronoi diagram is orthogonal to an edge within the Delaunay triangulation.
• Each point within a Voronoi cell is nearer to the centre of that cell than to any other cell centre.
• Algorithm:
• Select two points (e.g. the topmost, leftmost point and its nearest neighbour) and temporarily draw a line between them.
• Draw an edge through the midpoint of the line, orthogonal to it, then remove the line.
• Select a third point and temporarily draw lines between it and all its neighbours. Create edges orthogonal to each of the lines. Cut the edges at their intersection points. Remove the lines.
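The defining property of a Voronoi cell (every location belongs to the cell of its nearest site) can be illustrated without constructing the cell edges. The sketch below only answers the membership question; `voronoi_cell_of` is a hypothetical helper, the site coordinates are the rounded values of sites 1, 3 and 4 from the node table of this talk, and the query location is made up for illustration.

```python
import math

# Rounded coordinates of three sampling sites (site ID -> (x, y)).
sites = {
    1: (4651304.13, 5768510.75),
    3: (4660128.00, 5757345.31),
    4: (4665327.08, 5764012.83),
}


def voronoi_cell_of(location, sites):
    """Return the ID of the site whose Voronoi cell contains the location."""
    return min(sites, key=lambda site_id: math.dist(location, sites[site_id]))


# A made-up location a few hundred metres from site 1.
print(voronoi_cell_of((4652000.0, 5768000.0), sites))  # 1
```

Constructing the cell polygons themselves, as needed for the area intersections, requires the perpendicular bisectors described in the algorithm.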
Model Classification
• Multidimensional vector of classes
• Classifications are rare in the literature:
• Levins (1966), Sharpe (1990), Refsgaard (1996), eWater Ltd. (2006)
• Classifications are rarely reflected upon
• Not every model can be classified explicitly
Model Classifications
• By type
• mechanistic
• statistical
• By time complexity
• static
• dynamic
• By species complexity
• single species
• multiple species
• By data distribution
• localised
• gridded
• By purpose
• screening
• research
• planning, monitoring, assessment
Model Classifications
• By extent
• local (x=1)
• regional (x=2)
• continental (x=3)
• By number
• presence only (y=1)
• presence and absence (y=2)
• activity, abundance (y=3)
• By background
• empirical (z=1)
• causal (z=2)
Model Classification
[Figure: classification space with the axes extent, number and background; plotted models: Byers (1992), Boyce (2003), B & E (1993), Ferrier (2002), Vorwald (2006)]
• Byers (1992)
• statistical
• static
• 3 bark beetle species, each used as a single species
• extent: bark of a single tree
• presence only
• localised data
• for research
• causal
• Boyce et al. (2003)
• statistical: logistic regression
• dynamic: summer/winter
• single species: elk
• extent: Yellowstone National Park
• relative abundance
• localised data
• for monitoring
• empirical
• Buckland & Elston (1993)
• statistical: GLM
• static
• single species: green woodpecker, red deer
• extent: north-east Scotland
• relative abundance
• gridded data
• for screening
• causal
• Ferrier et al. (2002)
• statistical
• static
• community level
• extent: north-east New South Wales
• presence/absence
• gridded data
• for monitoring
• causal
• Vorwald (2006)
• statistical
• static
• community level
• extent: CB/SPN
• relative activity
• localised data
• for screening
• empirical
Methods – Field Ecology
• Selection of sampling sites
• First site set: one site in each topographic map 1:10,000 within SPN or CB
• Second set: same procedure
• Third set: unobserved topographic map (1:10,000) squares within 4 selected topographic maps 1:25,000 with one site each
• Criteria:
• Preferably grassland with gradient in wetness
• Preferably open water (creek, river, pond or lake)
• Preferably old trees on or near site
Methods – Field Ecology
• Observation
• Visual observation (grasshoppers, bush crickets, damselflies and dragonflies)
• Net capturing (all groups) – specimen collection
• Acoustic observation (grasshoppers and bush crickets)
• By ear
• With bat detector support
• Documentation
• Field forms
• Database
Methods – GIS
• Preparation:
• Sets of sampling and buffer sites exported to plain text files from the database
• Calculation of graphs within an adapted Java program
• Export of results to plain text files
• Import of text file information into GIS for visualisation and preparation of intersection
• Intersection of Voronoi diagrams in GIS, export of relevant information of intersected polygons to plain text files
• Calculation of species vectors in database
KNOWN BUF_DS SHAPE PREDICT
97 05 97_05 98
97 06 97_06 98
97 05_06 97_05_06 98
98 05 98_05 97
98 06 98_06 97
98 05_06 98_05_06 97
97_1 97_2_05_1 97_1_97_2_05_1 00
97_1 97_2_06_1 97_1_97_2_06_1 00
GIS - Preparation
ID SHORT START SUBSET X_COORD Y_COORD
1 Jessern 1997 4651304,13163 5768510,75401
2 Groß Drewitz 1997 4679614,60514 5767082,00114
3 TÜP Lieberose 1997 4660128,00351 5757345,31492
4 Staakow 1997 4665327,07645 5764012,82831
12 Weidenweg 1997 2 4646409,33013 5749301,96544
13 Paulicks Mühle 1997 2 4646105,05869 5743732,47509
14 Byhleguhre 1997 2 4650020,89988 5750267,69655
29 Dahlitz 1997 1 4653737,66796 5739262,97754
30 Zahsow 1997 1 4656727,79354 5739109,94034
31 Koselmühle 1997 1 4651528,22304 5733255,17356
NodeId NodeX NodeY
1 4651304.13 5768510.75
2 4679614.61 5767082
3 4660128 5757345.31
4 4665327.08 5764012.83
12 4646409.33 5749301.97
13 4646105.06 5743732.48
14 4650020.9 5750267.7
29 4653737.67 5739262.98
30 4656727.79 5739109.94
31 4651528.22 5733255.17
32 4658004.59 5733498.77
Lines (line ID, X and Y of each end point):
0 4632582.89000 5745551.93000
0 4636575.00000 5741010.00000
1 4638150.00000 5755400.00000
1 4632582.89000 5745551.93000
2 4638155.00000 5733600.00000
2 4636575.00000 5741010.00000
3 4638735.00000 5734745.00000
3 4638155.00000 5733600.00000
4 4638735.00000 5734745.00000
4 4636575.00000 5741010.00000
5 4639150.48000 5730300.92000
5 4638155.00000 5733600.00000
Neighbouring points (site, neighbour, distance):
180 186 6047.0
186 180 6047.0
173 180 11313.0
180 173 11313.0
187 186 7577.0
186 187 7577.0
188 187 1284.0
187 188 1284.0
188 186 6627.0
186 188 6627.0
192 187 3446.0
187 192 3446.0
192 188 4463.0
188 192 4463.0
193 192 6464.0
192 193 6464.0
182 186 7531.0
186 182 7531.0
GIS - Preparation
• 76 point sets for input (one file each):
• 62 for all graph types (26 with 'known' and 'buffer' points, 36 with 'known', 'buffer' and points to 'predict')
• 14 for graph types except Voronoi graphs (6 with 'known' points (without 'buffer'), 8 with 'known' points and points to 'predict')
• 670 files as output:
• 62 * 5 (graph types) for all lines
• 62 * 4 for all neighbouring points for all types except Voronoi
• 14 * 4 for lines of graph types except Voronoi
• 14 * 4 for neighbouring points of graph types except Voronoi
GIS - Intersection
• Each Voronoi diagram with 'known' sites and 'buffer' sites has to be intersected with the corresponding Voronoi diagram to which the sites to 'predict' have been added.
• Buffering to avoid 'edge effects' (Kenkel et al. 1989)
• 36 intersections in total
• Split into small polygons with two parents:
• One from a known or buffering site
• One from a site to be predicted
• Calculation of the area of each polygon and of its ratio to the area of its parent
GIS - Intersection
1 4649437.15685 5770571.55337
1 4648965.43159 5765840.89544
1 4649350.00000 5765310.16310
1 4653966.56933 5767238.15705
1 4654150.52601 5767459.05431
1 4653141.97435 5771837.51857
1 4653062.68257 5771885.73293
2 4675843.70664 5767525.42210
2 4677881.38377 5764961.49765
2 4681057.63544 5765466.76608
2 4678293.73528 5771173.28156
2 4677786.85026 5771184.85724
2 4677461.30816 5770960.24708
3 4655085.49844 5757481.47363
3 4660252.59948 5756432.55224
3 4662308.69461 5760767.06035
3 4661109.53508 5761496.04101
3 4658889.96932 5761673.50719
3 4655171.68288 5759479.60279
113 211 211 100
114 175 175 100
115 176 176 100
116 177 177 100
117 64 2 74.4
118 64 10 3.57
119 64 167 3.63
120 64 169 1.48
121 64 170 16.92
122 65 3 75.11
123 65 15 19.25
124 65 18 3.65
125 65 177 1.98
126 66 3 1.96
127 66 4 77.17
128 66 164 19.56
129 66 165 1.32
Calculation Of Species Vectors
• In this approach, a vector is an ordered list of attributes.
• The relevant attributes are the 'counts' of the species sampled at the sites.
• A species count is the highest detection class in which the species has been observed.
Class name Number of observations Class centre (2^(n-1))
1 1 1
2 2 … 4 2
4 5 … 9 8
5 10 … 19 16
6 20 … 49 32
7 ≥ 50 64
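The class table can be expressed as two small functions. This is a sketch under assumptions: the function names are illustrative, class 0 is taken to mean 'species not observed' (the observation vectors in this talk contain zeros), and class 3 is skipped because the table does not define it.

```python
def count_to_class(count):
    """Map a number of observations to the detection class of the table."""
    if count <= 0:
        return 0   # assumption: 0 stands for 'not observed'
    if count == 1:
        return 1
    if count <= 4:
        return 2
    if count <= 9:
        return 4   # the table has no class 3
    if count <= 19:
        return 5
    if count <= 49:
        return 6
    return 7


def class_centre(cls):
    """Return the class centre 2^(n-1) of class n (0 for class 0)."""
    return 0 if cls == 0 else 2 ** (cls - 1)


for n in (1, 3, 7, 12, 30, 80):
    cls = count_to_class(n)
    print(n, cls, class_centre(cls))
```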
Calculation Of Species Vectors
• Tables in database (filled by Visual Basic programs):
• Neighbouring sample sites from Java output (incl. distances)
• Voronoi cell intersection from GIS output (incl. areas)
• Prediction table with sample sites and prediction subsets (incl. ‚found‘ as observed values) as rows, species as columns and species counts as table values
• Filling prediction table
Calculation Of Species Vectors
• Filling prediction table
• For each site to be predicted, iteration over the neighbours defined by the graph type
• Sum of all distances, used to calculate the proportion (weight) of each neighbour
• Calculation of the prediction proportion using the 'real' number (i.e. the class converted to its class centre)
• Sum of all proportions, reconverted to a class
• Similar procedure for Voronoi cells, using areas instead of distances
Calculation Example
• Site 77 within Gabriel graph with known sites '97' and buffer set '05'
• Neighbours: 14, 15, 16, 17
77 14 4132.0
77 15 4413.0
77 16 3500.0
77 17 3642.0
• Σdist = 15,687
77 14 0.24
77 15 0.22
77 16 0.27
77 17 0.27
• Vector calculation
Vector Calculation Example
• Vectors for observation
• o14 <- c(0,0,6,5,6,2,7,0,4,7,6,4, ... ,2,0,0,0,2,2,0,2)
• o15 <- c(0,0,4,6,6,5,4,0,0,7,6,2, ... ,0,0,0,6,4,4,0,0)
• o16 <- c(0,0,5,7,5,5,6,0,0,5,6,2, ... ,0,0,0,0,0,2,0,4)
• o17 <- c(0,0,4,5,5,4,5,0,0,7,6,1, ... ,2,0,0,6,4,1,0,1)
• Calculation of prediction using interim transformation to ‚abundances‘ and retransformation to observation classes
• p77 <- c(o14 * 0.24 + o15 * 0.22 + o16 * 0.27 + o17 * 0.27)
• p77 <- c(0,0,5,6,6,5,6,0,2,7,6,2, ... ,2,0,0,5,2,2,0,2)
• Number of calculations for sample sites to be predicted within the prediction subsets of sites, for each graph: 7,238
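The prediction for site 77 can be retraced in Python for the first twelve species, i.e. the values shown before the ellipsis. This is a sketch under stated assumptions: the weights are taken as given on the slide, the interim 'abundance' is the class centre 2^(n-1), and the retransformation rounds the weighted sum to the nearest integer and looks it up in the class table. The rounding rule and all function names are assumptions; the slide does not spell them out.

```python
def class_centre(cls):
    """Class centre 2^(n-1) of class n (0 for 'not observed')."""
    return 0 if cls == 0 else 2 ** (cls - 1)


def count_to_class(count):
    """Detection class for a number of observations (no class 3)."""
    for upper, cls in ((0, 0), (1, 1), (4, 2), (9, 4), (19, 5), (49, 6)):
        if count <= upper:
            return cls
    return 7


def predict(observations, weights):
    """Weighted prediction of detection classes from neighbouring sites."""
    n_species = len(next(iter(observations.values())))
    prediction = []
    for s in range(n_species):
        abundance = sum(weights[site] * class_centre(obs[s])
                        for site, obs in observations.items())
        prediction.append(count_to_class(round(abundance)))  # assumed rounding
    return prediction


# First twelve values of o14 to o17 and the weights from the slide.
observations = {
    14: [0, 0, 6, 5, 6, 2, 7, 0, 4, 7, 6, 4],
    15: [0, 0, 4, 6, 6, 5, 4, 0, 0, 7, 6, 2],
    16: [0, 0, 5, 7, 5, 5, 6, 0, 0, 5, 6, 2],
    17: [0, 0, 4, 5, 5, 4, 5, 0, 0, 7, 6, 1],
}
weights = {14: 0.24, 15: 0.22, 16: 0.27, 17: 0.27}

print(predict(observations, weights))
# [0, 0, 5, 6, 6, 5, 6, 0, 2, 7, 6, 2]
```

Under these assumptions the result agrees with the first twelve values of p77 shown on the slide.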
Methods - Statistics
• Preparation:
• Export of values to be calculated in statistics environment to plain text files from database (prediction table)
• Export of statistics scripts from database
• Calculation of statistics in statistics environment R
• Export of results to plain text files
• Import of statistics results into database
• Visualisation of results in R, or in spreadsheet calculation program
Statistics Preparation
• Export of values to be calculated in statistics environment to plain text files from database (prediction table)
• p77 <- c(0,0,5,6,6,5,6,0,2,7,6,2, ... ,2,0,0,5,2,2,0,2)
• Export of statistics script from database: 1,506 tests
dl ga kr nn vo fo
0 0 0 0 0 0
0 0 0 0 0 0
5 5 5 5 5 4
6 6 6 7 6 6
6 6 6 5 6 6
5 5 4 5 4 5
6 6 6 6 6 0
0 0 0 0 0 4
2 2 2 0 2 0
7 7 6 5 7 6
6 6 6 6 6 6
2 2 2 2 2 2
…
sink( file = "U:/diss/r/kruskal_wallis/output/kruskal_result.txt", append = FALSE )
…
#site: 77
kktst <- read.table("U:/diss/r/kruskal_wallis/input/77-97_05_98.dat", header = TRUE)
site_77 <- c(kktst$dl,kktst$ga,kktst$kr,kktst$nn,kktst$vo,kktst$fo)
ps_97_05_98 <- factor(rep(1:6, c(86, 86, 86, 86, 86, 86)))
kruskal.test(site_77, ps_97_05_98)
…
sink( file = NULL )
Statistics Calculation In R
• Kruskal-Wallis rank sum test for each site within each prediction set: models vs. observation - 1,506 operations
• Correlation using the R method "kendall", i.e. a rank-based measure of association, for each site within each prediction set: each model vs. each other (incl. observation) - 22,590 operations
• Group building by model comparison, e.g. all Delaunay triangulations vs. all Gabriel graphs, or all Voronoi tessellations vs. all observations: Kruskal-Wallis rank sum test for the comparison of correlation coefficients - 106 operations
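The tests themselves were run in R. To illustrate what the Kruskal-Wallis rank sum test computes, the following Python sketch calculates the statistic H with mid-ranks and the usual correction for ties. It covers only the statistic, not the p-value, and `kruskal_wallis_h` is an illustrative name, not part of the project's scripts.

```python
def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis statistic H."""
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)
    rank_of = {}    # value -> mid-rank among all pooled values
    tie_term = 0    # sum of t^3 - t over all groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j
    rank_sums = [sum(rank_of[v] for v in group) for group in groups]
    h = (12 / (n * (n + 1))
         * sum(t ** 2 / len(g) for t, g in zip(rank_sums, groups))
         - 3 * (n + 1))
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:   # all values identical: H is undefined
        return float("nan")
    return h / correction


# Three groups without overlap: rank sums 6, 15 and 24 give H of about 7.2.
print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Under the null hypothesis, H is compared with a chi-squared distribution with k - 1 degrees of freedom, which corresponds to the df = 5 reported for the six groups dl, ga, kr, nn, vo and fo.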
Data And Result Handling
• Calculation of statistics in R
• Export of results to plain text files
• Import of statistics results into database by text wrapping routine in Visual Basic
• Visualisation of results in R, or in spreadsheet calculation program
…
Kruskal-Wallis rank sum test
data: site_77 and ps_97_05_98
Kruskal-Wallis chi-squared = 19.9643, df = 5, p-value = 0.001269
…
Results
• Kruskal-Wallis rank sum test for each site within each prediction set: models vs. observation
• Correlation using Kendall‘s τ for each site within each prediction set: each model vs. each other (incl. observation)
• Kruskal-Wallis rank sum test for the correlation coefficients of model comparisons
• Advantages and limits of methods
Models vs. Observations
• Rows: site
• Columns: prediction set
• Cells: p-value of Kruskal-Wallis rank sum test (models vs. observation)
ID 97_05_98 97_05_06_98 97_06_98
64 0.3244 0.3614 0.3333
65 0.000544 0.000628 0.00049
66 0.9658 0.997 0.9992
67 1.04E-13 5.76E-13 5.76E-13
68 0.01031 0.01031 0.01031
69 0.000687 0.001286 0.01912
70 0.000157 0.000157 0.000202
71 7.17E-06 7.17E-06 7.17E-06
72 0.04321 0.04321 0.04321
73 0.07101 0.07337 0.07337
74 0.001006 0.001813 0.001813
75 0.6383 0.6383 0.6383
76 4.2E-05 4.2E-05 4.2E-05
77 0.001269 0.001616 0.001616
78 0.004537 0.004537 0.004537
79 0.001985 0.001985 0.001985
80 0.000532 0.000532 0.000532
• Significance level depends only slightly on the prediction set
• Strong differences between groups of sites
• Low correlation without statistical significance
Models vs. Observations
• A pattern can be recognised
• Not independent of the prediction set
• Differences between groups of sites
ID 00_97_1 00_97_1_98_1 00_97_2_05_1_97_1 00_97_2_05_1_97_1_98_1
29 0.002261 0.0005179 0.0009002 0.0009412
30 0.1128 0.006092 0.1652 0.01229
31 0.01191 0.0007397 0.01549 0.0007618
32 0.3468 0.002578 0.4438 0.005085
33 0.1623 0.001004 0.2149 0.001887
34 1.943E-08 8.225E-14 0.0006627 0.001043
35 0.001144 0.1223 0.001701 0.1072
36 0.0000441 0.04833 0.000004346 0.05732
37
42 0.00001053 0.01478 0.000007105 0.01173
43 0.000003024 1.205E-10 0.000009334 5.344E-10
44 0.998 0.004016 0.9995 0.0655
45 0.001063 0.001963 0.0007176 0.003086
46 0.009834 0.01047 0.01081 0.0128
47 0.0004521 1.375E-07 0.001374 0.0001993
48 0.09602 0.5212 0.0000969 0.2836
49 0.6537 0.0006065 0.3804 0.0002888
• Low correlation without statistical significance
Model Correlations
• Independent of the prediction set
• Model comparison creates groups:
• Delaunay triangulations are similar to Gabriel graphs, and similar to Voronoi tessellations
• Minimum spanning trees are similar to nearest neighbour graphs
• Observations are less similar to any model than dissimilar models are to each other
ID PS dl_ga dl_kr dl_nn dl_vo … nn_fo vo_fo
77 97_98 1 0.907748 0.619755 - … 0.4039747 -
77 97_05_98 1 0.907748 0.619755 0.993317 … 0.4039747 0.578376
77 97_05_06_98 1 0.900415 0.626944 0.991985 … 0.4039747 0.582251
77 97_06_98 1 0.900415 0.626944 0.991985 … 0.4039747 0.582251
[Figure: correlation coefficients per model pair (dl_ga, dl_kr, dl_nn, dl_vo, dl_fo, ga_kr, ga_nn, ga_vo, ga_fo, kr_nn, kr_vo, kr_fo, nn_vo, nn_fo, vo_fo); y-axis from 0.0 to 1.0]
Final Kruskal-Wallis test
• Delaunay and Gabriel are very similar to each other, as are MST and NN.
• MST and NN are different from Delaunay as well as from Gabriel.
• Observations are different from all other models.
Advantages And Limits
• The models are easy to implement.
• The models are easy to understand.
• The models are easy to extend.
• The less connected the graph is, i.e. the smaller the set of edges or the smaller the number of neighbours of a single vertex, the lower the probability of connections between known sites and sites to be predicted: a decreasing number of edges increases the error rate.
• Border effects are important limitations, not only for Voronoi cells (cf. Byers 1992) but for all graph types: spatially outlying sites can be predicted only poorly or not at all.
Limits: Graph Connections
[Figure: site 64 and the surrounding sites 2, 4, 8, 9, 10, 11, 67, 165 to 171, 178 and 179, shown in three panels]
116 177 177 100
117 64 2 74.4
118 64 10 3.57
119 64 167 3.63
120 64 169 1.48
121 64 170 16.92
122 65 3 75.11
123 65 15 19.25
-> 22.03% from unknown (buffering) sites
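The 22.03% can be retraced from the intersection table. In the sketch below the rows for site 64 are copied from the table (row ID, site to be predicted, parent site, area share in percent). Which parent sites count as buffering sites is an assumption made here so that the example reproduces the slide's figure: sites 167, 169 and 170. The helper `buffer_share` is an illustrative name.

```python
# Rows for site 64: (row ID, predicted site, parent site, area share in %).
rows = [
    (117, 64, 2, 74.4),
    (118, 64, 10, 3.57),
    (119, 64, 167, 3.63),
    (120, 64, 169, 1.48),
    (121, 64, 170, 16.92),
]

# Assumption: these parent sites belong to the buffer set.
BUFFER_SITES = {167, 169, 170}


def buffer_share(rows, site, buffer_sites):
    """Percentage of a predicted site's cell that stems from buffer sites."""
    share = sum(pct for _, predicted, parent, pct in rows
                if predicted == site and parent in buffer_sites)
    return round(share, 2)


print(buffer_share(rows, 64, BUFFER_SITES))  # 22.03
```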
Limits: Graph Connections
[Figure: site 86 and the surrounding sites 4, 6, 9, 11, 18, 19, 21 to 25, 85, 179 and 183 to 185, shown in two panels]
208 85 22 52.99
209 86 8 1.42
210 86 9 0.4
211 86 11 4.12
212 86 22 5.92
213 86 23 81.98
214 86 25 6.16
215 87 22 45.13
-> 0.0% from unknown (buffering) sites
Limits: Graph Connections
[Figure: site 1 and the surrounding sites 66, 159 to 165 and 172 to 177, shown in three panels]
113 211 211 100
114 1 160 31.73
115 1 161 16.48
116 1 162 0.01
117 1 163 22.59
118 1 172 0.34
119 1 175 28.74
120 1 176 0.12
121 2 64 75.47
-> 100.0% from unknown (buffering) sites
Limits: Site Introduction
• Difference between introducing the sites to be predicted one after another and introducing many sites simultaneously
• Order of introduction is important due to local reorganisation of the graph
• E.g. leaving out site 81:
• sites 18, 20, 78 and 80 would lose one neighbour (81)
• all would get a new neighbour (18: 20, 20: 18, 78: 80, 80: 78)
[Figure: maps of the sampling sites with site numbers]
More Limits
• Species richness is poor in all taxa investigated.
• The landscape heterogeneity of the study area is poor.
Outlook - Questions
• What is the best model?
• Are there differences between buffered and unbuffered prediction sets?
• Are there differences between Orthoptera and Odonata, i.e. is the applicability of the models independent of the species group?
• Why not use geostatistics?
• Which errors occur, how are they propagated, and how should they be handled?
What Is The Best Model?
• The problem: the models seemed to be better than 'reality' (observation) -> no scale for assessment as in the modelling literature
• First: observation is just another model, which is ignored in the modelling literature.
• Second: observation seemed to be free of variation due to 'single' observation acts.
• The solution: simulating other 'observations' using the same models that are being tested.
Best Model - Example
• The method is called ‚leave one out‘, i.e.
• take all sites but one
• ‚predict‘ its result
• take all sites but another one
• …
• All graph types have to be included, which yields not one 'observation' but many (5), i.e. the method is not applicable to Voronoi diagrams.
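The 'leave one out' loop can be sketched generically. In the sketch below, `leave_one_out` and `mean_predictor` are hypothetical names, and the predictor simply averages the remaining values. In this talk's setting the predictor would instead be one of the graph models, applied to the neighbours of the left-out site.

```python
def leave_one_out(sites, predict):
    """Predict every site from all other sites, one site at a time.

    sites   -- dict: site ID -> observed value
    predict -- function(held_out_id, known_sites) -> predicted value
    """
    results = {}
    for held_out in sites:
        # Take all sites but one ...
        known = {s: v for s, v in sites.items() if s != held_out}
        # ... and 'predict' the result of the site left out.
        results[held_out] = predict(held_out, known)
    return results


def mean_predictor(held_out, known):
    """Placeholder model: the mean of all remaining sites."""
    return sum(known.values()) / len(known)


print(leave_one_out({1: 2.0, 2: 4.0, 3: 6.0}, mean_predictor))
# {1: 5.0, 2: 4.0, 3: 3.0}
```

Running the loop once per graph type yields one simulated 'observation' per model for each site.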
[Figure: maps of the sampling sites (point symbols only)]
dl ga kr nn fo
0 0 0 0 0
0 0 0 0 0
5 5 5 5 4
6 6 6 7 6
6 6 6 5 6
5 5 4 5 5
6 6 6 6 0
0 0 0 0 4
2 2 2 0 0
7 7 6 5 6
6 6 6 6 6
2 2 2 2 2
2 2 2 2 1
2 2 4 5 0
0 0 0 0 0
2 2 2 4 2
0 0 0 0 0
Best Model - Example
[Figure: maps of the sampling sites 64 to 123 with site numbers]
dl ga kr nn fo fd fg fk fn
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
5 5 5 5 4 4 4 4 0
6 6 6 7 6 5 5 4 4
6 6 6 5 6 5 5 5 4
5 5 4 5 5 4 4 4 4
6 6 6 6 0 4 4 4 2
0 0 0 0 4 2 2 0 0
2 2 2 0 0 2 2 0 0
7 7 6 5 6 6 6 5 4
6 6 6 6 6 6 6 5 4
2 2 2 2 2 4 4 4 4
2 2 2 2 1 0 0 0 0
2 2 4 5 0 1 1 2 2
0 0 0 0 0 0 0 0 0
2 2 2 4 2 2 2 0 0
0 0 0 0 0 0 0 0 0
[Figure: correlation coefficients per model pair, first for dl, ga, kr, nn, vo and fo (pairs dl_ga to vo_fo), then extended by fd, fg, fk and fn (pairs dl_ga to fk_fn); y-axis from 0.0 to 1.0]
What Is The Best Model?
• The solution: the best model is the Voronoi tessellation, followed by Delaunay triangulation and Gabriel graph.
• Voronoi tessellation focuses not only on distances to single (known) sites, but on complete 'recruiting areas'.
• It can be expanded in the context of ecology and landscape ecology by introducing landscape parameters (e.g. connectivity, habitat suitability, etc.).
• Owing to the duality of the two models, Delaunay triangulation is an equivalent of Voronoi tessellation.
• Gabriel graphs are not far from Delaunay triangulation, but it is not plausible that sites are excluded from influence merely by an additional constraint.
Why No Geostatistics?
• Geostatistics has been developed for static processes (geology).
• Autocorrelation is one of its central concepts.
• There is no external validation of the autocorrelation concept: it depends on the dispersion of the data points, independent of scale.
Why Simple Models?
• Why only simplified models with adjacency as only factor?
• We tend to observe the wrong parameters.
• Programmatic literature
• Why don’t we believe the models? (Aber 1997)
• Does vegetation suit our models? (Bio 2000)
Outlook
• Decoursey 1992. Developing models with more detail: do more algorithms give more truth?
Acknowledgements
U. Bröring, S. Flemming, H. Vorwald, G. Wiegleb
References
• Aber, J. D. 1997. Why don't we believe the models? Bull. Ecol. Soc. Am. 78: 232-233.
• Bio, A. M. F. 2000. Does vegetation suit our models? Assessing species distribution in environmental space. Nederlandse Geografische Studies 265. Krug/Faculteit Ruimtelijke Wetenschappen, Universiteit Utrecht, Utrecht, The Netherlands. 206 pp.
• Boyce, M. S., Mao, J. S., Merrill, E. H., Fortin, D., Turner, M. G., Fryxell, J. & Turchin, P. 2003. Scale and heterogeneity in habitat selection by elk in Yellowstone National Park. Ecoscience 10(4): 421-431.
• Buckland, S. T. & Elston, D. A. 1993. Empirical models for the spatial distribution of wildlife. Journal of Applied Ecology 30: 478-95.
• Byers, J. A. 1992. Dirichlet tessellation of bark beetle spatial attack points. Journal of Animal Ecology 61: 759-768.
• Decoursey, D. G. 1992. Developing models with more detail: do more algorithms give more truth? Weed Technol. 6: 709-715.
• eWater Ltd. 2006. Series on model choice. 1. General approaches to modelling and practical issues of model choice. http://www.toolkit.net.au/cgi-bin/WebObjects/toolkit.woa/wa/modelChoice (valid on 20.09.2006)
• Ferrier, S., Drielsma, M., Manion, G. & Watson, G. 2002. Extended statistical approaches to modeling spatial pattern in biodiversity: the north-east New South Wales experience. II. Community-level modeling. Biodiversity and Conservation 11: 2309-2338.
• Gabriel, K. R. & Sokal, R. R. 1969. A new statistical approach to geographic variation analysis. Syst. Zool. 18: 259-270.
• Kenkel, N. C., Hoskins, J. A. & Hoskins, W. D. 1989. Edge effects in the use of area polygons to study competition. Ecology 70: 272-274.
• Levins, R. 1966. The strategy of model building in population ecology. Am. Sci. 54: 421-431.
• Mercier, F. & Baujard, O. 1997. Proceedings of GeoComputation '97 & SIRC '97: 161-171.
• Okabe, A., Boots, B., Sugihara, K. & Chiu, S. N. 2000. Spatial tessellations: concepts and applications of Voronoi diagrams. 2nd ed. John Wiley & Sons, Chichester, UK.
• Refsgaard, J. C. 1996. Terminology, modelling protocol and classification of hydrological model codes. In: Refsgaard, J. C. & Abbott, M. B. (eds.) Distributed hydrological modelling. Kluwer: 17-40.
• Sharpe, P. J. A. 1990. Forest modeling approaches: compromises between generality and precision. In: Dixon, R. K., Meldahl, R. S., Ruark, G. A. & Warren, W. G. (eds.) Process Modeling of Forest Growth Responses to Environmental Stress. Timber Press, Portland, OR: 180-190.