Tutorial on Geographic and Spatial Data Mining

Michael May

15th Italian Symposium on Advanced Database Systems - SEBD’07

Torre Canne, Italy

June 17th


Fraunhofer Society

Joseph von Fraunhofer, German physicist and entrepreneur

Fraunhofer mission:

- do state-of-the-art research and use it in challenging customer projects

- Funding is 33% research grants, 33% customer projects, 33% institutional funding

57 institutes, 40 locations, 12,000 employees, €1 billion annual volume

Best-known invention: MP3


Fraunhofer IAIS: Intelligent Analysis and Information Systems

"From sensor data to business intelligence, from media analysis to visual information systems: Our research allows companies to do more with data"

New name, long-standing experience

- Founded in 2006 as a merger of the Fraunhofer institutes AIS and IMK

230 people: scientists, project engineers, technical and administrative staff

Located on Fraunhofer Campus Schloss Birlinghoven/Bonn

Joint research groups and cooperation with Univ. Bonn


Fraunhofer IAIS: research and projects

Core research areas:

Machine learning and adaptive systems

Data Mining and Business Intelligence

Automated media analysis

Interactive access and exploration

Autonomous systems


Objectives

Although it is about statistical concepts, algorithms and data structures, the tutorial has a practical, application oriented focus

Integration of various technologies and algorithms. How do they combine?

Covers a broad range

I do not assume familiarity with spatial concepts, but some basic familiarity with data mining approaches

Three Objectives:

- to stimulate research on spatial data mining related issues

- to stimulate development of more efficient spatial databases tailored for data mining applications

- to stimulate real-world applications


A main message

Spatial Data Mining is not an esoteric research topic; it is a practically and commercially very important, and sometimes business-critical, field!

Later I give an example where the value of several dozen companies directly depends on the predictions given by our spatial data mining algorithms.


Spatial vs. Geographic Data Mining

Geographic Data is data related to the earth

Spatial Data Mining deals with physical space in general, from molecular to astronomical level

Geographic Data Mining is a subset of Spatial Data Mining

Almost all geographic data mining algorithms can work in a general spatial setting (with the same dimensionality)

This tutorial focuses on geographic data in 2D, but most algorithms work on spatial data in general

I do not talk about the specifics of molecular data, face detection, etc.


Agenda

Introduction – Spatial and Geographic Data Mining

Part I: Basic Concepts – Spatial Databases and GIS

• Spatial Data Types
• Spatial Queries
• Construction of Complex Features

Part II: Exploratory Analysis of Spatial Data

Part III: Spatial and Geographic Data Mining Methods

• Autocorrelation
• Mining Point Data – Clustering, Kriging
• Mining Points, Lines, Areas – Clustering, Subgroup Discovery, Association Rules
• Mining Networks – A practical case study
• Mining Tracks in Space and Time – Mining from GPS Data

Challenges

Summary


Introduction – Spatial Data Mining


A classical example of spatial analysis

Dr. John Snow

Investigating causes of a cholera epidemic

London, September 1854

A good representation is often the key to solving a problem

Disease cluster

Infected water pump?


Good representation because...

Represents spatial relation of objects of the same type

Represents spatial relation of objects to other objects

It is not only important where a cluster is but also, what else is there (e.g. a water-pump)!

Shows only relevant aspects and hides irrelevant


Goals of Spatial Data Mining

Identifying spatial patterns

Identifying spatial objects that are potential generators of patterns

Identifying information relevant for explaining the spatial pattern (and hiding irrelevant information)

Presenting the information in a way that is intuitive to the analyst and supports further analysis


Spatial Data Mining

Data Mining

+Geographic Information Systems

= Spatial Mining


Basic Concepts Spatial Databases and GIS


Commercial

Where to build a new supermarket?

Where are the customers that want to buy new product X?

How many cars pass the main road per hour?

Does it pay to install new antennas?

What percentage of young females sees a billboard located in Ripley avenue?

Public Sector

Are there clusters of a certain disease?

Is there a relationship between poverty and death rate?

Are there crime hot spots or patterns?


Geographic layers: Buildings, Rivers, Streets, Schools, Hospitals, Factory

Attribute Data: Persons per household, No. of cars, Long-term illness, Age, Profession, Ethnic group, Unemployment, Education, Migrants, Medical establishments, Shopping areas, ...


Elements of a spatial database

Spatial Operators

Spatial Data Types

Spatial Indexes

Spatial Query Language

Metadata

SELECT c.holding_company, c.location
FROM competitor c, bank b
WHERE b.site_id = 1604
AND SDO_WITHIN_DISTANCE(c.location, b.location, 'distance=2 unit=mile') = 'TRUE'

INSIDE

Examples from Oracle Spatial


Spatial Datatypes


Two basic types of representation: Fields and Discrete Objects

Fields:

Raster Data

Discrete Objects: Vector Data Model


Vector Data: Data Structure

Ordered sets of xy-coordinates defining points, lines, or polygons

3D or 4D also possible

Point, Line (Polyline), Area (Polygon)

Easy to scale (linear transformation)

Storage efficient

Relationships between objects (e.g. overlap) are not explicitly represented

Aka „Spaghetti Model“

Straight lines between points

Data Structure:

Point: (5,10)
Line: ((5,10),(9,16),(12,17))
Area: ((5,10),(9,16),(12,17), …); draw line from last to first coordinate
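As a minimal sketch of this data structure (the helper names `polyline_length` and `polygon_area` are illustrative assumptions, not part of the tutorial), the three geometry types can be held as ordered coordinate tuples. The polygon helper closes the ring from the last to the first coordinate, as described above.

```python
from math import hypot

# Vector geometries as ordered coordinate tuples, using the slide's sample values.
point = (5, 10)
line = ((5, 10), (9, 16), (12, 17))   # polyline: straight segments between points
area = ((5, 10), (9, 16), (12, 17))   # polygon: ring is closed implicitly


def polyline_length(coords):
    """Sum of the straight-line segment lengths between consecutive vertices."""
    return sum(hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(coords, coords[1:]))


def polygon_area(coords):
    """Shoelace formula; the edge from the last to the first vertex closes the ring."""
    closed = list(coords) + [coords[0]]
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(closed, closed[1:]))
    return abs(twice_area) / 2.0
```

Note that nothing in these tuples records how two geometries relate to each other; that has to be computed, which is the point made about the "Spaghetti Model".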


Two Main Types of Vector Data

- non-regular tessellations: closed polylines that partition the space

- discrete isolated objects: point, line, area (polygon)

Tessellations are very useful for aggregation of discrete objects and for feature extraction


UK, Greater Manchester, Stockport

Descriptions of objects are organized in relations (database tables)

Each row in a table describes one object

Different categories of objects are organized in separate relations, each having its own set of attributes.

Buildings

ID  Geometry         Address             Type
1   (1,1),(2,2),…    Gladstone Street 5  1
2   (3,3),(4,4),…    Islington Road 2    2
3   (5,5),(6,6),…    Ripley Avenue 23    1

Hospitals

ID  Geometry         Address        Phone
1   (1,1),(2,2),…    Stepping Hill  234567
2   (3,3),(4,4),…    Great Moore    567897

Further layers with columns (ID, Geometry, Name): Rivers, Streets, Schools, Factory


Hierarchy

Often data are organized in spatial hierarchies, e.g.

Country

Zip Area

Voting District

Parcel

Hierarchies may overlap

Example (UK census data):

County
  District 1, District 2, …, District n
    Ward 1, Ward 2, …, Ward n (within each district)


Representation of data in a spatial database

A set of relations R1,...,Rn such that each relation Ri has a geometry attribute Gi or an identifier Ai such that Ri can be linked (joined) to a relation Rk having a geometry attribute Gk

- Geometry attributes Gi consist of ordered sets of x,y-pairs defining points, lines, or polygons

- Different types of spatial objects are organized in different relations Ri (geographic layers), e.g. streets, rivers, enumeration districts, buildings, and

- each layer can have its own set of attributes A1,..., An and at most one geometry attribute G


This representation does not fit well with standard data mining approaches!

This is where the specific research challenge for geographical data mining comes from!


Raster Data

How to represent phenomena conceived as fields?

Divide the world into square cells

No variation within cells

Cell value may be average, max, min, sum, central point, …

Represent discrete objects as collections of one or more cells

Represent fields by assigning attribute values to cells

[Figure: raster representation. Each color represents a different value of a nominal-scale field. Legend: mixed conifer, Douglas fir, oak savannah, grassland. Source: Longley et al (2001)]
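The choice of cell value can be illustrated with a small sketch. The function `aggregate` and the sample grid `elevation` are hypothetical; the sketch coarsens a raster by summarizing each block of cells with a chosen statistic (max, mean, ...), which also shows why variation inside a cell is lost.

```python
def aggregate(grid, factor, func):
    """Coarsen a raster: each output cell summarizes a factor x factor block."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, min(r + factor, rows))
                     for j in range(c, min(c + factor, cols))]
            out_row.append(func(block))
        out.append(out_row)
    return out


def mean(values):
    return sum(values) / len(values)


# Hypothetical 4 x 4 field, e.g. elevation samples on a regular grid.
elevation = [[1, 2, 3, 4],
             [5, 6, 7, 8],
             [9, 10, 11, 12],
             [13, 14, 15, 16]]

coarse_max = aggregate(elevation, 2, max)    # one value per 2 x 2 block
coarse_mean = aggregate(elevation, 2, mean)
```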


Raster and Vector: Comparison

Raster Model

Advantages:
• Simple data structure
• Simple logical and algebraic structures

Disadvantages:
• Large data volumes
• Imprecise geometry
• Expensive transformations of coordinates
• Implicit coordinates

Vector Model

Advantages:
• Specify geometry by coordinates
• Topological relationships
• High geometric accuracy
• Storage efficient

Disadvantages:
• Complex data structure
• Compute-intensive logical and algebraic operations

Remember: "Raster is vaster and vector is correcter"


Spatial Queries


Spatial Queries

Problem: Vector data model does not explicitly capture relationships among objects.

They have to be inferred using spatial predicates

Spatial predicates evaluate to true or false for given objects

A query returns

the set of objects of which the statement is true; or

using aggregates, the [minimum, maximum, sum, average, …] over the object(s) of which the statement is true

Queries are evaluated using a spatial join among different relations (layers)

Here's where database technology and spatial indexing come in to do the job efficiently!

Still, they can be extremely time consuming!
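To make the role of indexing concrete, here is a minimal sketch of a within-distance spatial join between two point layers. It uses a uniform grid as a stand-in for a spatial index (an assumption for illustration; real spatial databases typically use R-trees or quadtrees). The layer contents and function names are hypothetical.

```python
from collections import defaultdict
from math import floor, hypot


def build_grid(points, cell):
    """Hash each point id into the square grid cell that contains it."""
    grid = defaultdict(list)
    for pid, (x, y) in points.items():
        grid[(floor(x / cell), floor(y / cell))].append(pid)
    return grid


def within_distance_join(left, right, d):
    """Return sorted pairs (l, r) whose Euclidean distance is at most d."""
    grid = build_grid(right, d)  # cell size d: all candidates lie in a 3 x 3 block
    pairs = []
    for lid, (x, y) in left.items():
        cx, cy = floor(x / d), floor(y / d)
        for gx in (cx - 1, cx, cx + 1):
            for gy in (cy - 1, cy, cy + 1):
                for rid in grid.get((gx, gy), ()):
                    rx, ry = right[rid]
                    if hypot(x - rx, y - ry) <= d:  # exact predicate on candidates
                        pairs.append((lid, rid))
    return sorted(pairs)


# Hypothetical layers: one bank site and three competitor sites.
banks = {"b1": (0.0, 0.0)}
competitors = {"c1": (1.0, 1.0), "c2": (3.0, 0.0), "c3": (-1.5, 0.5)}
nearby = within_distance_join(banks, competitors, 2.0)
```

The grid only prunes candidates; the spatial predicate itself is still evaluated exactly on each surviving pair, which is why such joins remain expensive for complex geometries.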


Spatial Predicates: Egenhofer‘s 9-intersection model

Each object has interior (i), exterior (e) and boundary (b)

This results in a 9-intersection matrix for the relation between two spatial objects A and B

A cell contains a 1 iff the intersection of point sets is non-empty

A meets B A overlaps B A contains B


Spatial Predicates

A inside B, B contains A

A contains B, B inside A

A covered-by B, B covers A

A covers B, B covered by A

A equals B, B equals A

A overlaps B, B overlaps A

A meets B, B meets A

A disjoint B, B disjoint A

9-intersection model for 2 regions (Egenhofer 1991)


Spatial Queries: Distance

Metric spaces:
→ Symmetry: d(i,j) = d(j,i)
→ Triangle inequality: d(i,k) ≤ d(i,j) + d(j,k)

- Euclidean distance: $d_e(i,j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$

Distance relation between polygons: Minimum distance between any 2 points of the polygons
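The polygon distance can be sketched as follows, under the stated assumption that the two polygons are disjoint (for intersecting or nested polygons the distance is 0 and a separate test would be needed). In that case the minimum is always attained at a vertex of one polygon and an edge of the other. All function names are illustrative.

```python
from math import hypot


def euclidean(p, q):
    return hypot(p[0] - q[0], p[1] - q[1])


def point_segment_distance(p, a, b):
    """Distance from point p to the closed segment a-b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:                      # degenerate segment
        return euclidean(p, a)
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))                # clamp projection onto the segment
    return euclidean(p, (ax + t * dx, ay + t * dy))


def edges(polygon):
    """Consecutive vertex pairs, including the closing edge."""
    return list(zip(polygon, polygon[1:] + polygon[:1]))


def polygon_distance(poly_a, poly_b):
    """Minimum distance between two disjoint polygons given as vertex lists."""
    best = float("inf")
    for vertices, other in ((poly_a, poly_b), (poly_b, poly_a)):
        for p in vertices:
            for a, b in edges(other):
                best = min(best, point_segment_distance(p, a, b))
    return best
```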


Spatial Queries: Distance and Proximity

Select the nearest neighbor in space

Select all objects within a certain distance

[Figure: location X with its distance to Hospital #1 and Hospital #2]

Example: Oracle Spatial

SELECT c.holding_company, c.location
FROM competitor c, bank b
WHERE b.site_id = 1604
AND SDO_WITHIN_DISTANCE(c.location, b.location, 'distance=2 unit=mile') = 'TRUE'

Selects all competitors and their locations within a distance of 2 miles from the bank with id 1604


Distance – non-metric

Non-metric spaces:
→ Asymmetry: d(i,j) ≠ d(j,i)
→ Triangle inequality does not hold

drive time

driving distance


Stockport Database Schema

Geographical Layers (85 tables), e.g. Building, Street, Shopping Region, Vegetation
- linked to each other by spatial joins (predicates such as "inside" and "spatially interacts")

Attribute data (95 tables with census data, ~8000 attributes)
- linked to the geographical layers by a standard join (= zone_id)

Spatial Hierarchy

• County
• District
• Wards
• Enumeration district

Relations between objects implicit; very flexible and storage efficient, but compute intensive


Implementation of Spatial Databases

Many popular databases have spatial extensions by now:

Oracle Spatial

PostgreSQL

MySQL (since 4.1)


Construction of Complex Features


Spatial Functions

Example: Oracle Spatial 10g

Return a geometry
- Union
- Difference
- Intersect
- XOR
- Buffer
- CenterPoint
- ConvexHull

Return a number
- Length
- Area
- Distance

[Figure: an original pair of geometries with their Intersect and Difference results]

http://colab.cim3.net/file/work/SICoP/2006-06-20/2006-06-21/xlopez06212006.ppt

Constructs new geometry objects from existing ones using point set theory

Efficient implementation using computational geometry
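As one example of such a constructor, the sketch below computes a convex hull with Andrew's monotone chain algorithm. This is a standard computational-geometry technique chosen here for illustration; it is not claimed to be what Oracle Spatial uses internally.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices in counter-clockwise order, starting at the lowest-left point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                               # build the lower chain
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):                     # build the upper chain
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]              # drop duplicated endpoints
```

Sorting dominates the running time, so the sketch runs in O(n log n) for n input points.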


Constructing Cells: Buffer

How many competitors are in the catchment area of my shop?

= How many shops are within the buffer?

Simplistic approximation

Does not take account of barriers (rivers, highways)

Does not take into account road system


Voronoi diagram

Which are my nearest competitors?

What is the cover of my radio antenna?

= Find Voronoi neighbors

Approximation

Does not take account of barriers (rivers, highways)

Does not take into account road system

Decompose space into regions around each point in a set of points S such that all the points in the region around pi are closer to pi than to any other point in S

Complexity: $O(n \lg n)$

Related data structure: Delaunay triangulation (graph of Voronoi neighbors)
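The definition can be illustrated with a brute-force sketch: a location belongs to the region of its nearest site, and two sites are Voronoi neighbors if their regions touch. The version below approximates the regions on a grid of unit cells, so it is a raster approximation and not the O(n lg n) construction; function names and the grid resolution are assumptions.

```python
from math import hypot


def nearest_site(q, sites):
    """Index of the site whose Voronoi region contains q (ties: lowest index)."""
    return min(range(len(sites)),
               key=lambda i: hypot(q[0] - sites[i][0], q[1] - sites[i][1]))


def voronoi_neighbors(sites, width, height):
    """Approximate Voronoi neighbors by labeling a width x height grid of unit cells."""
    label = [[nearest_site((x + 0.5, y + 0.5), sites) for x in range(width)]
             for y in range(height)]
    neighbors = set()
    for y in range(height):
        for x in range(width):
            for nx, ny in ((x + 1, y), (x, y + 1)):   # right and lower neighbor cell
                if nx < width and ny < height and label[y][x] != label[ny][nx]:
                    neighbors.add(tuple(sorted((label[y][x], label[ny][nx]))))
    return neighbors
```

For three shops in a row, only adjacent shops come out as neighbors; the outer two are separated by the middle shop's region.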


Drive-Time Zone (Dijkstra)

How many competitors are in the catchment area of my shop?

Realistic approximation

Takes account of barriers (rivers, highways)

Takes into account the road system and the maximum speed on each road

All street segments within a drive-time distance ≤ d from a given starting point

Use Dijkstra's algorithm

Complexity: between $O(V^2)$ and $O(V \lg V + E)$, depending on the data structures used for implementation
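A minimal sketch of a drive-time zone, assuming the road network is given as a directed graph with travel times in minutes (the graph `roads` and the node names are hypothetical). It uses Dijkstra's algorithm with a binary heap, which gives O((V + E) lg V), and stops expanding once the time budget is exceeded. It returns reachable junctions; mapping them back to street segments is left out.

```python
import heapq


def drive_time_zone(graph, start, max_minutes):
    """Nodes reachable from start within max_minutes, with their travel times.

    graph: {node: [(neighbor, minutes), ...]}; edges are directed, so one-way
    streets and asymmetric travel times are allowed.
    """
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        t, node = heapq.heappop(queue)
        if t > best.get(node, float("inf")):
            continue                                  # stale queue entry
        for nxt, minutes in graph.get(node, ()):
            nt = t + minutes
            if nt <= max_minutes and nt < best.get(nxt, float("inf")):
                best[nxt] = nt
                heapq.heappush(queue, (nt, nxt))
    return best


# Hypothetical road network around a shop.
roads = {
    "shop": [("a", 3), ("b", 7)],
    "a": [("b", 2), ("c", 6)],
    "b": [("c", 1)],
    "c": [],
}
zone = drive_time_zone(roads, "shop", 6)
```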


Pre-processing

Several of the feature extractions are computationally quite expensive (at least for large data sets) and there is often a combinatorial explosion of features that might be constructed.

Several strategies are used in Spatial Warehouse Design:

Selective Pre-processing: materializing important joins in advance (storage requirements!)

Approximate precomputing: e.g. using the Minimum Bounding Rectangle (MBR) to approximate a polygon

Schema Design (e.g. Star-Schema with selective materialization): Han J., Stefanovic N., Koperski K. Selective Materialization: An Efficient Method for Spatial Data Cube Construction. PAKDD, 1998.
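The approximation strategy can be sketched as a filter step: compare cheap bounding rectangles first and run the exact geometry test only on the surviving candidate pairs. Function names and sample layers below are illustrative assumptions.

```python
def mbr(polygon):
    """Minimum bounding rectangle as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def mbr_intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_pairs(layer_a, layer_b):
    """Filter step: pairs whose MBRs intersect. An exact test must still follow."""
    boxes_b = {key: mbr(poly) for key, poly in layer_b.items()}
    return [(ka, kb)
            for ka, pa in layer_a.items()
            for kb, box in boxes_b.items()
            if mbr_intersects(mbr(pa), box)]


# Hypothetical layers. A and B are disjoint polygons whose MBRs overlap,
# so B survives the filter as a false positive; C is rejected cheaply.
layer_a = {"A": [(0, 0), (4, 0), (0, 4)]}
layer_b = {"B": [(3, 3), (5, 3), (5, 5), (3, 5)],
           "C": [(10, 10), (11, 10), (11, 11)]}
candidates = candidate_pairs(layer_a, layer_b)
```

The filter never misses a true intersection, but it can report pairs that do not actually intersect, which is what makes it an approximation.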


Spatial Database of Vector Objects: Discussion

Relations between objects implicit

Very flexible: depending on the analysis task, different relationships can be constructed

Storage efficient; no overhead for storing relationship information

Compute intensive (thus spatial indexing is very important)

Consider what and when to materialize

Very rich possibilities to create new, non-trivial objects from existing ones

Makes feature extraction an important topic for Data Mining

Inherently multi-relational setting (but not first-order)

Could also be formulated in a deductive database setting


Interactive Visualization of Spatial Data –

Exploratory Data Analysis


Interactive Visualization of Spatial Data – Exploratory Data Analysis

(work by G. Andrienko & N. Andrienko, H. Voss and others at Fraunhofer IAIS)

For the theory behind CommonGIS, see the book

Andrienko, N. and Andrienko G.: Exploratory Analysis of Spatial and Temporal Data - A Systematic Approach, Springer, 2005


Geographic Information Systems and CommonGIS

Many commercial tools available

- ESRI ArcGIS
- MapInfo
- Intergraph
- Manifold

But CommonGIS is different and unique …

- Map-based exploratory data analysis
- stresses interactive visualization and manipulation of statistical data in space
- elaborate facilities for time-series visualization

CommonGIS can be acquired for non-commercial use by educational institutions for no fee

See web page www.commongis.com


CommonGIS

= Fraunhofer IAIS tool for map-based exploratory data analysis; combines interactive cartography and statistics

Feature groups: Multi-dimensional, Decision support, Multivariate

- Time-series visualization and analysis

- Combined vector-raster transformation

- Weighted Sums

- Ideal Point Analysis

- Similarity analysis

- Dominant Attribute

- Integration with Weka (Clustering, Decision Trees)


CommonGIS: Visual analysis of spatial data

Interactive spatial search for geographic objects and recognition of spatial patterns:

- dynamic choropleth maps, pie charts, bar charts, etc. with dynamic removal of outliers and dynamic queries

Comparison of attribute values of geographic objects (relations and correlations) and comparison of spatial patterns (spatial correlations):

- (linked) dynamic maps and interactive diagrams
- multiple (linked) dynamic maps


CommonGIS: Visual analysis of spatio-temporal data

CommonGIS as an interactive browser to study how a spatial pattern evolves over time:

time aware maps (animations)

time series charts

CommonGIS as an interactive browser for temporal behaviours of objects:

set of controls for analysing time intervals (object animations)

CommonGIS as an interactive browser of discrete space-time events to find spatio-temporal clusters:

space-time cube


Time-Series: Sales per Shop and Product Category

Product categories: Bakery, Stand-up café, Sit-down café, Terrace

Different Time Hierarchies (Year, Quarter, Month, Day, …)


CommonGIS: Data transformation

Transformation of data for further analysis:

Attribute transformations:
- calculate statistical indices
- transform and combine attribute data arithmetically
- dynamic classifiers (linked with dynamic choropleth map)
- cross classifiers (linked with dynamic choropleth map)

Geographic transformations:
- query, transform, combine, derive raster data
- illumination model
- raster -> vector transformations (i.e. raster -> area aggregation)
- point/line -> raster transformations


CommonGIS: Combination of vector and image data


Geographic and Spatial Data Mining Methods


Autocorrelation


Spatial Variation

How are variables distributed in space?

Tobler‘s First Law of Geography:

"Everything is related to everything else, but near things are more related than distant things."

distribution of variables depends on space

variables are autocorrelated

Field Soil Moisture

Franke, diploma thesis, Leipzig Univ., 2006


Spatial Autocorrelation: Binary Example

binary attribute (blue, white)

autocorrelation to four immediate neighbors

Moran Index for four example patterns: I = 0.86, I = 0.39, I = 0.00, I = -1.00 (figure legend: neighbor pairs are marked as "change" or "equal")

Source: Goodchild, CATMOG, GeoBooks, Norwich, 1986


Moran‘s I

Moran's I is a measure of spatial autocorrelation. It is a weighted correlation coefficient used to detect departures from spatial randomness. Departures from randomness indicate spatial patterns such as clusters and geographic trends.

Values of I larger than 0 indicate positive spatial autocorrelation; values smaller than 0 indicate negative spatial autocorrelation.

Moran's I is a weighted product-moment correlation coefficient, where the weights reflect geographic proximity.

z – attribute of interest; w – weight; n – number of areal objects

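$$I = \frac{n \sum_{i=1}^{n} \sum_{j \neq i} w_{ij} (z_i - \bar{z})(z_j - \bar{z})}{\left(\sum_{i=1}^{n} \sum_{j \neq i} w_{ij}\right) \sum_{i=1}^{n} (z_i - \bar{z})^2}$$

Example: n = 4 areal objects A, B, C, D

Weight matrix w_ij:

     A  B  C  D
A    0  1  1  0
B    1  0  1  1
C    1  1  0  1
D    0  1  1  0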
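The statistic can be computed directly from its definition. The sketch below uses the 4 x 4 binary weight matrix of the example for areas A, B, C, D; the attribute values passed in are made up for illustration.

```python
def morans_i(z, w):
    """Moran's I for attribute values z and a spatial weight matrix w."""
    n = len(z)
    mean = sum(z) / n
    dev = [v - mean for v in z]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    cross_products = sum(w[i][j] * dev[i] * dev[j] for i, j in pairs)
    weight_sum = sum(w[i][j] for i, j in pairs)
    return n * cross_products / (weight_sum * sum(d * d for d in dev))


# Weight matrix of the example (rows and columns ordered A, B, C, D).
W = [[0, 1, 1, 0],
     [1, 0, 1, 1],
     [1, 1, 0, 1],
     [0, 1, 1, 0]]

# Hypothetical attribute values for A, B, C, D.
i_value = morans_i([1, 1, 3, 3], W)
```

With these values most neighboring pairs lie on opposite sides of the mean, so the result is slightly negative.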


Spatial Autocorrelation

similarity in location indicates similarity in attribute value

differs from temporal autocorrelation

- one-dimensional autocorrelation in time series; spatial autocorrelation spreads in 2 or 3 dimensions

- only forward causality in time series; direction of causality not restricted in space

depends on scale

[Figure: Sunspot Time Series; axes: year, temperature of sunspots]


Effects of Autocorrelation

makes spatial abstraction possible

makes standard approaches of analysis impossible

- most statistics assume iid

makes local inference attractive

- Kriging, kNN, …

makes choice of sampling interval hard

- autocorrelation depends on scale

makes interpolation easier than extrapolation

zero autocorrelation = independence of location

[Figure: spatial autocorrelation (maximum +1) plotted against distance]


Problem types for Spatial Data Mining

Spatial Data Mining := partially automated search for patterns and models in large spatial databases

Classification of methods along the following hierarchy

Points

Points, Lines and Areas

Networks

Tracks in space and time


Handling spatial data in Data Mining – Basic Options

Treat as ordinary variables

- no special algorithms needed
- spatial properties ignored, e.g. discontiguous areas

Make spatial relationships explicit

- e.g. infer topological relationships
- expensive, but allows normal algorithms to be used
- Can be done as pre-processing or dynamically (the latter requires specialized algorithms)

Specialized algorithms

- Neighborhood methods, kriging, Gaussian processes, density-based clustering …

Use proper combination of data, preprocessing, algorithms, and interaction software!


Mining Point Data


Mining Point Data

[Figure: problem types arranged by space complexity and time complexity; this part covers Points]

Clustering spatial point data

Point data conceived as discrete objects

Many approaches exist for clustering spatial point data

In statistics, measures of spatial randomness or non-randomness have been developed (e.g. Ripley 1991, Cressie 1993)

- Ripley‘s K function as measuring deviation from complete spatial randomness (as exemplified by a Poisson process)

- Moran‘s I, which measures autocorrelation

Bayesian approaches often coming from image analysis (cf. Lawson et al 2002)

In Geography, spatial clustering algorithms have been developed (Openshaw, GAM, 1991)


Density Based Clustering – a KDD approach [Ester et al. 1996]

Suitable for large databases

Discovers areas of high density and turns them into clusters

Discovers clusters of arbitrary shape

Can handle noise

Algorithm DBSCAN

Note: Relatively straightforward extension to vector data possible (GDBSCAN); requires more complex definition of some key concepts (neighborhood and MinPts)


Clustering spatial data

distance-based clustering is inherently spatial

but assumption of convex clusters (e.g. k-means) inappropriate for many “geographical” tasks

source: Ester et al 1997


Definitions 1

Eps-neighborhood of a point p: $N_\varepsilon(p) := \{q \in D \mid dist(p, q) \le \varepsilon\}$

A point p is directly density-reachable from q iff

1. $p \in N_\varepsilon(q)$
2. $|N_\varepsilon(q)| \ge MinPts$ ("q is a core object")

- Not necessarily symmetric

Example: p is a border object and q is a core object; p is directly density-reachable from q, but q is not directly density-reachable from p

The choice of Eps is crucial!


Definitions 2

A point p is density-reachable from a point q wrt. Eps and MinPts iff there is a chain of points p1,…,pn, p1=q, pn=p such that pi+1 is directly density-reachable from pi

- Transitive, not symmetric

p is density-connected to q iff there is a point o such that both p and q are density-reachable from o wrt. Eps and MinPts.

- Symmetric

Examples: p is density-reachable from q, but q is not density-reachable from p; p and q are density-connected to each other by o


Density-connected clustering

A cluster C wrt. Eps and MinPts is a non-empty subset of database D, where

(1) ∀p,q: if p ∈ C and q is density-reachable from p wrt. Eps and MinPts, then q ∈ C

(2) ∀p,q ∈ C: p is density-connected to q wrt. Eps and MinPts.

Non-covered points are noise

Each cluster contains at least MinPts points

Exactly one clustering


Algorithm DBSCAN – Basic Idea

Check the Eps-neighborhood of every unclassified point in the database

If the neighborhood of p contains at least MinPts points, a new cluster with p as core object is built

Collect directly density-reachable objects from this set, merging clusters as necessary

Terminate when no new point can be added to any cluster

Complexity: $O(n \log n)$ when a spatial index is used, otherwise $O(n^2)$
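The basic idea can be sketched in a few lines. This version follows the core-object condition of Ester et al. 1996 (a point is a core object if its Eps-neighborhood, including itself, holds at least MinPts points) and scans all points for each neighborhood query, so it runs in O(n²); a spatial index would replace `region_query`. The names and the label convention (-1 for noise) are choices made for this sketch.

```python
from math import hypot

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points in the Eps-neighborhood of point i (including i)."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if hypot(px - qx, py - qy) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE              # may later become a border object
            continue
        labels[i] = cluster                # i is a core object: start a cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border object, not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core object: keep expanding
        cluster += 1
    return labels
```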


Kriging-Spatial Interpolation


Kriging

developed by G. Matheron in the 1960s based on work of D. Krige

geostatistical method of interpolation

Point data conceived as samples from a continuous surface

results are smoothly varying surfaces

provides optimality given assumptions (best linear unbiased estimate)

variety of methods, e.g. Ordinary Kriging, Universal Kriging, Co-Kriging, Block Kriging, Stratified Kriging, Indicator Kriging, …

[Figure legend: • – measurements; ? – unknown values]

Good introduction: Burrough, P., McDonnell, R 1998


Spatial Variation

Problem:

spatial variation of a continuous attribute is often too irregular to be modelled by a simple, smooth mathematical function

Solution:

variation can be described by stochastic surface

x – location in n-dimensional space

Z(x) – random variable of interest, e.g. soil moisture

A stochastic process is a family of random variables Z(x) over the index set $D \subset \mathbb{R}^n$:

$\{Z(x) : x \in D\}$

A Gaussian process is a stochastic process for which any finite set of Z-variables has a joint multivariate Gaussian distribution.


Components of Spatial Variation

structural component, having a constant mean or trend

random, but spatially correlated component (regionalized variable)

spatially uncorrelated random noise term

$Z(x) = m(x) + \varepsilon'(x) + \varepsilon''$

where m(x) is the trend, ε'(x) the autocorrelated component, and ε'' the random noise

value at location x is a random variable


Stationarity

Problem:

spatial data set is single realization of random process

inference is impossible without further restrictions on spatial variation

Intrinsic Stationarity (stationarity under translation):

constant mean (E[...] = 0) or trend (E[...] > 0):

$E[Z(x) - Z(x+h)] = \text{const.}$

variance of differences at lag h is independent of location x:

$E[\{Z(x) - Z(x+h)\}^2] = 2\gamma(h)$

Isotropy (stationarity under rotation):

spatial process evolves the same in all directions


Ordinary Kriging

Assumptions:

intrinsic stationarity with a constant mean

- constant mean value in sampling area:

$E[Z(x) - Z(x+h)] = 0$

- variance of differences depends only on the distance h between sites:

$\mathrm{Var}[Z(x) - Z(x+h)] = E[\{Z(x) - Z(x+h)\}^2] = E[\{\varepsilon'(x) - \varepsilon'(x+h)\}^2] = 2\gamma(h)$

where γ(h) is the semivariance

Once structural effects have been accounted for, the remaining variation is homogeneous in variance, so that differences between sites are merely a function of the distance between them.


Ordinary Kriging

Procedure:

1. Estimate semivariance γ(h) from data sample

2. Plot the experimental variogram

3. Fit a theoretical model to the experimental variogram

4. Estimate unknown values as weighted sum of neighboring measurements; determine optimal weights from the variogram


Semivariance and Experimental Variogram

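semivariance depends only on distance (lag) h

estimate semivariance between all pairs of measurements with distance h (repeat for all possible h)

$$\hat{\gamma}(h) = \frac{1}{2n} \sum_{i=1}^{n} \{z(x_i) - z(x_i + h)\}^2$$

where n is the number of pairs of measurements separated by lag h

[Figure: Experimental Variogram]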
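A minimal sketch of the estimation step, assuming the usual estimator (half the mean squared difference over all pairs at a given lag). Since real measurements rarely lie at exactly equal distances, pairs are grouped into lag bins of width `lag_width`; the binning rule and all names are assumptions of this sketch.

```python
from collections import defaultdict
from math import hypot


def experimental_variogram(samples, lag_width):
    """samples: list of ((x, y), z). Returns {lag: estimated semivariance}."""
    squared_sum = defaultdict(float)
    count = defaultdict(int)
    for i, ((x1, y1), z1) in enumerate(samples):
        for (x2, y2), z2 in samples[i + 1:]:
            h = hypot(x1 - x2, y1 - y2)
            k = int(h / lag_width + 0.5)          # nearest lag bin
            squared_sum[k] += (z1 - z2) ** 2
            count[k] += 1
    # Half the mean squared difference per lag bin.
    return {k * lag_width: squared_sum[k] / (2 * count[k]) for k in sorted(count)}
```

For four measurements 0, 1, 2, 3 taken at unit spacing along a line, the estimate grows with the lag, as expected for a trend-like signal.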


Variogram

nugget:

- γ(0) = 0 (by definition)
- nugget effect represents small scale variation and measurement errors
- estimate of ε''

range:

- spatial dependency
- here, variance of differences increases with distance
- two points are more similar the closer they are

sill:

- semivariance levels off
- variance of differences is independent of distance

[Figure: experimental variogram, semivariance against lag h, with the nugget marked]


Variogram Models

experimental variogram must be fitted to an appropriate variogram model

most commonly used are the spherical, exponential, linear or Gaussian model

Spherical Model

Exponential Model

Linear Model

Gaussian Model
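The four models can be written as simple functions of the lag h. The parameterization below (nugget, partial sill, range parameter) is one common convention and is an assumption here; other texts scale the range parameter differently. The functions describe the model for h > 0; by definition γ(0) = 0.

```python
from math import exp


def spherical(h, nugget, partial_sill, rng):
    """Rises to nugget + partial_sill at h = rng and stays flat beyond."""
    if h >= rng:
        return nugget + partial_sill
    r = h / rng
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


def exponential(h, nugget, partial_sill, rng):
    """Approaches the sill asymptotically."""
    return nugget + partial_sill * (1.0 - exp(-h / rng))


def gaussian(h, nugget, partial_sill, rng):
    """Parabolic near the origin, then approaches the sill asymptotically."""
    return nugget + partial_sill * (1.0 - exp(-(h / rng) ** 2))


def linear(h, nugget, slope):
    """No sill: semivariance keeps increasing with distance."""
    return nugget + slope * h
```

Fitting then means choosing the model and parameters that best match the experimental variogram, for example by least squares; that step is not shown.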


Interpolation of unknown Values

unknown value at location x0 is estimated as weighted sum of neighboring measurements:

$Z^*(x_0) = \sum_{i=1}^{n} w_i Z(x_i)$

weights wi are determined according to two restrictions

- Z*(x0) is an unbiased estimate of Z(x0)
- Z*(x0) is an optimal estimate

Have to solve system of n+1 linear equations of semivariances and weights


Equation System

restriction on weights (unbiasedness requires that the weights sum to 1) introduces Lagrange parameter φ (Restriction 1)

system of (n+1) equations must be solved to obtain optimal weights for each x0

$$\begin{pmatrix} \gamma(x_1 - x_1) & \cdots & \gamma(x_1 - x_n) & 1 \\ \vdots & \ddots & \vdots & \vdots \\ \gamma(x_n - x_1) & \cdots & \gamma(x_n - x_n) & 1 \\ 1 & \cdots & 1 & 0 \end{pmatrix} \begin{pmatrix} w_1 \\ \vdots \\ w_n \\ \varphi \end{pmatrix} = \begin{pmatrix} \gamma(x_1 - x_0) \\ \vdots \\ \gamma(x_n - x_0) \\ 1 \end{pmatrix}$$

Ordinary Kriging is an exact interpolator, i.e. the interpolated value at a sample location will be identical with the measurement taken
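The equation system can be set up and solved in a few lines. The sketch below assumes a variogram function `gamma(h)` is already available (here a linear model without nugget, chosen only for illustration) and uses a small Gaussian elimination routine instead of a numerical library; sample sites and values are made up.

```python
from math import hypot


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(sites, values, x0, gamma):
    """Estimate Z*(x0) and return it together with the kriging weights."""
    n = len(sites)

    def dist(p, q):
        return hypot(p[0] - q[0], p[1] - q[1])

    # Semivariances between all sample sites, bordered by the sum-to-one constraint.
    a = [[gamma(dist(sites[i], sites[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(dist(s, x0)) for s in sites] + [1.0]
    solution = solve(a, b)           # last entry is the Lagrange parameter
    weights = solution[:n]
    return sum(w * z for w, z in zip(weights, values)), weights


# Hypothetical measurements and an assumed linear variogram without nugget.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
values = [1.0, 2.0, 3.0]
estimate, weights = ordinary_kriging(sites, values, (0.5, 0.5), lambda h: h)
```

Evaluating the estimator at one of the sample sites returns the measured value there, which is the exact-interpolator property stated above.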


Variants of Kriging

Universal Kriging

structural component may contain an external trend

Co-Kriging

interpolation for one attribute incorporates information of another, correlated attribute

sparse measurements of an expensive variable are supported by plentiful measurements of a cheap variable

Stratified Kriging

interpolation within sub-areas

equations are adjusted to avoid discontinuities on boundaries

More Details: Burrough, P., McDonnell, R 1998


Mining Points, Lines, and Areas


Points, Lines and Areas

[Figure: problem types arranged by space complexity and time complexity; this part moves from Points to Points, Lines, and Areas]

Points, Lines and Areas

Requirements:
• Point data
• Polygons
• Aggregations

Applications:
• Customer Segmentation
• Catchment Areas
• Location Planning
• Radio Network Analysis

Examples:
• GDBSCAN Clustering
• Spatial Subgroup Mining
• Spatial Association Rules
• Spatial Model Trees


Clustering of Vector Data: GDBSCAN [Sander et al 1998]

Extension of DBSCAN - Sample Instantiations

Neighborhood predicate: dist < ε; intersects/meets; neighbor

Minimum weight of a neighborhood S: |S| ≥ MinCard; ∑ areas ≥ MinArea; f(S) ≥ MinF


Spatial Subgroup Mining


Typical Data Mining representation

Data Mining for spatial data: very different from this representation

'spreadsheet data'

exactly 1 table

atomic values


Subgroup Discovery Search (Klösgen 1996, Wrobel 1997)

Subgroup discovery searches deviation patterns for subgroups

disproportionately high share of target value (or mean of target variable)

Top-down search from most general to most specific subgroups, exploiting partial ordering of subgroups

S1 ≥ S2: S1 is more general than S2

Beam search expands only the n best ones at each level

Evaluating hypotheses according to a quality function:

$$q = \frac{p(T|C) - p(T)}{\sqrt{p(T)(1 - p(T))}} \sqrt{n} \sqrt{\frac{N}{N - n}}$$

N = total population size
n = subgroup size
p(T) = target share in total population
p(T|C) = target share in subgroup

Extension to multi-relational representation in Wrobel (1997)
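A small sketch of how candidate subgroups could be scored and ranked. It assumes the quality function has the form q = (p(T|C) − p(T)) / sqrt(p(T)(1 − p(T))) · sqrt(n) · sqrt(N / (N − n)), with N, n, p(T) and p(T|C) as defined above; the candidate descriptions and their statistics are invented for illustration.

```python
from math import sqrt


def subgroup_quality(N, n, p_target, p_target_in_subgroup):
    """Deviation of the subgroup's target share, weighted by subgroup size."""
    deviation = (p_target_in_subgroup - p_target) / sqrt(p_target * (1 - p_target))
    return deviation * sqrt(n) * sqrt(N / (N - n))


# Hypothetical candidates: (description, subgroup size n, target share p(T|C)).
N, p_T = 1000, 0.2
candidates = [
    ("unemployment=high", 100, 0.40),
    ("unemployment=high AND crossed by main road", 20, 0.60),
    ("no_of_cars=low", 400, 0.25),
]

# Beam search would keep only the best-scoring descriptions at each level.
ranked = sorted(candidates,
                key=lambda c: subgroup_quality(N, c[1], p_T, c[2]),
                reverse=True)
```

The size factor matters: the small subgroup has the largest deviation in target share, yet the larger, less extreme subgroup ranks first.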


Translating Multirelational Subgroups to Object-relational SQL

Domain: relational database schema D = {R1, ..., Rn} having geometry attributes Gi

Hypothesis Language

Multirelational subgroups are represented by a concept set C = {Ci}, where each Ci consists of a set of attribute-value pairs {A1=v1,...,An=vn} from a relation in D,

a set of links L = {Li} linking concepts Ci, Ck via their attributes Am, An of the form (Ci/Am {=|inside|overlaps|...|spatially_interact} Ck/An)

target attribute can be non-numeric (A1=v1) or numeric aggregate (avg(A)=n)

Example:

C = {{district.long_term_illness=high, district.unemployment=high}, {street.name='Manchester Road'}}

L = {{district.geometry spatially_interact street.geometry}}

"Enumeration districts with high rate of long-term illness and unemployment crossed by Manchester Road"

Testing satisfaction of subgroup descriptions

The number of tuples in D that satisfy a subgroup description is evaluated using SQL select statements including joins over multiple relations.
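The Manchester Road example can be turned into such a counting query. The sketch below is only an illustration under loud assumptions: it uses SQLite instead of an object-relational spatial database, geometries are reduced to bounding boxes, and `bbox_interact` is a hypothetical user-defined function standing in for the spatial operator `spatially_interact`. Table and column names are invented for the example.

```python
import sqlite3


def bbox_interact(ax0, ay0, ax1, ay1, bx0, by0, bx1, by1):
    # Hypothetical stand-in for a spatial operator: true if two boxes overlap.
    return int(ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1)


conn = sqlite3.connect(":memory:")
conn.create_function("bbox_interact", 8, bbox_interact)
conn.executescript("""
CREATE TABLE district (id INTEGER, long_term_illness TEXT, unemployment TEXT,
                       x0 REAL, y0 REAL, x1 REAL, y1 REAL);
CREATE TABLE street (name TEXT, x0 REAL, y0 REAL, x1 REAL, y1 REAL);
INSERT INTO district VALUES
    (1, 'high', 'high', 0, 0, 1, 1),
    (2, 'high', 'low',  1, 0, 2, 1),
    (3, 'high', 'high', 5, 5, 6, 6);
INSERT INTO street VALUES
    ('Manchester Road', 0.5, 0.5, 1.5, 0.5),
    ('Other Street',    5,   5,   6,   6);
""")

# Concepts become WHERE conditions, the link becomes the join condition.
QUERY = """
SELECT COUNT(DISTINCT d.id)
FROM district d
JOIN street s
  ON bbox_interact(d.x0, d.y0, d.x1, d.y1, s.x0, s.y0, s.x1, s.y1)
WHERE d.long_term_illness = 'high'
  AND d.unemployment = 'high'
  AND s.name = 'Manchester Road'
"""
count = conn.execute(QUERY).fetchone()[0]
print(count)
```

Counting distinct district ids keeps a district from being counted twice when several street segments with the same name cross it.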


Approach: Translation of Spatial Subgroup Mining to SQL (Klösgen, May 2002)

• Representing subgroups in object-relational SQL, i.e. multi-relational representation

• Using representation for spatial geometry based on Spatial Database

• Division of work between RDBMS and Search Manager

• Combining visualization in abstract and physical space


Division of labour between RDBMS and Search Manager (May, Savinov 2003)

[Diagram: Database Server and Mining Server (search algorithm), connected by mining queries and returned statistics]

Mining Server:

• search in hypothesis space

• generation and evaluation of hypotheses (subgroup patterns)

• Database integration: efficiently organize mining queries

• A mining query delivers statistics (aggregations) sufficient for evaluating many hypotheses

SPIN! – Spatial Data Mining System

[Screenshot: SPIN! user interface with Workspace, Property Editor, Subgroup Viewer, Flowchart Tool and Subgroup Result List]

Interactive Exploratory Analysis

Combination of spatial and non-spatial visualization

User selects and manipulates variables

Powerful for analysis in low dimensions (3-4)

Scatter Plot

Parallel Coordinate Plot

Choropleth Maps

Display dynamically linked


Visualization of spatial subgroups

[Screenshot: linked display with Spatial Venn Diagram and Subgroup Overview, p(T|C) vs. p(C)]

Subgroup: high long-term illness in districts crossed by the M60

Radio Network Planning in Telecommunication

High ratio of cut-off calls in mountainous regions crossed by highways

having a certain technical configuration

Legend:

Blue: motorway; Brown: high altitude; Black: subgroup

Map viewer (CommonGIS)

Other commercial applications of Subgroup Discovery

How are my customers characterized? Are there interesting profiles?

Where to open the next supermarket? Does it create competition for my other supermarkets?

Should I invest in UMTS in rural areas?


Spatial Association Rules

work and slides by Donato Malerba et al., Univ. Bari


Spatial association rules

An association pattern P (s%) is a spatial association pattern if it contains at least one spatial relation

A large town intersects a road and is adjacent to water (62%)

An association rule Q → R (s%, c%) is a spatial association rule if Q ∧ R is a spatial association pattern

IF a large town intersects a road

THEN it is also adjacent to water (62%, 89%)

Malerba et al.; seminal work by Koperski & Han 1995
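Support s% and confidence c% can be illustrated with a small computation. This is a deliberately simplified, propositional sketch: each reference object (a town) is described by a set of already computed facts, whereas the approach presented here works on first-order patterns. The data and the names `support` and `confidence` are invented for the example.

```python
def support(objects, pattern):
    """Share of reference objects whose fact set contains the whole pattern."""
    return sum(pattern <= facts for facts in objects) / len(objects)


def confidence(objects, body, head):
    """Confidence of the rule body -> head: supp(body and head) / supp(body)."""
    return support(objects, body | head) / support(objects, body)


# Four towns, each described by the spatial facts that hold for it.
towns = [
    {"large_town", "intersects_road", "adjacent_to_water"},
    {"large_town", "intersects_road", "adjacent_to_water"},
    {"large_town", "intersects_road"},
    {"large_town"},
]
body = {"large_town", "intersects_road"}
head = {"adjacent_to_water"}

s = support(towns, body | head)
c = confidence(towns, body, head)
print(f"support {s:.0%}, confidence {c:.0%}")
```

A rule is reported only if both values reach the minimum thresholds chosen by the analyst.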


The problem

Given
• a spatial database (SDB) with a set of reference objects S,
• some sets Rk, 1 ≤ k ≤ m, of task-relevant objects,
• some spatial hierarchies Hk involving objects in Rk,
• M granularity levels in the descriptions,
• a set of granularity assignments ψk which associate each object in Hk with a granularity level,
• a pair of thresholds minsup[l] and minconf[l] for each granularity level,
• domain knowledge

Find strong multiple-level spatial association rules.

Malerba et al


The solution

Solution (Appice et al., IDA Journal, 2003)

based on an Inductive Logic Programming (ILP) approach ⇒ spatial relations are easily handled

spatial pattern = conjunction of first-order logic atoms

θ-subsumption orders the space of spatial patterns

monotonicity of support w.r.t. θ-subsumption ⇒ pruning of patterns at the same granularity level in the candidate generation phase

monotonicity of pattern frequency w.r.t. granularity level ⇒ pruning of patterns at different granularity levels in the candidate generation phase

Implemented in SPADA (Spatial Pattern Discovery Algorithm)

European project SPIN (Spatial Mining for Data of Public Interest)


Extensions of initial solutions

Efficiency improvement of pattern evaluation by caching support objects for each stored pattern

Definition of a declarative bias to filter out rules on the basis of users' preferences ⇒ efficiency improvement is a byproduct

- In real-world applications a large number of spatial patterns can be generated even for a few hundred spatial objects.

- Most of the discovered patterns are useless for the application at hand

- Urban accessibility application: only spatial patterns involving some sociological factor (households with no car) are interesting.

Integration of SPADA in the ARES system that interfaces a Spatial DB (Oracle Spatial)

Mining Network Data


Networks

[Diagram: axes Space complexity and Time complexity; data types shown: Points, Networks]

Points and Networks

• Requirements:
  • Point data
  • Polygons
  • Aggregations
  • Spatial dependencies and relations, networks

• Examples: Traffic frequency prediction

• Method: kNN

Case Study: Outdoor Advertising - Frequency Atlas

Customer:

Fachverband für Außenwerbung (FAW; German Outdoor Advertising Association)

Performance value assessment of advertising media

Traffic volume forecast

separate for private cars, public transport, pedestrians

Frequency + Media factories = poster reach

Gesellschaft für Konsumforschung

Determining reach of a poster board


The project in numbers

Complete model for all German cities with more than 50.000 inhabitants (192 cities) = ca. 1.000.000 street segments!

The complete model includes, for each segment:

- car frequency
- pedestrian frequency
- public transport frequency

The model is presently being extended to all cities with between 10.000 and 50.000 inhabitants

Basic Data: traffic measurements

Manual traffic measurement at selected poster locations

- 4 times 6 minutes on four days of the week at four times of day

Additional empirical model of day totals

Properties

- Well-defined measurements
- Extended measurement period, so concept drift cannot be excluded

Total of 96.000 manual measurements

[Diagram: inputs to DATA MINING are the street network, sociodemographics and socioeconomics, the public transport network, points of interest (POI), frequency measurements and secondary data; the output is frequency classes (0, 200, 400, 600, 800, 1000, 1250, 1500, 1750, 2000, ...)]

Local Measurements

Inhomogeneous measurements on the same street

How Spatial Autocorrelation helps

843820 1200


Attributes of street segments:

- Name, type, …, class
- Points of Interest
- Spatial coordinates

Locations with measurement values

Spatial kNN

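Distance between two segments x_a, x_b:

$$d(x_a, x_b) = \sqrt{\sum_m (x_{am} - x_{bm})^2}$$

Selection of the k closest segments x_1, …, x_k

Prediction for a new segment x_q:

$$y_q = \frac{\sum_{i=1}^{k} w_i \, y_i}{\sum_{i=1}^{k} w_i} \quad \text{with} \quad w_i = \frac{1}{d(x_q, x_i)}$$

(The project actually used a specially adapted distance measure)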
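A distance-weighted kNN prediction of this kind can be sketched as follows. The sketch assumes a plain Euclidean distance over numeric segment attributes and inverse-distance weights; the project itself used a specially adapted distance measure that is not reproduced here. The function names and the toy data are hypothetical.

```python
import math


def distance(a, b):
    # Assumed Euclidean distance over the attribute vectors of two segments.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train, xq, k):
    """Predict the frequency of segment xq from its k nearest measured segments.

    train: list of (attribute_vector, measured_frequency) pairs.
    """
    nearest = sorted(train, key=lambda s: distance(s[0], xq))[:k]
    for x, y in nearest:
        if distance(x, xq) == 0:
            return y                      # exact match: use its measurement
    weights = [1 / distance(x, xq) for x, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


# Toy data: (x, y) coordinates of measured segments with their frequencies.
measured = [((0, 0), 100), ((2, 0), 300), ((10, 10), 1000)]
prediction = knn_predict(measured, (1, 0), k=2)
print(prediction)
```

Because close measurements get the largest weights, the prediction follows the spatial autocorrelation in the measured values.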

Spatial kNN - Properties

kNN captures the autocorrelation inherent in the data well

Allows background knowledge to be brought in by fine-tuning the distance function

Database integrated (Oracle Spatial)

Performs dynamic spatial queries (minimum distances among polygons)

Performance improvements

Spatial queries use index structures (R-tree) but are still relatively costly (i.e. they dominate overall run-time)

Partial evaluation of the distance function, based on lower bounds for the distance, to minimize the number of spatial queries

Can handle data sets that do not fit into main memory

Smoothing based on flow constraints

Measurement errors lead to inconsistencies

Need plausible assignment of frequencies

Solution:

Use Kirchhoff’s law as constraint

- Sum of inputs = sum of outputs

Smoothing algorithm finds locally optimal solution using constraint relaxation
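The idea of enforcing "sum of inputs = sum of outputs" at each junction can be illustrated with a simple relaxation scheme. This is a stand-in sketch, not the smoothing algorithm used in the project: it repeatedly visits each junction and spreads its imbalance evenly over the incident street segments. The names are hypothetical, each segment is assumed to appear at most once per junction, and non-negativity of frequencies is not enforced.

```python
def smooth_flows(flows, junctions, sweeps=200):
    """Adjust measured flows until every junction is (nearly) balanced.

    flows: dict segment -> measured frequency
    junctions: list of (incoming_segments, outgoing_segments)
    """
    f = dict(flows)
    for _ in range(sweeps):
        for incoming, outgoing in junctions:
            imbalance = sum(f[e] for e in incoming) - sum(f[e] for e in outgoing)
            step = imbalance / (len(incoming) + len(outgoing))
            for e in incoming:
                f[e] -= step              # too much inflow: lower the inputs
            for e in outgoing:
                f[e] += step              # and raise the outputs
    return f


# Three consecutive segments of one street with inconsistent measurements.
measured = {"a": 100.0, "b": 130.0, "c": 110.0}
junctions = [(["a"], ["b"]), (["b"], ["c"])]
smoothed = smooth_flows(measured, junctions)
print({e: round(v, 2) for e, v in smoothed.items()})
```

Each junction step balances that junction exactly while changing the flows as little as possible, so repeated sweeps settle on a consistent assignment close to the measurements.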


Explaining frequencies

Problem: Customer wants transparent values, not a black box

=> Problem for Spatial kNN

Solution: Fit an explanatory model to the predicted values

Makes it possible to understand why predictions are as they are

Makes it possible to identify potential outliers and areas of high uncertainty

⇒ Use Model Trees

⇒ Geographic Space encoded in x-y coordinates

Numerical prediction with model trees

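[Figure: model tree with split nodes and linear models at the leaves]

Split attributes (German attribute names as in the data):

- Fussgängerzone (pedestrian zone): Nein | Ja (no | yes)
- Bahnhof (railway station): Nein | Ja (no | yes)
- Distanz_zu_Bahnhof (distance to railway station): <= 150 | > 150
- Anzahl_Restaurants (number of restaurants): <= 5 | > 5
- ORTSTEIL (district) = INNENSTADT (city centre) (LR) | ...
- Straßenkategorie (street category): Nebenstr. | Hauptstr. (side street | main street)
- Y-Koordinate (y coordinate): <= 9.6 | > 9.6
- X-Koordinate (x coordinate): <= 52.385 | > 52.385
- Anzahl_Restaurants (number of restaurants): <= 15 | > 15

Leaf models: LM1, LM2, LM4, LM5

Example leaf model LM1:

FREQUENZ = 2277.3186 * X + 75.4087 * ANZAHL_EINKAUF - 142.4217 * MESSE - 21221.8497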

Improving model by spotting outliers based on model tree prediction

Points with a large prediction error are checked

- Visual inspection
- Getting additional empirical input by taking new measurements

Corrected values are the basis for the next round of model building, leading to improved results

Final Result: Frequency Map

[Figure: frequency maps for cars, public transport and pedestrians]

~1 million street segments predicted based on 96.000 measurements

Final result: frequency atlas (cars, public transport, pedestrians)

Used for determining poster prices in Germany since 2006

Rare instance of a spatial data mining problem that has become business critical

Spatial Model Trees [Malerba, Appice, Ceci 2005]

Standard Model Trees (e.g. M5') can do Spatial Mining by splitting along x and y coordinates

Mrs-Smoti (Malerba et al. 2004) is a variant of Model Trees that

- allows regression nodes as interior nodes
- handles autocorrelation directly: spatial regression model with dependencies in response variables (spatially lagged response)

It takes as input spatial objects, possibly belonging to separate thematic layers, stored in a spatial database S

- target objects (main subject of analysis)
- non-target objects (relevant for the task in hand)

and outputs a spatial model tree T by

- partitioning training spatial data according to intra-layer and inter-layer relationships
- associating different regression models to disjoint spatial areas

Integrates spatial database queries (see Subgroup Discovery)

[Figure: example model tree with regression node Y = a + bX1 (node 0), split nodes X'3 ≤ α (node 1), X'2 ≤ β and X'4 ≤ γ, and leaf models Y' = c + dX'3, Y' = e + fX'2, Y' = g + hX'3, Y' = i + lX'4]

Mining Tracks in Space and Time


Tracks in Space and Time

[Diagram: axes Space complexity and Time complexity; data types shown: Points, Networks, Tracks in Space and Time]

Tracks in space and time

• Requirements:
  • Point data
  • Polygons
  • Aggregations
  • Networks
  • Tracks, GPS/RFID/sensor measurements

• Applications: Traffic prediction, Mobility analysis

• Examples: Sampling, Event analysis, non-linear optimization

Mobility analysis based on GPS-tracks

introduction of a new pricing model for poster sites based on GPS tracks

registration of contact frequencies with poster sites

contact extrapolation for target groups:

- socio-demographic characteristics
- residential areas

Media Trend Journal, Nov. 2006

Time patterns

Patterns / Questions

- How long (in days) does it take until x% of objects have visited all locations?

- How long does it take until x% of objects have visited at least one location twice?

Applications

- determine mobility of a group of people
- reach of poster networks
- find popularity of locations (theatres, supermarkets, hospitals)
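The first question can be answered directly from visit logs. The sketch below assumes that each tracked object comes with a list of (day, location) visits, with days counted from the start of the observation period; the function name `days_until_coverage` and the sample data are hypothetical.

```python
import math


def days_until_coverage(visits, locations, share):
    """First day on which at least `share` of all objects have visited every
    location in `locations`; None if that share is never reached.

    visits: dict object_id -> list of (day, location)
    """
    completion_days = []
    for log in visits.values():
        first_seen = {}
        for day, loc in log:
            if loc in locations:
                first_seen[loc] = min(day, first_seen.get(loc, day))
        if len(first_seen) == len(locations):
            # The object is done on the day it reaches its last new location.
            completion_days.append(max(first_seen.values()))
    needed = math.ceil(share * len(visits))
    completion_days.sort()
    if 0 < needed <= len(completion_days):
        return completion_days[needed - 1]
    return None


visits = {
    "u1": [(1, "A"), (2, "B")],
    "u2": [(1, "A"), (5, "B")],
    "u3": [(3, "A")],
}
print(days_until_coverage(visits, {"A", "B"}, share=0.5))
```

With three objects, a share of 50% requires two of them, and the second one completes its visits on day 5.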


Modelling tasks

Modelling mobility for cities with GPS-measurements for the overall population

Predicting mobility for cities without measurements (hard task!)

Extrapolating predictions in time


GeoPKDD - FET Project IST-014915

Geographic Privacy-aware Knowledge Discovery and Delivery

December 2005 – November 2008

Project Leader: Fosca Giannotti

http://www.geopkdd.eu

General Project Idea

extracting user-consumable forms of knowledge from large amounts of raw geographic data referenced in space and in time.

knowledge discovery and analysis methods for trajectories of moving objects, which change their position in time, and possibly also their shape or other significant features

devising privacy-preserving methods for data mining from sources that typically contain personal sensitive data


The Consortium

ID | Acronym | Partner | Country
1 | KDDLAB | Knowledge Discovery and Delivery Laboratory, ISTI-CNR, Istituto di Scienza e Tecnologie dell'Informazione, Pisa. http://www.isti.cnr.it/ - jointly with Univ. Pisa, Dept. of Computer Science http://www.di.unipi.it | I
2 | LUC | Univ. Limburg, Theoretical Computer Science Group. http://www.luc.ac.be/theocomp | B
3 | EPFL | EPFL, Lab. DB, Lausanne. http://lbdwww.epfl.ch/e/ | CH
4 | FAIS | Fraunhofer Institute for Autonomous Intelligent Systems, Sankt Augustin. http://www.ais.fraunhofer.de/ | D
5 | WUR | Wageningen UR, Centre for GeoInformation. http://cgi.girs.wageningen-ur.nl/ | NL
6 | CTI | Research Academic Computer Technology Institute, Research and Development Division. http://www.cti.gr/ - jointly with Univ. Piraeus, Dept. of Informatics http://www.unipi.gr | GR
7 | UNISAB | Sabanci University, Faculty of Engineering and Natural Sciences. http://www.sabanciuniv.edu/ | TR
8 | WIND | WIND Telecomunicazioni SpA, Direzione Reti Wind Progetti Finanziati & Technology Scouting. | I

Geographic Privacy-aware Knowledge Discovery Process

[Diagram: the geographic privacy-aware knowledge discovery process. Components shown: telecommunication company (WIND), trajectory reconstruction, trajectories warehouse, privacy-aware data mining, ST patterns (e.g. p(x)=0.02), interpretation and visualization, GeoKnowledge, public administration or business companies, privacy enforcement.

Application areas shown: traffic management, accessibility of services, mobility evolution, urban planning, bandwidth/power optimization, mobile cells planning, aggregative location-based services.]


GeoPKDD – Specific Goals

models for moving objects, and data warehouse methods to store their trajectories

knowledge discovery and analysis methods for moving objects and trajectories,

techniques to make such methods privacy-preserving

techniques for reasoning on spatio-temporal knowledge and on background knowledge

techniques for delivering the extracted knowledge within the geographic framework


From Traces to Trajectories: the Source Data

GSM network

Entering the cell

- e.g. (UserID, time, IDcell, in)

Exiting the cell

- e.g. (UserID, time, IDcell, out)

Movements inside the cell?

- e.g. (UserID, time, X, Y, IDcell)

streams of log data of mobile phones, e.g. cells in the GSM/UMTS network

Real trajectories are continuous functions

Logs are discrete samplings of real trajectories, dependent on the wireless network technology

- irregular granularity in time and space
- possible imperfection/imprecision

An approximated reconstruction of the real trajectory from its log traces is needed

Source: Pedreschi & Giannotti, 2005
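The simplest approximate reconstruction is linear interpolation between consecutive log entries. The sketch below assumes log entries of the form (time, X, Y) for a single user, sorted by time; it is an illustration only, since a real reconstruction would also have to deal with imprecise positions and with the street network. The function name `position_at` is hypothetical.

```python
from bisect import bisect_right


def position_at(samples, t):
    """Estimate the position at time t by linear interpolation.

    samples: list of (time, x, y) tuples sorted by time.
    Returns None outside the observed time interval.
    """
    times = [s[0] for s in samples]
    if not samples or t < times[0] or t > times[-1]:
        return None
    i = bisect_right(times, t)
    if i == len(samples):
        return samples[-1][1:]            # t equals the last time stamp
    (t0, x0, y0), (t1, x1, y1) = samples[i - 1], samples[i]
    a = (t - t0) / (t1 - t0)              # fraction of the way to sample i
    return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))


# Irregularly sampled log of one user.
log = [(0, 0.0, 0.0), (10, 10.0, 0.0), (20, 10.0, 20.0)]
print(position_at(log, 5), position_at(log, 15))
```

The reconstructed trajectory is a continuous function of time again, although only an approximation of the real movement between two log entries.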


Movement patterns

Clustering
- Group together similar trajectories
- For each group produce a summary

Frequent patterns
- Discover frequently followed (sub)paths

Classification
- Extract behaviour rules from history
- Use them to predict behaviour of future users

Source: Pedreschi & Giannotti, 2005

Why emphasis on privacy?

The more and better data are gathered, the more vulnerability arises from correlation

On the other hand, more and new data bring new opportunities

Need to maintain privacy without giving up opportunities

Need to obtain social acceptance through demonstrably trustworthy solutions

Privacy in GeoPKDD ... is a technical issue, besides an ethical, social and legal one, in the specific context of spatio-temporal (ST) data

How to formalize privacy constraints over ST data and ST patterns?

- E.g., anonymity threshold on clusters of individual trajectories

How to design DM algorithms that, by construction, only yield patterns that meet the privacy constraints?
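An anonymity threshold on trajectory clusters can be stated very compactly. The sketch below is only an illustration of the constraint, not a GeoPKDD algorithm: a cluster may be released only if it covers at least k distinct users. The data layout and the name `releasable_clusters` are assumptions made for the example.

```python
def releasable_clusters(clusters, k):
    """Keep only clusters that contain trajectories of at least k distinct users.

    clusters: dict cluster_id -> list of (user_id, trajectory_id)
    """
    return {
        cid: members
        for cid, members in clusters.items()
        if len({user for user, _ in members}) >= k
    }


clusters = {
    "commuters": [("u1", "t1"), ("u2", "t2"), ("u3", "t3")],
    "night_route": [("u4", "t4"), ("u4", "t5")],   # two trajectories, one user
}
released = releasable_clusters(clusters, k=2)
print(sorted(released))
```

Counting distinct users rather than trajectories matters here: a cluster built from many trajectories of the same person would still identify that person.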


Challenges


Causal Inference from Statistical Spatio-Temporal Data

Current project at IAIS for newspaper publisher:

Sales prediction of individual shops.

What happens if a shop closes or is sold out? Predict to which alternative shop customers go.

Spatio-Temporal Clustering of shops

Time Series Prediction

Modeling customer behavior

⇒ Causal inference about customer behavior

"If shop A closes, n% of A's customers go to B, m% to C"

Sales data per day per shop for several years available

Use similarity of time series over some period for determining anomaly in behavior


[Figure: map with legend: closed shop, other shops, alternative shops (strong to weak)]

Use spatial structure to infer potential alternative shops.

People went from A to B when A is closed and B shows an anomaly in behavior that cannot be explained otherwise

[Figure: map with legend: closed shop, other shops, alternative shops (strong to weak)]

Diagrams such as this one can be generated automatically for historic

Challenge: based on historic examples, come up with a predictive model

Ubiquitous Knowledge Discovery

Ubiquitous Knowledge Discovery (embedded data mining; mobile and/or distributed micro processors)

Grid Mining (distributed architecture, Grid Computing)

Knowledge Discovery in mobile systems (robots, RFID, GPS, mobile phones, cars, ...)

Static and dynamic sensor networks (Reality Mining)

Privacy-Preserving Data Mining

KDUbiq Coordination Action (EU, 2005-2008) – www.kdubiq.org

Ubiquitous Knowledge Discovery

Characteristics of ubiquitous knowledge discovery systems

objects are distributed in time and space

dynamic infrastructure (objects move, appear and disappear)

analysis takes place in real time, models evolve incrementally

objects have access to local information only and never see the global picture: only knowledge of the local spatial environment

typically, objects exchange information with other objects

Spatial Data Mining is a key issue here!

KDUbiq reflects the future research challenges involved in this area


Summary

Spatial Data form a rich environment for analysis

Feature extraction and construction (Spatial Queries & Functions, Voronoi,…) play a very important role

Efficiency is often a big concern

A variety of approaches to Spatial Data Mining exist, coming from Statistics, Databases, Machine Learning

We have seen examples for density based clustering, kriging, subgroup discovery, association rules, model trees, kNN, Survival Analysis

Methods are different in the data types they can handle

Real-world applications are feasible today

Many more challenges in the future due to ubiquitous environments!

Literature (1)

Andrienko, N., Andrienko, G.: Exploratory Analysis of Spatial and Temporal Data - A Systematic Approach. Springer, 2005.

Appice, A., Ceci, M., Lanza, A., Lisi, F.A., Malerba, D.: Discovery of Spatial Association Rules in Georeferenced Census Data: A Relational Mining Approach. Intelligent Data Analysis 7(6), 2003.

Burrough, P., McDonnell, R.: Principles of Geographical Information Systems. OUP, 1998.

Cressie, N.: Statistics for Spatial Data. Wiley, 1993.

Egenhofer, M.: Reasoning about binary topological relations. In: Gunther, O., Schek, H.-J. (eds.), Second Symposium on Large Spatial Databases, LNCS 525, pages 143-160. Springer, 1991.

Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, Portland, OR, 226-231, 1996.

Giannotti, F., Nanni, M., Pedreschi, D.: Efficient Mining of Temporally Annotated Sequences. SDM 2006.

Goodchild, M.F.: Spatial Autocorrelation. CATMOG 47, Geobooks, Norwich UK, 1986.

Han, J., Stefanovic, N., Koperski, K.: Selective Materialization: An Efficient Method for Spatial Data Cube Construction. PAKDD, 1998.

Klösgen, W.: Explora: A multipattern and multistrategy discovery assistant. In: Fayyad, Advances in Knowledge Discovery and Data Mining. MIT Press, 1996.

Klösgen, W., May, M.: Spatial Subgroup Mining Integrated in an Object-Relational Spatial Database. PKDD 2002: 275-28

Klösgen, W., May, M., Petch, J.: Mining census data for spatial effects on mortality. Intelligent Data Analysis 7(6): 521-540, 2003.

Literature (2)

Koperski, K., Han, J.: Discovery of Spatial Association Rules in Geographic Information Databases. Proc. 4th Int. Symp. Advances in Spatial Databases (SSD), 1995.

Koperski, K., Adhikary, J., Han, J.: Spatial Data Mining: Progress and Challenges. SIGMOD'96 Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD'96), Montreal, Canada, June 1996.

Lawson, A.B., Denison, D. (eds.): Spatial Cluster Modelling. Chapman & Hall CRC, London, 2002.

Lisi, F.A., Malerba, D.: Inducing Multi-Level Association Rules from Multiple Relations. Machine Learning 55: 175-210, 2004.

Longley, P., Goodchild, M., Maguire, D., Rhind, D.: Geographic Information Systems and Science. Wiley, 2001.

Malerba, D., Appice, A., Ceci, M.: Mining Model Trees from Spatial Data. PKDD 2005, LNCS, 2005.

May, M., Ragia, L.: Spatial Subgroup Discovery Applied to the Analysis of Vegetation Data. PAKM 2002, LNCS 2569, 2002.

May, M., Savinov, A.: SPIN! - An Enterprise Architecture for Spatial Data Mining. Knowledge-Based Intelligent Information and Engineering Systems, LNCS 2773, 2003.

Openshaw, S., Craft, A.: Using geographical analysis machines to search for evidence of cluster and clustering in childhood leukaemia and non-Hodgkin lymphomas in Britain. In: Draper, G. (ed.), The Geographical Epidemiology of Childhood Leukaemia and non-Hodgkin Lymphomas in Great Britain 1966-83, Studies in Medical and Population Subjects No 53, OPCS, London, HMSO, 1991.

Ripley, B.: Statistical Inference for Spatial Processes. CUP, 1988.

Sander, J., Ester, M., Kriegel, H.-P., Xu, X.: Density-based clustering in spatial databases: The algorithm GDBSCAN and its applications. Data Mining and Knowledge Discovery 2(2): 169-194, 1998.

Wrobel, S.: An Algorithm for Multi-relational Discovery of Subgroups. PKDD 1997: 78-87.

Fraunhofer IAIS – Knowledge Discovery

Dr. Michael May

Contact:

Michael May
Schloss Birlinghoven
53754 Sankt Augustin

Tel: 02241 / 14 2731 / 2039
eMail: michael.may@iais.fraunhofer.de

Thanks!
