Bio-inspired computational techniques applied to the analysis and visualization of
spatio-temporal cluster dynamics
Miguel Arturo Barreto Sánz
Faculté des Hautes Etudes Commerciales (HEC)
Institut des Systèmes d'information (ISI)
Outline
● Introduction
  Data mining in spatio-temporal datasets
● Research plan
  Specific goals
  Challenges in mining spatio-temporal datasets
  State of the art
  Approaches
● Preliminary results and discussion
Introduction
● An increasing number of complex data sets are associated with geographical areas
● Huge volumes of data describing various human or natural behaviors are routinely captured
Information sources
For instance, information received from remote sensing systems and environmental monitoring devices used in:
● Agriculture
● Weather prediction
● Cartography
These data sets are critical for decision support, but their value depends on the ability to extract useful information for studying and understanding the phenomena governing the data source.
Data mining in spatio-temporal datasets
Currently
● Data mining of geospatial data takes only a static view of geospatial phenomena.
However
● Geographic phenomena evolve over time.
● Mining spatio-temporal data addresses the temporal dynamics of geospatial data, which is crucial to our understanding of geographic processes and events.
Goal
● Describe the manner in which spatial patterns change through time
Some fields and applications include:
● Agro-ecology
● Environmental change
● Species distribution
● Disease propagation
● Urban dynamics
● Migration patterns
Manage and understand changing spatial patterns of yields
● Which variables make some regions produce more than others?
● Why do some regions maintain their production over time?
The Normalized Difference Vegetation Index (NDVI) gives a measure of the vegetative cover on the land surface over wide areas.
● Which variables are related to the changes in the vegetative cover?
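As a minimal sketch of how NDVI is obtained, the standard definition is NDVI = (NIR - Red) / (NIR + Red), computed per pixel from near-infrared and red reflectance. The function names and the reflectance values below are hypothetical and serve only as an illustration; returning 0.0 for an empty pixel is a convention assumed here.

```python
def ndvi(nir, red):
    """NDVI for one pixel from near-infrared and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # assumed convention for pixels with no signal
    return (nir - red) / total


def ndvi_grid(nir_band, red_band):
    """Apply ndvi() cell by cell to two equally shaped grids (lists of rows)."""
    return [[ndvi(n, r) for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir_band, red_band)]


def ndvi_change(earlier, later):
    """Cell-wise NDVI difference between two dates (later minus earlier)."""
    return [[b - a for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(earlier, later)]


# Hypothetical reflectance values for two summers, for illustration only.
summer_a = ndvi_grid([[0.50, 0.40]], [[0.10, 0.20]])
summer_b = ndvi_grid([[0.30, 0.40]], [[0.30, 0.20]])
print(ndvi_change(summer_a, summer_b))
```

A negative entry in the change grid marks a cell whose vegetative cover decreased between the two dates, which is the kind of change the question above asks about.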
Environmental Change (Satellite images)
[Figure: satellite image panels, one per summer: 1989, 1990, 1991, 1992, 1993, 1994, 1996, 1997, 1998, 1999, 2000, and 2001]
It is very important to conduct research on data mining of spatio-temporal datasets in order to:
● Develop methodologies
● Assist knowledge extraction from spatio-temporal datasets
● Improve decision-making processes
New methodologies
New methodologies for mining spatio-temporal datasets
Visualization of spatio-temporal cluster dynamics
To provide insights about the nature of cluster change
To deal with the inherent characteristics of spatio-temporal datasets:
● Multivariate and temporal mapping
● Visualization of very large datasets
● Changing spatial patterns
For instance …
Similarity of sugarcane growing environmental conditions (1999-2001) using Self-organizing maps
● Which variable or variables make two clusters merge into one?
● Are there sites that change from one cluster to another year after year? Why does that happen?
● Is it possible to find recurrent patterns in the dynamics of the clusters?
Specific Goals
Development of bio-inspired methodologies for the detection and tracking of changes in spatio-temporal clusters.
● Agro-ecological datasets will be used as a case study.
● This approach involves finding clusters of sites with similar characteristics in time and space.
Development of bio-inspired methodologies for the visualization of spatio-temporal cluster dynamics.
Research plan
Clusters of sites with similar characteristics in time and space
Which crops or varieties are likely to perform well, where, and when?
Homologous places for Colombian coffee production: Brazil, Ecuador, East Africa, and New Guinea.
Soil
Climate
Genotype
Harvests of the same crop at different times
The COCH project
For commercial (mass-production) crops such as rice and corn, the “when” and “where” are known.
For native crops (guanabana, lulo) or special types of crops (coffee varieties), this is not the case.
DAPA (Diversification Agriculture Project Alliance)
When and what must I cultivate?
Market demand
Challenges in mining spatio-temporal datasets
The special nature of spatio-temporal data poses several challenges to the knowledge extraction process.
For instance:
● Heterogeneity in sources of information and in scales of time and space
● Spatial autocorrelation
● Boundaries in geospatial data
● Temporal relationships between spatial objects
● Visualization of spatio-temporal cluster dynamics
● Geographic space and feature space
Heterogeneity in sources of information
Conventional methods are not effective for handling mixtures of data types and sources.
Heterogeneity in scales of time and space
Methodologies are needed to evaluate clusters at different scales in order to find “interesting” patterns between levels.
The goal is to improve the analysis of cluster structure at different scales by creating cluster representations that facilitate the selection of clusters at each scale.
Spatial autocorrelation
Spatial autocorrelation can be defined as the degree of relationship that exists between two or more spatial-data variables.
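The slides do not name a specific measure of spatial autocorrelation. One widely used statistic is Moran's I, which compares the values of a single variable at neighbouring sites. The sketch below assumes a hypothetical row of four sites and a binary adjacency matrix; the function name and the data are illustrative only.

```python
def morans_i(values, weights):
    """Moran's I for one variable observed at n sites.

    values  -- list of n observations
    weights -- n x n spatial weight matrix (weights[i][j] > 0 for neighbours)
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    den = sum(d * d for d in dev)
    if w_total == 0 or den == 0:
        raise ValueError("Moran's I is undefined for these inputs")
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    return (n / w_total) * (num / den)


# Hypothetical example: four sites in a row, each adjacent to its neighbours.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth_gradient = [1.0, 2.0, 3.0, 4.0]   # similar values sit next to each other
alternating = [1.0, 4.0, 1.0, 4.0]       # neighbours are dissimilar
print(morans_i(smooth_gradient, adjacency), morans_i(alternating, adjacency))
```

Positive values indicate that neighbouring sites tend to be similar, and negative values indicate that neighbours tend to differ.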
Boundaries in geospatial data
Algorithms for knowledge discovery in spatio-temporal databases have to consider the neighbors of the geo-referenced data.
For instance, part of the complexity of the problem lies in the fact that the boundaries of these neighbors are not hard, but rather soft boundaries.
Similarity of sugarcane growing environmental conditions (1999-2001) using Self-organizing maps
Temporal relationships between spatial objects
The relationship between spatial objects can change over time.
These dynamic relationships can be observed, for instance, in how clusters change over time.
Geographic space and feature space
Geographic space Feature space
Geographic space is concerned with surface features, such as the terrain we walk on.
Feature space visualization is concerned with the representation of similarities associated with geo-referenced sites in the geographic space.
Visualization of spatio-temporal cluster dynamics
● Visualization of the overall structure of the dataset.
● Exploration of correlations and relationships.
● Visualization of temporal patterns.
[Figure: each 1 km × 1 km grid cell is represented by 1 point; 1,336,025 points just for Colombia]
State of the art
Myra Spiliopoulou et al. Monic: modeling and monitoring cluster transitions. In KDD ’06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining.
Daniel B. Neill et al. Detection of emerging space-time clusters. In KDD ’05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining.
Geoffrey M. Jacquez. Spatial Cluster Analysis (The Handbook of Geographic Information Science). John Wilson (University of Southern California), 2008.
● Small databases
● No agro-ecologic or environmental databases
● Recorded in controlled conditions
● Based on statistical models
Approaches
● Unsupervised learning → Heterogeneous data
Used to analyze data when there is only a low level of knowledge about the dataset
● Hierarchical methods → Heterogeneity in scales of time and space
● Data abstraction methods → Heterogeneity in scales of time and space
[Figure: groups of examples, each summarized by a prototype]
● Self-Organizing Map (SOM) → Spatial autocorrelation
A Self-Organizing Map (SOM) applies a learning strategy used in neural structures like the cortex, and presents several advantages that we will exploit in our research in order to gain insights about the spatial autocorrelation present in the geographic zones.
[Figure: the neighbourhood function $h_{ck}(t)$ of a SOM, centred over the best-matching neuron $m_c$.]
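A minimal sketch of one SOM training step may help to make the role of the neighbourhood function concrete. It assumes the commonly used Gaussian form h_ck(t) = alpha(t) * exp(-||r_c - r_k||^2 / (2 sigma(t)^2)), where r_c and r_k are lattice positions; the slides do not state which form is used, and all names and values below are illustrative.

```python
import math


def best_matching_unit(codebook, x):
    """Index c of the neuron whose weight vector m_c is closest to x."""
    return min(range(len(codebook)),
               key=lambda k: sum((m - v) ** 2 for m, v in zip(codebook[k], x)))


def neighbourhood(pos_c, pos_k, alpha, sigma):
    """Assumed Gaussian neighbourhood h_ck centred on the lattice position of c."""
    d2 = sum((a - b) ** 2 for a, b in zip(pos_c, pos_k))
    return alpha * math.exp(-d2 / (2.0 * sigma ** 2))


def som_step(codebook, positions, x, alpha, sigma):
    """Move every neuron towards x, weighted by its lattice distance to the BMU."""
    c = best_matching_unit(codebook, x)
    for k, m_k in enumerate(codebook):
        h = neighbourhood(positions[c], positions[k], alpha, sigma)
        codebook[k] = [m + h * (v - m) for m, v in zip(m_k, x)]
    return c


# Hypothetical two-neuron map and a single input vector.
codebook = [[0.0, 0.0], [1.0, 1.0]]
positions = [(0, 0), (0, 1)]
winner = som_step(codebook, positions, [0.9, 0.9], alpha=0.5, sigma=1.0)
print(winner, codebook)
```

Because lattice neighbours of the winner are also pulled towards the input, nearby cells end up representing similar conditions, which is what makes the map useful for studying similarity between zones.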
Similarity of sugarcane growing environmental conditions (1999-2005) using Self-organizing maps
● Self-Organizing Map (SOM) → Geographic space and feature space
The clusters found in the feature space in many cases are not the same as those found in geographic space.
Represent clusters of a multidimensional space: map multidimensional data onto a two-dimensional lattice of cells.
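The mapping from multidimensional data onto lattice cells can be sketched as follows, assuming an already trained codebook. The lattice, the site names, and the feature values are hypothetical; the point is only that sites are grouped by similarity in feature space, regardless of where they lie on the ground.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_cell(codebook, features):
    """Lattice cell (row, col) whose weight vector is closest to the features."""
    return min(codebook, key=lambda cell: squared_distance(codebook[cell], features))


def group_sites_by_cell(codebook, sites):
    """Group site names by the lattice cell they are mapped to."""
    groups = {}
    for name, features in sites.items():
        groups.setdefault(best_cell(codebook, features), []).append(name)
    return groups


# Hypothetical 1 x 2 lattice and rescaled site features, for illustration only.
codebook = {(0, 0): [0.2, 0.8], (0, 1): [0.9, 0.1]}
sites = {
    "site_north": [0.25, 0.75],
    "site_south": [0.15, 0.85],   # geographically distant, similar conditions
    "site_centre": [0.85, 0.20],
}
groups = group_sites_by_cell(codebook, sites)
print(groups)
```

In this toy case the northern and southern sites share a cell even though they would be far apart on a geographic map.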
● Self-Organizing Map (SOM) → Visualization of spatio-temporal cluster dynamics
Visualization of the overall structure of the dataset: its clustering, patterns (similarities), and irregularities.
Exploration of correlations and relationships. This is primarily based on component plane displays in multiple views.
Visualization of temporal patterns. Examples are ordered component displays and trajectories.
Partial Correlation
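As a rough illustration of how relationships can be explored through component planes, the sketch below extracts one plane per variable from a trained codebook and compares two planes with the plain Pearson correlation. This is an assumption made for brevity: it is not the partial correlation named in the figure label, and the codebook values are hypothetical.

```python
import math


def component_plane(codebook, j):
    """Values of variable j across all SOM cells, in a fixed cell order."""
    return [codebook[cell][j] for cell in sorted(codebook)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical 2 x 2 codebook with three variables per cell.
codebook = {
    (0, 0): [0.1, 0.9, 0.3],
    (0, 1): [0.4, 0.6, 0.2],
    (1, 0): [0.6, 0.4, 0.9],
    (1, 1): [0.9, 0.1, 0.5],
}
plane_0 = component_plane(codebook, 0)
plane_1 = component_plane(codebook, 1)
print(pearson(plane_0, plane_1))
```

Planes with a coefficient near 1 or -1 vary together across the map, which flags pairs of variables worth inspecting side by side.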
● Fuzzy logic → Boundaries in geospatial data
In many applications crisp partitions are not the optimal representation of clusters.
Fuzzy logic, which represents degrees of membership, is a feature that could be added to the model.
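To illustrate what degrees of membership look like, the sketch below uses the membership formula of fuzzy c-means, u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)), where d_i is the distance from a site to cluster centre i and m > 1 is the fuzzifier. Using fuzzy c-means here is an assumption for illustration; the slides only state that fuzzy logic could be added to the model. The centres and the site are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_memberships(site, centres, m=2.0):
    """Degree of membership of one site in each cluster (values sum to 1)."""
    dists = [distance(site, c) for c in centres]
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # The site coincides with one or more centres: share full membership.
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists]


# Hypothetical cluster centres and a site lying between them.
centres = [[0.0, 0.0], [2.0, 0.0]]
print(fuzzy_memberships([0.5, 0.0], centres))
```

A site halfway between two centres receives equal membership in both, so a soft boundary is represented directly instead of being forced into one cluster.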
● Growing hierarchical Self-Organizing Structures → Non-stationary relationships between spatial objects
Dealing with non-stationary relationships implies finding relationships that vary through time and space.
This challenge involves the creation of methodologies capable of adapting their models in order to reveal the dynamics of the clusters and represent their characteristics in the most accurate manner.
Growing hierarchical Self-Organizing Structures could be used as a base for hybrid models in order to detect, reveal, and analyze spatio-temporal cluster dynamics.
I propose ...
An unsupervised model based on self-organization which allows data abstraction, hierarchical organization of the clusters, and automatic detection of interesting changes in the dynamics of spatio-temporal clusters.
The model must have the following characteristics:
● It adapts its structure.
● Changes in its structure reveal cluster dynamics such as merging, emergence, mutation, and parallel dynamics.
● The hierarchical structure will make it possible to tackle the scale-effect problem (navigation of the clustering structure at different levels).
● The model will work with fuzzy memberships to avoid the problem of boundaries in geospatial data.
● The unsupervised methodology will help to find relationships that can be hidden in very large and heterogeneous datasets (heterogeneity in sources of information).
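To make cluster dynamics such as merging and emergence concrete, the sketch below compares the clusters of two consecutive periods by the share of sites they have in common. It is a much-simplified overlap heuristic written for illustration, not the proposed self-organizing model and not the method of any cited paper; the threshold, function names, and site identifiers are all assumptions.

```python
def overlap(old_sites, new_sites):
    """Share of the old cluster's sites that are found in the new cluster."""
    return len(old_sites & new_sites) / len(old_sites)


def detect_transitions(old_clusters, new_clusters, threshold=0.5):
    """Return (merged, emerged) between two clusterings of the same sites.

    merged  -- new cluster name -> list of old clusters it absorbed (2 or more)
    emerged -- new clusters that absorbed no old cluster
    """
    absorbed_by = {}
    for old_name, old_sites in old_clusters.items():
        for new_name, new_sites in new_clusters.items():
            if overlap(old_sites, new_sites) >= threshold:
                absorbed_by.setdefault(new_name, []).append(old_name)
    merged = {name: olds for name, olds in absorbed_by.items() if len(olds) > 1}
    emerged = [name for name in new_clusters if name not in absorbed_by]
    return merged, emerged


# Hypothetical clusters of site identifiers in two consecutive years.
year_1 = {"A": {1, 2, 3}, "B": {4, 5}}
year_2 = {"X": {1, 2, 3, 4, 5}, "Y": {6, 7}}
merged, emerged = detect_transitions(year_1, year_2)
print(merged, emerged)
```

In this toy example clusters A and B merge into X, while Y emerges from sites that belonged to no earlier cluster.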
Preliminary results and discussion
[1] Miguel Barreto-Sanz and Andrés Pérez-Uribe. Classification of similar productivity zones in the sugar cane culture using clustering of SOM component planes based on the SOM distance matrix. In The 6th International Workshop on Self-Organizing Maps (WSOM), 2007.
[2] Miguel Barreto-Sanz and Andrés Pérez-Uribe. Improving the correlation hunting in a large quantity of SOM component planes. In ICANN 2007: Proceedings of the 17th international conference on Artificial Neural Networks.
[3] Miguel Barreto-Sanz and Andrés Pérez-Uribe. Tree-structured self-organizing map component planes as a visualization tool for data exploration in agro-ecological modeling. In Proc. of the 6th European Conf. on Ecological Modelling, Trieste, Italy, 2007.
[4] Miguel Barreto-Sanz, Andrés Pérez-Uribe, Carlos-Andres Peña-Reyes, and Marco Tomassini. Fuzzy growing hierarchical self organizing networks. In ICANN 2008: Proceedings of the 18th international conference on Artificial Neural Networks.
[5] Miguel Barreto-Sanz, Andrés Pérez-Uribe, Carlos-Andres Peña-Reyes, and Marco Tomassini. Tuning Parameters in the Fuzzy Growing Hierarchical Self-Organizing Networks. To appear in: Studies in Computational Intelligence, CONSTRUCTIVE NEURAL NETWORKS Springer, 2009.
Thanks for new ideas and directions to explore!