

Change Analysis in Spatial Datasets by Interestingness Comparison

Vadeerat Rinsurongkawong and Christoph F. Eick

Department of Computer Science, University of Houston, Houston TX 77204-3010

INTRODUCTION

Detecting changes in spatial datasets is important for many fields, such as early warning systems that monitor environmental conditions, epidemiology, crime monitoring, and automatic surveillance. The goal of the presented research is the development of methodologies for change analysis in spatial datasets. A change analysis framework is presented that analyses how the interesting regions in one time frame differ from the interesting regions in the next time frame with respect to a user-defined interestingness perspective. An advantage of the proposed framework is that it can detect various types of changes in data with continuous attributes and unknown object identity.

CHANGE ANALYSIS FRAMEWORK

We assume that two datasets from two different time frames, O_old and O_new, are given, and we are interested in finding what patterns emerged in O_new. The proposed framework conducts change analysis as follows:

1) We run a region discovery algorithm to find interesting regions in O_old and O_new; these regions are contiguous areas in the spatial space. The algorithm is a clustering algorithm that employs a reward-based fitness function that captures a domain expert’s notion of interestingness.

2) The relationships between interesting regions are analyzed. Two approaches to analyze correspondence between interesting regions are proposed.

3) A knowledge base of change predicates is provided that allows analyzing various aspects of change.

4) The change predicates are matched against the obtained clusters and change reports are generated.
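To make step 1 more concrete, the following is a minimal sketch of what a reward-based fitness function could look like. The poster does not give the formula, so everything here is an assumption made for illustration: the variance-ratio interestingness measure (loosely inspired by the earthquake-depth demonstration), the `threshold` and `beta` parameters, and the function names are all hypothetical.

```python
from statistics import pvariance


def interestingness(depths, overall_variance, threshold=1.0):
    """Hypothetical interestingness: how far the depth variance inside a
    cluster exceeds `threshold` times the variance of the whole dataset."""
    if len(depths) < 2 or overall_variance == 0:
        return 0.0
    ratio = pvariance(depths) / overall_variance
    return max(0.0, ratio - threshold)


def fitness(clusters, all_depths, beta=1.1, threshold=1.0):
    """Assumed reward scheme: each cluster earns its interestingness
    multiplied by size**beta, and the fitness is the sum of the rewards."""
    overall = pvariance(all_depths)
    return sum(
        interestingness(c, overall, threshold) * len(c) ** beta
        for c in clusters
    )


# Toy data: earthquake depths (km) in two candidate regions.
mixed = [5.0, 600.0, 10.0, 550.0]    # shallow next to deep: high variance
uniform = [30.0, 32.0, 31.0, 29.0]   # similar depths: low variance

score = fitness([mixed, uniform], mixed + uniform)
print(score)  # positive, and contributed entirely by the `mixed` region
```

In this sketch a clustering algorithm would search for the set of regions that maximizes `fitness`; regions whose interestingness is zero contribute no reward.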

CONTRIBUTIONS

The contributions of this paper include:

1) A framework for change analysis in spatial datasets by interestingness comparison

2) A set of change predicates that capture the interrelationships between two clusterings

3) The concepts of intensional clusters and extensional clusters

INTENSIONAL / EXTENSIONAL CLUSTERS

• Extensional clusters partition the input dataset into subsets, and return these subsets as clustering results.

• Intensional clusters are clustering models which represent functions that determine whether a given object belongs to a particular cluster or not. Polygons are used as models for spatial clusters.

[Diagram: Cluster; Intensional Cluster; Extensional Cluster]
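The distinction can be illustrated with a small sketch. It assumes 2D point objects and uses a standard ray-casting point-in-polygon test as the membership function of the polygon model; the names `point_in_polygon`, `belongs`, and the toy data are hypothetical and not taken from the poster.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices in order."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Extensional cluster: simply the set of object ids assigned to the cluster.
extensional = {0, 1}

# Intensional cluster: a model (here a polygon) that decides membership
# for any object, including objects that were not in the input dataset.
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]


def belongs(point):
    return point_in_polygon(point, polygon)


objects = {0: (1.0, 1.0), 1: (3.5, 2.0), 2: (5.0, 1.0)}
extension_of_model = {oid for oid, p in objects.items() if belongs(p)}
print(extension_of_model)    # {0, 1}
print(belongs((2.0, 2.5)))   # True: an unseen object can still be classified
```

Applying the intensional model to a dataset yields an extension, which is what makes it possible to evaluate a model learned on one time frame against data from another.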

CLUSTER CORRESPONDENCE

Two approaches for analyzing relationships between two cluster models are introduced:

1) Direct Change Analysis for Intensional Clusters
In this approach, the intensional clusters of O_old and O_new are compared directly, mostly relying on polygon operations.

2) Indirect Change Analysis through Forward-Backward Analysis Based on Re-clustering
The second approach creates cluster models for O_old and O_new, re-clusters the old data using the new model and the new data using the old model, and then compares the cluster extensions.
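The forward-backward idea of the second approach can be sketched as follows. This is only an illustration under stated assumptions: cluster models are represented as plain membership predicates, axis-aligned boxes stand in for the polygon models, each object is assigned to the first model that accepts it, and the names `recluster` and `box` are hypothetical.

```python
def recluster(objects, models):
    """Assign each object to the first model whose membership test accepts it.

    `objects` maps object id -> (x, y); `models` maps cluster name -> predicate.
    Returns cluster name -> set of object ids (the cluster extension).
    """
    extensions = {name: set() for name in models}
    for oid, point in objects.items():
        for name, contains in models.items():
            if contains(point):
                extensions[name].add(oid)
                break
    return extensions


def box(xmin, ymin, xmax, ymax):
    """Hypothetical stand-in for a polygon model: an axis-aligned rectangle."""
    return lambda p: xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


old_models = {"r1": box(0, 0, 2, 2)}
new_models = {"r1'": box(1, 0, 3, 2)}
old_data = {"a": (0.5, 1.0), "b": (1.5, 1.0)}
new_data = {"c": (1.5, 0.5), "d": (2.5, 1.5)}

# Forward: old data under the new model. Backward: new data under the old model.
forward = recluster(old_data, new_models)
backward = recluster(new_data, old_models)

# Extensions over the same dataset can now be compared with set operations.
old_on_old = recluster(old_data, old_models)
print(old_on_old["r1"] & forward["r1'"])  # {'b'}
```

Because both extensions in the final comparison are subsets of the same dataset, object identity across time frames is not needed, which matches the framework's claim of handling unknown object identity.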

DEMONSTRATION

We demonstrate how the framework can be used to analyze changes in areas where shallow earthquakes are in close proximity to deep earthquakes.

Figure: Earthquakes and their associated depth in the O_old data (left) and the O_new data (right); red dots are the shallowest earthquakes and blue dots are the deepest earthquakes.

Figure: Regions where the variance of earthquake depth is high in the O_old data (left) and the O_new data (right).

Figure: Disappearance regions in the O_old data (left) and novelty regions in the O_new data (right).

FUTURE WORK

We plan to extend our change analysis framework to cope with more complicated change predicates such as concept drift.

CHANGE PREDICATES

A set of basic change predicates that capture different relationships between two regions is introduced. These base predicates can be used to define more complex cluster relationships. Let $r, r_1, \dots, r_k$ be regions in O_old and $r', r'_1, \dots, r'_k$ be regions in O_new.

• $\mathrm{Agreement}(r, r') = |r \cap r'| \,/\, |r \cup r'|$

• $\mathrm{Containment}(r, r') = |r \cap r'| \,/\, |r|$

• $\mathrm{Novelty}(r') = r' \setminus (r_1 \cup \dots \cup r_k)$

• $\mathrm{Disappearance}(r) = r \setminus (r'_1 \cup \dots \cup r'_k)$

The operations are performed on sets of objects in the case of the re-clustering approach and on polygons in the case of the direct approach.
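For the set-of-objects case, the four predicates can be sketched with Python sets, taking Agreement as intersection size over union size and Containment as intersection size over |r|. The function names, the toy regions, and the convention of returning 0.0 for empty regions are assumptions made for this illustration.

```python
def agreement(r, r_new):
    """|r ∩ r'| / |r ∪ r'|; defined as 0.0 here when both regions are empty."""
    union = r | r_new
    return len(r & r_new) / len(union) if union else 0.0


def containment(r, r_new):
    """|r ∩ r'| / |r|; defined as 0.0 here when r is empty."""
    return len(r & r_new) / len(r) if r else 0.0


def novelty(r_new, old_regions):
    """Objects of r' that are not covered by any old region."""
    return r_new - set().union(*old_regions)


def disappearance(r, new_regions):
    """Objects of r that are not covered by any new region."""
    return r - set().union(*new_regions)


# Toy regions given as sets of object ids over the same dataset.
r = {1, 2, 3, 4}
r_new = {3, 4, 5, 6}

print(agreement(r, r_new))         # 2 shared objects out of 6 in the union
print(containment(r, r_new))       # 0.5
print(novelty(r_new, [r]))         # {5, 6}
print(disappearance(r, [r_new]))   # {1, 2}
```

In the direct approach the same definitions would apply to polygons, with set sizes replaced by polygon areas and the set operations replaced by polygon intersection, union, and difference.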