Journal of Visual Languages and Computing 18 (2007) 255–279
www.elsevier.com/locate/jvlc

Exploratory spatio-temporal data mining and visualization

P. Compieta a,b, S. Di Martino c, M. Bertolotto a, F. Ferrucci c, T. Kechadi a,*

a University College Dublin, Dublin, Ireland
b Università degli Studi di Bologna, Bologna, Italy
c Università degli Studi di Salerno – DMI, Fisciano (SA), Italy

Abstract

Spatio-temporal data sets are often very large and difficult to analyze and display. Since they are fundamental for decision support in many application contexts, recently a lot of interest has arisen toward data-mining techniques to filter out relevant subsets of very large data repositories as well as visualization tools to effectively display the results. In this paper we propose a data-mining system to deal with very large spatio-temporal data sets. Within this system, new techniques have been developed to efficiently support the data-mining process, address the spatial and temporal dimensions of the data set, and visualize and interpret results. In particular, two complementary 3D visualization environments have been implemented. One exploits Google Earth to display the mining outcomes combined with a map and other geographical layers, while the other is a Java3D-based tool for providing advanced interactions with the data set in a non-geo-referenced space, such as displaying association rules and variable distributions.

© 2007 Elsevier Ltd. All rights reserved.

Keywords: Data mining; Spatio-temporal data; Exploratory visualization

1045-926X/$ - see front matter © 2007 Elsevier Ltd. All rights reserved.
doi:10.1016/j.jvlc.2007.02.006

* Corresponding author. Tel.: +353 1 716 2478; fax: +353 1 269 7262.




1. Introduction

During the last decade, our ability to collect and store data has far outpaced our ability to process, analyze and exploit it. Many organizations have begun to routinely capture huge volumes of historical data describing their operations, products and customers. At the same time, scientists and engineers in many fields have been capturing increasingly complex experimental data sets, such as terabytes of data received daily from space-borne instruments, high spatial, temporal and spectral-resolution remote sensing systems, and other environmental monitoring devices [1]. For instance, the coverage and volume of digital geographic data sets are extensive and steadily growing. Some researchers estimate that about 80% of the data stored in corporate databases integrate spatial information [2], leading to huge amounts of geo-referenced information that need to be analyzed and processed. These data sets are often critical for decision support, but their value depends on the ability to extract useful information for studying and understanding the phenomena governing the data source. Therefore, the need for efficient and effective techniques for mining and analyzing spatio-temporal data sets has recently emerged as a research priority [3].

Data-mining techniques have been proven to be of significant value for spatio-temporal applications [4]. Data mining is a user-centric, interactive process, where data-mining experts and domain experts work closely together to gain insight on a given problem. In particular, spatio-temporal data mining is an emerging research area, encompassing a set of exploratory, computational and interactive approaches for analyzing very large spatial and spatio-temporal data sets. Several open issues have been identified, ranging from the definition of mining techniques capable of dealing with spatio-temporal information to the development of effective methods for interpreting and presenting the final results.

Visualization techniques are widely recognized to be powerful in this domain [5], since they take advantage of human abilities to perceive visual patterns and to interpret them [4,6,7]. However, it is widely recognized that spatial visualization features provided in the existing geographical applications are not adequate for decision-support systems when used alone. Hence, alternative solutions have to be defined [3]. Indeed, new solutions should not only include a static graphical view of the results produced during the data-mining process, but also the possibility to dynamically and interactively obtain different spatial and temporal views as well as interact in different ways with them. For example, the functionality of dynamically changing some of the parameter values and switching between different views for fast comparisons should be provided. This can help in discovering details and patterns that might otherwise remain hidden. As a result, the problems of how to visualize the spatio-temporal multidimensional data set [8] and how to define effective visual interfaces for viewing and manipulating the geometrical components of the spatial data [9] are the challenges that need to be tackled.

To address these issues, we have developed a system for exploratory spatio-temporal data mining. The aim of this system is, on one hand, to enable data-mining tools to provide some form of localization in the data being analyzed, and, on the other hand, to interactively visualize in 3D the outcome of the mining process, thus leading to greater effectiveness and significance of the results. To achieve these goals, the system includes a data-mining engine that can integrate different data-mining algorithms (to work with specific types of data sets) and two complementary 3D visualization tools.


One of the visualization tools exploits Google Earth [10] to render in 3D the mining outcomes, extended over a geo-referenced satellite image enhanced by additional informative layers. This tool aims at facilitating domain experts and is particularly useful to highlight spatial relationships among the data sets and their spatial context, in terms of geographical layers, such as cities, roads, mountain chains, seashores etc., as well as to show how the phenomenon evolves over time. The other visualization tool exploits Java3D [11] to provide more advanced user interaction with the mining results in a non-geo-referenced space. The set of encompassed features makes it more oriented to data-mining experts. Indeed, its key feature is the possibility to visualize both the data satisfying specific mining rules and the “shape” of the rules extracted. This allows discovering specific relationships between the shape of the initial data set and the shape of specific rules, i.e. the characteristics of the initial data set and all uncovered laws controlling the observed event. Standard visualization tools for geo-spatial data do not provide this functionality.

The proposed system has been developed and tested against a large real-world data set (Hurricane Isabel, which struck the US east coast in September 2003, see [12]), trying to solve the critical issue of uncovering characteristics and behavior of a destructive natural phenomenon. The Hurricane Isabel data set is freely available and describes the main characteristics, sampled each hour along the two days of the highest intensity of the storm. The resulting data set is huge, about 62.5 GB, and contains more than 25 million real-valued points in each timestep, and thus represents a significant case study.

The remainder of the paper is structured as follows. Section 2 describes the main components of the proposed system. Section 3 provides details of the data-mining engine while Section 4 presents the two visual environments. Section 5 is dedicated to the selected case study and to a discussion of experimental results. Finally, in Section 6 we present some conclusions and ideas for future work.

2. System architecture

This section describes the architecture of the system. We first discuss the main concepts of the data-mining process, and then introduce the main components of the system, namely the mining engine and the visualization tools.

2.1. The spatio-temporal data-mining process

The data-mining process usually consists of three phases, or steps: (1) pre-processing or data preparation; (2) modeling and validation; and (3) post-processing or deployment. During the first phase, the data may need some cleaning and transformation according to constraints imposed by tools, algorithms, or users. One has to make sure that the data are free of noise, and some transformations are needed for visualizing very large data sets. The second phase consists of choosing or building a model that better reflects the application behavior. In other words, once a model is chosen or developed, it should be evaluated in terms of its efficiency and the accuracy of its predictive results. Finally, the third step consists of using the model, evaluated and validated in the second phase, to effectively study the application behavior. Usually, the model output requires some “post-processing” in order to exploit it. This step can take full advantage of data visualization, since interactivity and user expertise are very important in the final decision-making and data interpretation.


The mining process for spatial data is more complex than for relational data in terms of both the mining efficiency and the complexity of possible patterns that can be extracted from spatial data sets [13–15]. The reason is that the attributes of a pattern's neighbors may have significant influence on the given pattern and should thus also be considered. Therefore, new techniques are required to efficiently and effectively mine spatial data sets. In spatial data mining especially, the third phase is so important that some researchers have incorporated most of its processes, such as automatic and interactive visualization of data, into phase two, and called the result “interactive data mining” (IDM) [16]. IDM combines both automatic and visual data mining, and has received much attention from users as it offers a higher degree of satisfaction and confidence [17].

2.2. Exploratory spatio-temporal data-mining system

The proposed system mines large spatio-temporal data sets describing the behavior of natural phenomena that have been monitored and recorded at several time instants. The system is mainly intended to deal with data sets characterized by thematic properties, expressed through values of attributes that change over time [7].

Our system relies on a standard three-tier architecture, including a data store at the back end, an application server, and two visualization components at the front end (see Fig. 1). The application server runs all application programs that perform the mining tasks. The mining engine produces an output model that contains structured results. Depending on the specific mining algorithm adopted, these results may need to be manipulated in different ways.

Since several different application domains can be considered, the application server must include domain-specific wrappers that transform raw data into the input format required by the mining engine (see Fig. 2). These wrappers implement the data set type models described in Section 3.3.

Fig. 1. System architecture.


Fig. 2. Architecture of the spatio-temporal data-mining engine.


In order to interpret the output of the mining process, we envision feeding the results of the mining process to different visualization tools, possibly providing complementary interacting functionality. In this first implementation we have developed two alternative visualization tools to support exploratory visual data interpretation. The first one (described in Section 4.1) embeds the Google Earth application while the second one is a Java3D-based application. Both applications are able to display the output of the mining process in a 3D virtual environment, allowing the user to freely change their viewing perspective. The main goal of these applications is to enhance the overall knowledge discovery process, allowing decision makers and knowledge engineers to better understand and discuss the logic behind the models, by supporting analytical reasoning.

These tools address different but complementary requirements posed by domain and mining experts. While the Google Earth-based tool focuses on highlighting the spatial relationships between the data set and the real-world geographical entities involved in the phenomenon, the Java3D-based tool mainly concentrates on the exploratory analysis of the data, by analyzing the internal structure of the data set, its inherent internal relationships, and the patterns among data inferred by the mining algorithm. These two tasks are quite different, in terms of both the displayed information and the functionality required to explore and interact with the data. Thus, the combination of these two tools provides the user with appropriate and effective means of studying the problem, while avoiding visual/cognitive overload (due to unnecessary rendered information cluttering the display) as well as limitations in exploratory analysis. Moreover, the system allows validating both the spatio-temporal mining process and the discovered patterns.

From a data visualization point of view, the two tools follow in many respects the same approach. Geographical data are arranged in many different layers, one for each theme (variable). Rules are intended substantially as sets of items, each identifying a range of values for a specific variable; hence displaying an item means painting an isocloud of points (a 3D scatter plot) with the same value. Moreover, both tools allow some degree of customization in the presentation of the data (for instance, the colors used to represent a specific attribute).

3. The data-mining process

In this section we describe our data-mining approach that deals with spatio-temporal data sets. We start with a review of the current state of the art in this field, and then we present our solution.


3.1. Related work on spatial data mining

Numerous research projects on spatial data mining have been conducted in the last two decades [7]. Some attention has been devoted to the application of existing mechanisms, as well as the development of new ones, to extract relevant information from large data repositories. However, due to the huge volume and diverse nature of this kind of data, traditional techniques such as statistical methods have high computational burdens and often seem inadequate to elicit complex spatial and temporal relationships among data.

Most of the approaches used in current research projects are based on clustering and association rules [18,19]. Clustering techniques are used to group objects, based on their special parameters or some notion of distance or similarity measure. In traditional clustering the similarity measure often depends on the distance between objects or their land surface types. The most popular clustering technique is k-means [20]. Several of its variants were also developed to improve the quality of the clusters, such as k-harmonic means [21], spherical k-means [20], X-means [22], G-means [23], etc. Moreover, there are many other clustering techniques based on different principles, including DENCLUE [24], BIRCH [25], and Chameleon [26]. All these algorithms deal only with spatial correlations and cannot be directly used for discovering temporal patterns simultaneously. These techniques are very computationally expensive and are usually trapped in local minima, as they tend to concentrate on local features [27].

Association rules have also been used successfully on special data sets. They seek to discover associations among patterns encoded within the data sets. The main idea is to design spatial association rules that can find not only local correlations between patterns, but also global ones [28]. Spatial association rules constitute an improvement over generalization-based spatial data-mining methods [29], as the latter cannot discover rules reflecting spatial pattern structures. These techniques have the advantages of being very powerful, using simple computations, and dealing with different types of attributes simultaneously [30]. However, their cost grows exponentially as the problem size increases, they depend highly on the model used to represent and determine the items, and they have difficulties in identifying items that occur only rarely [31,32].

Most efforts have been spent in trying to adapt, modify or improve 'conventional' techniques, relying on solid knowledge discovery in databases (KDD) experience to design new suitable mining models. As detailed in [30], this approach usually materializes in either embedding temporal awareness in spatial systems or accommodating space into temporal data-mining systems. Similarly, [23,33] study and extend existing OLAP systems with additional components and/or layers to treat geo-referenced data. Despite some good results, a flaw of this method is the lack of a clear 'discernment' on how to consider space and time: it simply manages to handle them without really exploiting their characteristics during the analysis.

Some good attempts have been made in proposing a better, more meaningful data format to highlight spatio-temporal relationships prior to elaboration; this can be achieved either by pre-processing the database [4] or by imposing a level of meta-data to properly access information within the whole data set [34,35].

3.2. The proposed approach

In this paper we propose a new approach for spatio-temporal data mining, whose conceptual schema is depicted in Fig. 3. The approach consists of two main components: localiser and miner. The localiser deals with the data attributes and especially with spatial and temporal dimensions. The miner processes the data based on the spatio-temporal relationships provided by the localiser. By differentiating the two tasks, the computations are simple and quick as they are done locally. The output of the miner (association rules) is also fed to the localiser for further improvement. Unlike conventional approaches, in which the interactivity is nonexistent or reduced to a minimum (very few simple operations), our approach is interactive as the two processes can run concurrently. This approach also differs from IDM in the sense that the processes of mining/analyzing the data and visualization are quite separate, so as to facilitate the testing and evaluation of different algorithms and techniques involved in each individual phase.

Fig. 3. A schematic view of the proposed approach for spatial data mining.

In the following we will focus on the techniques used in each phase, in order to model the Hurricane Isabel case study.

3.3. Spatio-temporal data set model

As mentioned above, spatial data sets are more complex than conventional data. This complexity lies not only in processing and interpreting the data but is also present at the inputs of the data-mining process. Spatial data is usually characterized by two different types of attributes: spatial and non-spatial attributes. The former identify the spatial locations of spatial items. These include 3D space coordinates, item shape, temporal, geometry, etc. The latter are usually the same as in conventional data sets, such as item name, item key, type, rate, size, etc. The main difference between these two types of attributes is that the relationships between spatial patterns/items are often implicit, while they are usually explicit in non-spatial objects.

Usually the pre-processing phase depends directly on the technique used to mine and analyze the data. For instance, association rules are designed to work with categorical data. Data preparation, in this case, consists of discretizing numeric data into ordinal categories. This conversion has a direct impact on the results of the rule mining. Some methods to optimize the categories have been developed, mainly based on heuristic methods and clustering analysis [30,32,36].
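As a rough illustration of what discretizing numeric data into ordinal categories can look like, the sketch below uses quantile-based cut points so that each category holds a similar number of samples. It is not the categorization technique used in this paper; the function names, the sample values and the choice of quantiles are assumptions made only for illustration.

```python
from bisect import bisect_left

def quantile_edges(values, n_bins):
    """Return up to n_bins - 1 cut points so that bins hold roughly equal counts."""
    ordered = sorted(values)
    edges = [ordered[(k * len(ordered)) // n_bins] for k in range(1, n_bins)]
    # Duplicate edges appear when many samples share one value; keep each once.
    return sorted(set(edges))

def to_category(value, edges):
    """Map a value to an ordinal category; bin i covers (edges[i-1], edges[i]]."""
    return bisect_left(edges, value)

# Hypothetical, strongly skewed samples (most values near zero).
samples = [0.0, 0.0, 0.0, 0.1, 0.2, 0.4, 0.9, 1.5, 3.0, 8.0]
edges = quantile_edges(samples, 4)
categories = [to_category(v, edges) for v in samples]
```

With equal-width intervals over the same samples, nearly all values would land in the lowest interval; quantile cut points are one simple way to avoid that.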

In this study, we developed a technique to categorize data based on the sampling technique described in [30]. The main difference between them resides in their goals. Our technique, called “HurricaneNarrower”, is designed to properly categorize the variables or attribute values, whereas the main goal of the sampling technique developed in [30] is to reduce the database activity by analyzing a randomly chosen sample and then generalizing the result to the whole database. Our technique also takes into account the fact that many variables never cover the whole range of allowed values (e.g. after splitting QCLOUD's domain into 100 intervals, we observe that 98% of values fall in the first interval, i.e. between 0 and 0.0000332; see Fig. 4 for an example of this problem and a general idea of how the technique works).

Fig. 4. Example of categorization for the two variables W and QGRAUP. (Inside an interval) dark color = many points are in that range of values; light color = few points.

The model used here consists of mapping the spatial data sets onto a virtual partitioned space. This can be seen as a layer in which original data are aggregated into virtual points (partitions) representing the minimal spatial unit that can be occupied by a spatio-temporal entity. Each virtual point is identified by a set of attributes including coordinates, size, neighborhood, etc. For instance, traditional geographical databases have two or three dimensions, while in spatio-temporal data sets the number of dimensions can range from two (one spatial and one temporal dimension) to N (time, three spatial dimensions, n virtual dimensions). The points are disjoint; therefore, any shape used to implement a virtual point should satisfy disjunctive and complement properties. This model will allow us to hide all the problems of heterogeneity and unify the concept of items (virtual points) for the majority of spatio-temporal data sets.
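The virtual-point model can be sketched as follows, assuming a regular cubic grid as the partition shape and the mean as the aggregate. Both choices, as well as the names `virtual_point_id`, `aggregate` and the cell size, are hypothetical; the paper does not prescribe them. A regular grid is used here only because its cells are disjoint and together cover the whole space.

```python
from collections import defaultdict

def virtual_point_id(x, y, z, t, cell):
    """Identify the single grid cell (virtual point) that contains a sample."""
    return (int(x // cell), int(y // cell), int(z // cell), t)

def aggregate(samples, cell):
    """Average the values of all samples that fall in the same virtual point."""
    buckets = defaultdict(list)
    for x, y, z, t, value in samples:
        buckets[virtual_point_id(x, y, z, t, cell)].append(value)
    return {vp: sum(vals) / len(vals) for vp, vals in buckets.items()}

# Hypothetical samples: (x, y, z, timestep, value).
samples = [
    (0.5, 0.5, 0.0, 0, 2.0),
    (1.5, 0.2, 0.0, 0, 4.0),
    (2.5, 0.1, 0.0, 0, 10.0),
    (0.5, 0.5, 0.0, 1, 6.0),
]
virtual_points = aggregate(samples, cell=2.0)
```

Here the first two samples share one virtual point, while the fourth falls in a separate one because it belongs to a different timestep.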

3.4. Spatial association rules

In the proposed system we focus our attention on developing a technique based on association rules to discover relationships between spatial patterns. A spatial association rule is of the form “A → B (s%, c%)”, where the pattern A is called the antecedent and B the consequent, and the percentages s and c are called the support and the confidence of the rule [4,20]. The problem of discovering association rules consists of identifying all rules, within the data set, satisfying minimum support s and confidence c. This usually requires a solution to the following two sub-problems: (1) find frequent (large) spatial patterns; (2) extract strong spatial association rules. In the first sub-problem the patterns should satisfy a minimum support (support > s), and in the second a spatial association rule is said to be strong if it satisfies a minimum confidence (confidence > c).
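To make the two measures concrete, the following minimal sketch computes the support and confidence of a rule A → B over a list of transactions, using the standard definitions (support as the fraction of transactions containing A ∪ B, confidence as support(A ∪ B) / support(A)). Treating each transaction as the set of items observed at one location, and the item labels themselves, are assumptions made for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(A ∪ B) / support(A); assumes the antecedent occurs at least once."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# Hypothetical categorical items, one set per location.
transactions = [
    {"W=high", "QCLOUD=low"},
    {"W=high", "QCLOUD=low"},
    {"W=high"},
    {"W=low", "QCLOUD=low"},
]
A, B = {"W=high"}, {"QCLOUD=low"}
s = support(A | B, transactions)    # 2 of 4 transactions contain both items
c = confidence(A, B, transactions)  # 2 of the 3 transactions with A also contain B

# The rule is kept only if it exceeds both (hypothetical) thresholds.
is_strong = s > 0.4 and c > 0.6
```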

To mine spatial association rules we developed a technique based on the Apriori algorithm [37], which relies on the property that “any subset of a frequent itemset must be frequent”. The association rule extraction is based on the key concept of spatial itemset. According to the model defined above for spatio-temporal data sets, each itemset is associated with a set of virtual points. We say that a virtual point supports an itemset if and only if the itemset is frequent in that point. Such an itemset is called a spatial itemset. Note that virtual points supporting the rule (or itemset) can cover more than one timestep, thus extending the model to include patterns with a well-defined time interval. As the traditional Apriori algorithm has a very high computational complexity, it is not suitable for very large data sets. The idea is to reduce the size of the input data by presenting to the algorithm only data with higher spatio-temporal relationship, namely virtual points. Therefore, by exploiting the features of spatio-temporal data sets and by reducing the size of candidate generation performed by the localiser, the adapted Apriori algorithm is efficient. The output of the algorithm consists of frequent itemsets and strong association rules. A basic version of the algorithm is given in Fig. 5.

Basically, two itemsets can be united if they share all the items except one. A further check is needed to verify that the intersection of the sets of virtual points referred to by the two itemsets is not empty. Indeed, this intersection will become the supporting area (set of virtual points) for the new itemset. This ensures that each new itemset has a supporting area with an adequate number of virtual points. An example of application of this technique is provided in Section 5, on a real spatio-temporal data set.
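The join condition just described can be sketched as below. This is only an illustration of the two checks (shared items and non-empty intersection of supporting areas), not the algorithm of Fig. 5; the function name `join`, the `min_points` threshold and the item labels are assumptions.

```python
def join(itemset_a, points_a, itemset_b, points_b, min_points=1):
    """Try to unite two k-itemsets into a (k+1)-itemset.

    Returns (new_itemset, supporting_area), or None if the join is rejected.
    """
    # The two itemsets must have the same size and share all items except one.
    if len(itemset_a) != len(itemset_b):
        return None
    if len(itemset_a & itemset_b) != len(itemset_a) - 1:
        return None
    # The supporting area of the new itemset is the intersection of the areas.
    area = points_a & points_b
    if not area or len(area) < min_points:
        return None
    return itemset_a | itemset_b, area

# Hypothetical 2-itemsets with the ids of the virtual points supporting them.
a = frozenset({"W=high", "P=low"})
b = frozenset({"W=high", "T=mid"})
joined = join(a, {1, 2, 3}, b, {2, 3, 4}, min_points=2)
rejected = join(a, {1, 2, 3}, b, {5, 6}, min_points=2)
```

In the first call the candidate 3-itemset inherits the two virtual points common to both parents; in the second the supporting areas do not overlap, so no candidate is generated.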

Fig. 5. The mining algorithm.


4. Visual techniques for advanced spatial analysis

Visual data mining refers to methods, approaches and tools for the exploration of large data sets by allowing users to directly interact with visual representations of data and dynamically modify parameters to see how they affect the visualized data. This is usually achieved by means of techniques from information visualization, visual perception, visual metaphors, diagrammatic reasoning, and 3D computer graphics [4], without requiring the decision makers to have knowledge of technicalities.

Visual data-mining techniques have proven to be very valuable in exploratory data analysis and they also have a high potential for mining large databases [6], since they shift the load from the user's cognitive system to the perceptual system [4]. As a result, visual data mining is a crucial area in explorative data mining, aimed at enhancing the effectiveness of the overall mining process, by supporting analytical reasoning.

In this section we provide an in-depth description of the two complementary visualization applications we developed in order to provide our system with various exploratory visual capabilities, meant for the different actors involved in the interactive mining process. Section 4.1 discusses the Google Earth-based solution, while in Section 4.2 the Java3D-based tool is described.

4.1. The Google Earth-based tool

The first tool is meant for domain experts, i.e. users that study the specific phenomenon but are not (necessarily) experts in data mining. Indeed, it is aimed at providing an interactive environment, where the user can get insight on relationships between the mining outcomes and nearby geographical entities. To this aim, the tool offers some widgets to carefully select the information to deal with, which will be rendered in 3D over a map and other layers provided by Google Earth.

Google Earth (GE for short) is a virtual globe, currently freely available for personal use on PCs running Windows and Mac OS, while the Linux version is expected shortly. For commercial and professional use, many purchasing options are available, ranging from basic licenses to enterprise services. The original project was developed by Keyhole, which was bought by Google in 2004.

Google Earth combines satellite raster imagery with vector maps and layers in a single, integrated tool, which allows users to interactively fly in 3D from outer space to street level views. Most places of the world are available at (at least) 1 km of resolution, while many large cities are available at high enough resolution to see individual buildings, houses, and even cars. A very wide set of geographical features (streets, borders, rivers, airports, etc.), as well as commercial points of interest (restaurants, bars, lodging, shopping malls, fuel stations, etc.), can be overlaid onto the map. A key characteristic of this tool is the fact that the geographical data are not stored on client computers, as they are streamed, upon request, from Google's huge server infrastructure, ensuring fast connections and almost 100% up time. Moreover, this guarantees that data are always up-to-date. Another remarkable feature implemented by GE is a 3D Skyline for many American cities, exploiting data provided by ITSpatial. The application uses data from NASA databases to render 3D terrain models, thus providing also Digital Elevation Model features.


We have exploited the 3D capabilities provided by Google Earth, together with the updated geographical information layers available on the web, to combine data set themes and variables with real-world infrastructures and geographic features.

This application turns out to be very flexible, being able to deal with a large variety of spatio-temporal phenomena, ranging from worldwide ones (e.g. weather, pollution, epidemic diffusions, etc.) to local ones (e.g. local health, traffic, economics, etc.). The tool we developed embeds GE and presents the same ease of use, which makes it very suitable for domain-expert users.

For data presentation we exploit the “focusing” visualization technique [7], where the user can freely move his/her perspective view in a 3D environment to analyze the data set. Indeed, data are shown in a 3D animated perspective canvas that can be rotated, zoomed and moved. A left panel allows the decision maker to query the data set, in order to define the specific (set of) themes/rules to render, and to customize the way the information is depicted. A bottom panel provides a widget to see how the phenomenon evolved over time.

Each considered variable is rendered as an isocloud with a different color. The application works according to the principle that all the data that do not match the query parameters set by the user are removed from the visualization canvas. This filtering is applied immediately, thus providing direct manipulation features.

Moreover, each rendered point can be made a hyperlink, in order to answer queries of the form “When + Where → What”, which is a typical analysis task in exploratory spatio-temporal data mining [25].

The resulting user interface, shown in Fig. 6, is composed of three main panels:

• View panel: This is the GE application, used to show the data in 3D at an arbitrary zoom level. It allows six degrees of freedom (DoF), achieved through combinations of mouse clicks and movements, or through a lower panel.

Fig. 6. Schema of the user interface for the GE-based visualization tool.


• Data panel: This is a vertical panel, located on the left of the window, which allows the user to query and filter the data, to get the specific information he/she wants to view. Through a tabbed control, the user can choose whether to deal with the attributes of the whole data set (the first tab) or with the association rules inferred by the mining engine (the second tab).

  ◦ In the former case, the system lists all themes contained in the data set (see Fig. 7 (Top)). For each of them, the user can indicate whether it should be rendered, and select both the specific value to render as an isocloud and the color to be used to depict the data. For instance, in Fig. 7 (Top) the isocloud of the hurricane’s points having “Pressure” equal to “3” has been rendered in the “Cyan” color. Moreover, to overcome some visualization problems (basically occlusions) that might arise when dealing with 3D representations, the system displays only a “slice” of the data set, based on the altitude. The user can choose either to view all the points satisfying the criteria, or to clip the representation to the set of points falling into a specific range of altitude.

  ◦ In the latter case, the system lists the set of discovered association rules. As soon as the user selects one of them, the interface presents the antecedent and the consequent variables of the considered rule in two different visual controls. Again, the user specifies which of them should be rendered as isoclouds, together with the respective colors to be used to depict the data. For instance, in Fig. 7 (Bottom), the association rule “Cloud = 0, Precipitation = 0, Pressure = 3 → QVapor = 4, Temperature = 2” is being displayed, and the Cloud and QVapor variables are selected to be rendered in the “Cyan” and “Light Green” colors, respectively.

• Dimensional panel: This panel allows the user to move in four dimensions, namely the three spatial dimensions permitted by GE (by exploiting six DoF) and the time dimension, through a sliding bar. To this aim, the horizontal panel, located at the bottom of the window, provides a single control panel/set of commands to follow the data painted on screen.
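The filtering behavior of the Data and Dimensional panels (a selected value, an optional altitude slice, and a timestep) can be illustrated with a minimal sketch. It is not the code of the tool: the `Point` record, the `select_points` function and all field names are hypothetical, and the sketch only assumes that each data point carries a position, a timestep and a discretized value.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Point:
    """One discretized sample of a theme (hypothetical record layout)."""
    lon: float
    lat: float
    alt: float       # altitude, in the units used by the data set
    timestep: int    # position on the time slider
    value: int       # discretized value of the theme, e.g. Pressure = 3


def select_points(
    points: Iterable[Point],
    value: int,
    timestep: int,
    alt_range: Optional[Tuple[float, float]] = None,
) -> List[Point]:
    """Keep only the points matching the query; everything else is dropped.

    alt_range=None means "all altitudes"; otherwise the result is clipped
    to the inclusive altitude slice (low, high).
    """
    low, high = alt_range if alt_range is not None else (float("-inf"), float("inf"))
    return [
        p for p in points
        if p.value == value and p.timestep == timestep and low <= p.alt <= high
    ]
```

Calling `select_points(points, value=3, timestep=23, alt_range=(0.0, 5000.0))` would return the isocloud for value 3 at timestep 23, clipped to the chosen slice; re-running the selection whenever a widget changes gives the direct-manipulation effect described above.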

4.1.1. Technical aspects

The development of an environment exploiting Google Earth technologies to render mining outcomes posed two main challenges:

1. How to arrange the information of the data set and/or the rules in a way that it could be displayed by Google Earth, and

2. How to improve the Google Earth user interface, to provide the tools to carry out the exploratory spatio-temporal data mining and visualization.

To address the first issue, we exploited the ad hoc language provided by GE, named Keyhole Markup Language (KML) [38], which is an XML grammar and file format suited to model one or more spatial features to be displayed in GE. For instance, through this language, a user can assign icons and labels to a location on the planet surface, specify camera positions to define views, and so on. KML supports some basic geometrical shapes, whose appearance can be manipulated by defining coordinates (longitude, latitude, and altitude) or extrusions, and which can be grouped together into collections to create and manage complex 3D objects consisting of numerous shapes. These files are then processed by the Google Earth client in a way similar to how HTML files are processed by web browsers. Consequently, GE can be viewed as a browser of KML files. A lot of documentation and tutorials on KML are available on the web. By exploiting this language, the user community is defining a wide set of other points of interest, which can be seamlessly accessed and integrated over the web.

In our application, we designed and implemented some routines for the on-the-fly generation of KML files, based on the user input specified in the data panel. Each generated KML file contains the coordinates of each considered point of the data (or item) set. Knowing the bounding box of the whole data set, these coordinates are calculated from the position of the specific point in the data set. The generation of a file containing more than 100,000 points is almost instantaneous on a notebook based on a Pentium M 2.0 GHz with 768 MB of RAM.

Fig. 7. Visualization of data set themes (Top) and rules (Bottom). Notice the different widgets provided in the left panel.

In relation to the second issue, there are two main ways to programmatically interact with GE. The former requires the definition of ad hoc KML files specifying the starting point of view. This approach is straightforward, but does not provide an effective management of the user interactions. The alternative solution is to use the set of APIs provided by GE. Indeed, once GE is installed, a new COM component, namely the KEYHOLELib, is available in the Windows system. Once this is imported in a programming environment as a reference library, a new namespace is available, which exports two main classes, i.e. the KHInterfaceClass and the KHViewInfoClass. They grant, respectively, full control over the user interface and over the active point of view in GE. In particular, an instance of the KHInterfaceClass makes it possible to start up the application, load a KML file, enable/disable other active layers, resize the window, and take screenshots. Moreover, to get/set the current point of view, an instance of the KHViewInfoClass allows its coordinates, azimuth, tilt and zoom to be set.

To embed the GE window within our C# application, we wrapped in .NET the FindWindow and SetParent Win32 system calls, available in User32.dll, which are suited to get the handle of an arbitrary window and to control it, respectively.
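The on-the-fly KML generation described above can be sketched as follows. This is only an illustration under stated assumptions, not the routine used in the tool (which is part of a C# application): the function names `grid_to_geo` and `build_kml` are hypothetical, grid indices are assumed to map linearly onto the bounding box, altitudes are assumed to be expressed in metres, and the bounding-box values in the usage note are placeholders.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # namespace of the KML 2.2 standard


def grid_to_geo(index, size, low, high):
    """Linearly map a grid index in [0, size - 1] onto [low, high] (assumed mapping)."""
    if size == 1:
        return low
    return low + (high - low) * index / (size - 1)


def build_kml(cells, shape, bbox, name="isocloud"):
    """Return a KML document with one Placemark per selected grid cell.

    cells: iterable of (i, j, k) grid indices of the points to render
    shape: (nx, ny, nz) size of the whole grid
    bbox:  (west, east, south, north, bottom, top) of the whole data set
    """
    nx, ny, nz = shape
    west, east, south, north, bottom, top = bbox
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    ET.SubElement(doc, f"{{{KML_NS}}}name").text = name
    for i, j, k in cells:
        lon = grid_to_geo(i, nx, west, east)
        lat = grid_to_geo(j, ny, south, north)
        alt = grid_to_geo(k, nz, bottom, top)
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # "absolute" places the point at the given altitude above sea level
        ET.SubElement(point, f"{{{KML_NS}}}altitudeMode").text = "absolute"
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = (
            f"{lon:.6f},{lat:.6f},{alt:.1f}"
        )
    return ET.tostring(kml, encoding="unicode")
```

For instance, `build_kml([(0, 0, 0), (499, 499, 99)], (500, 500, 100), (-83.0, -62.0, 23.7, 41.7, 0.0, 20000.0))` yields a document with two placemarks located at opposite corners of the placeholder bounding box, which can be saved to a `.kml` file and opened in GE.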

4.2. The Java3D-based tool

The custom Java3D application we have developed is aimed at providing 3D rendering of, and interaction with, the association rules produced by the mining algorithm. Basically, it is a tool oriented to mining experts, since it offers many features that are specific to the exploratory data-mining domain. For instance, it provides some widgets on the interface to directly choose the values, parameters or association rules to display.

The visualization tool presents in 3D the association rules identified by the mining engine. It exploits the “Arranging view” visualization technique [7], where two different views are presented in separate windows, and the user can arbitrarily arrange them to facilitate the comparison of data. Consequently, data are shown in two canvases, which can be rotated, zoomed and moved to easily examine the shape, density and inner pits of the cloud of points.

The user interface, shown in Fig. 8, is composed of six main panels:

• AR-Extraction panel: This panel extracts the rules (in XML format) from the output of the mining algorithm. It contains a field to set the required level of confidence, a button to choose the file, and a combo box to choose, among the rules contained in the selected file, the one to display.

• Layer+Info panel: This panel allows the user to set rendering styles and shows detailed information about the rules. It allows the user to choose whether to visualize the ‘land’ (ground) or ‘locations’ (the association rule’s bounding cubes, i.e. the supporting area) layer. A set of radio buttons allows the selection of the subset of points to show, i.e. No points, Points having the specified value in the rule, and All points of the data set having that value. Items can be reloaded separately, with distinct display settings (grid/points).

• Log panel: This panel is aimed at providing some textual output to the user, i.e. general information on the execution state of the drawing and data-fetching threads.

• Antecedent panel: This is a Java3D canvas, aimed at rendering the selected (active) antecedent of the current rule. The active antecedent can be chosen through a set of tabbed controls, each corresponding to a tabbed panel, placed on top of the canvas. This canvas has five DoF, related to mouse movements and button clicks.

• Consequent panel: As in the previous panel, an itemset is shown through several tabbed inner panels, and there are five DoF, related to mouse movements and button clicks. Usually the Antecedent and Consequent panels are used complementarily to better understand and compare different layers of the visualized content: a frequent use case is that of visualizing an isocloud of points in one panel and the bounding box (set of locations) of the rule (perhaps enabling the land mass too) in the other (see Fig. 9).

• Distribution panel: This panel allows the user to render a variable of the data set (e.g. “Pressure”) independently of the rules involving it. It is also possible to select which timestep the data has to be fetched from (e.g. the 23rd timestep). The distribution of each value (i.e. 0, 1, …, 9) at a given timestep is displayed separately.

Fig. 8. Schema of the user interface for the Java3D-based visualization tool.

It is also worth pointing out that the two 3D canvases can be freely resized: for example, one can be closed, in order to allow the maximum flexibility and customization of the visualization space. In terms of the interaction with the canvas, we implemented a five-DoF mechanism, allowing users to control the pan on the three axes, together with yaw and pitch. These movements can be controlled through specific combinations of left and right mouse clicks, and mouse scrolls.

It is worth stressing that the main strength of this application is the innovative, purpose-designed functionality of drawing, upon user request, the “shape” of a rule, intended as that particular region of the space where the rule holds, that is, the set of locations in which the rule (hence all items involved in it) is well supported. An example of this feature can be seen in Fig. 9, where we see the “U = 1” distribution restricted to the small area supporting the rule. Besides removing confusion and overload of visual information from the screen, it also helps to highlight the structure of any pattern embedded in the data and to focus the user’s attention only on the subset of the data set involved in the rule being studied. This allows a more efficient and lightweight visualization process, even when displaying millions of points.

Fig. 9. Visualization of the shape of a rule.

As we will remark later (while discussing the results of the mining process), not only the content of a rule is important, but also its shape: from a domain expert’s point of view, it might convey a lot of information about the behavior of the phenomenon being studied. Being able to narrow the visualization phase only to that shape is therefore extremely valuable in interpreting all results.
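The notion of the shape of a rule, i.e. the set of locations in which the rule is well supported, can be made concrete with a small sketch. It rests on assumptions that are not taken from the system itself: locations are modelled as cubic blocks of `block` grid cells per side, the support of a rule in a location is taken to be the fraction of its points satisfying all items of the rule, and `rule_shape` and its parameters are hypothetical names.

```python
from collections import defaultdict


def rule_shape(points, items, block, min_support):
    """Return the set of locations in which all items of a rule are well supported.

    points:      dict mapping a grid position (x, y, z) to a dict
                 {variable: discretized value}
    items:       dict {variable: value} with every item of the rule
                 (antecedent and consequent together)
    block:       side, in grid cells, of a cubic location (assumed partition)
    min_support: minimum fraction of the points of a location that must
                 satisfy all items
    """
    total = defaultdict(int)   # points seen in each location
    hits = defaultdict(int)    # points of each location satisfying the rule
    for (x, y, z), values in points.items():
        location = (x // block, y // block, z // block)
        total[location] += 1
        if all(values.get(var) == val for var, val in items.items()):
            hits[location] += 1
    return {loc for loc in total if hits[loc] / total[loc] >= min_support}
```

Rendering only the points that fall inside the returned locations restricts the view to the region where the rule holds, which is the effect shown in Fig. 9.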

4.2.1. Technical aspects

The development of this application required us to design and implement a system exploiting many different technologies, including 3D computer graphics, RDBMS connections, XML, etc. For each of these technologies we surveyed existing high-level, versatile solutions. For instance, many different programming libraries for delivering 3D applications are currently available, including OpenGL, Direct3D and Java3D. Similarly, many alternatives exist to connect to an RDBMS (e.g. ADO, JDBC, etc.). We chose to develop our application within the Java programming environment, whose key advantage is the large availability of coherent Application Programming Interfaces (APIs) delivered by Sun Microsystems. This is true also for the 3D visualization domain, through the Java3D APIs. Indeed, they are intended as a standard extension to the Java 2 JDK, providing 3D graphics support for building applications and applets able to manage interactive 3D geometries and sounds. Thus, Java3D is basically a hierarchy of Java classes which serve as the interface to more sophisticated 3D applications in a scalable and platform-independent way [11]. Sun developed Java3D as a sort of wrapper for the OpenGL libraries [39], which represent the standard cross-platform API in the 3D domain. In this way, it becomes easy to port Java3D to every operating system supporting OpenGL. This solution reduces the overall performance by introducing a new software layer between the application and the hardware; however, by taking advantage of Java threads, the Java3D renderer is capable of rendering in parallel, achieving interesting performance [40]. Furthermore, the ease of learning an extremely high-level library such as Java3D (which provides really complex constructs and primitives) has been of great help in focusing on data and concepts rather than on coding.

Other advantages brought by the adoption of Java are extensive documentation and support, the availability of free programming environments and a fully object-oriented approach. Furthermore, the developed visualization tool can be ported to every software platform providing a Java Virtual Machine and an implementation of the Java3D APIs, such as Windows, Apple OS X, Linux, Irix, etc.

Finally, to test the performance, we ran some tests on a Pentium 4 2.8 GHz, equipped with 1 GB of RAM and an old GeForce 2 GTS video card. The visualization tool was able to easily render up to 400,000 points (200,000 per canvas) without noticeable deterioration in the frame rate, thus allowing the mining experts to deal with a significant amount of data.

5. A case study: hurricane Isabel

We have tested our system on a large spatio-temporal data set. This section details the data format and the experimental results obtained.

5.1. The data set

Hurricane Isabel was the only Category 5 hurricane of the 2003 Atlantic hurricane season (see Fig. 10). It made landfall on September 18, 2003 in North Carolina. Official reports give a damage estimate of 3.37 billion US dollars.

All the key data about this phenomenon were logged for two days by the National Center for Atmospheric Research in the United States. The corresponding data set was produced by the Weather Research and Forecast (WRF) model, courtesy of NCAR and the U.S. National Science Foundation (NSF), and is freely available at http://www.vets.ucar.edu/vg/isabeldata/. See Fig. 11 for the list of variables contained, and [12] for a detailed description of the data format.

All variables are real-valued and were observed along 48 timesteps (once every hour for two days), in a space having 500 × 500 × 100 = 25 × 10^6 total points. Each variable, in each timestep, is stored in a different file, resulting in 624 files of about 100 MB each. This fine fragmentation allows great flexibility in choosing different subsets of data for each mining task.
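The figures above can be cross-checked with a few lines of arithmetic. The sketch below assumes that each value is stored as a 4-byte floating-point number, which is an assumption suggested by the stated file size of about 100 MB rather than a fact given in the text; the function name `isabel_volume` is hypothetical.

```python
def isabel_volume(grid=(500, 500, 100), timesteps=48, files=624, bytes_per_value=4):
    """Derive the data-set sizes from the grid, timestep and file counts.

    bytes_per_value=4 is an assumption (32-bit floats).
    """
    nx, ny, nz = grid
    points = nx * ny * nz                       # grid points per variable and timestep
    variables = files // timesteps              # one file per variable per timestep
    file_mb = points * bytes_per_value / 1e6    # size of one file, in MB
    total_gb = files * file_mb / 1e3            # size of the whole data set, in GB
    return {
        "points": points,
        "variables": variables,
        "file_mb": file_mb,
        "total_gb": total_gb,
    }
```

Under these assumptions each file holds 25 million values (about 100 MB), the 624 files correspond to 13 variables over 48 timesteps, and the whole data set amounts to roughly 62.4 GB, in line with the figure of about 62.5 GB quoted later in this section.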

Therefore, the Hurricane Isabel data set is a proper instance of a massive geographical spatio-temporal data set, and is widely adopted for data visualization studies, such as the ones made for the IEEE 2004 Visualization Contest [41].


Fig. 10. Hurricane Isabel shot, from satellite (http://science.nasa.gov/headlines/y2003/18sep_isabel.htm).

Fig. 11. Hurricane Isabel data set’s layers.


Since traditional PCs are not able to interactively handle such an amount of data (more than 25 million points, stored in 62.5 GB), some special techniques must be adopted to filter the data. These are usually based on some compression or sampling approach to view portions of the data. In this study we went a step further: we not only provided a way to visualize subsets of the data, but also aimed at rendering specific patterns and relationships among them.

5.2. Visual results for exploratory analysis and decision support

In this section we show results obtained by applying the adapted Apriori algorithm to the Hurricane Isabel data set and by viewing them with the Java3D tool described in Section 4.2. We analyzed many results and report here only some of the most meaningful. In our analysis we tried to discover specific patterns/characteristics that were either well known about hurricane data (such as the existence of an area in the centre of the data set, called the eye of the hurricane) or not known but of possible significance for an expert study.

Having thousands of locations in the entire space permits millions of possible different shapes for a rule. While analyzing results, the decisive attribute in the early screening was the area the rules covered, rather than their content. In most cases, this helped to discard unimportant results.

However, sometimes it turned out to be misleading: Fig. 12 illustrates an example of an apparently trivial shape that is instead interesting.

Fig. 12. Rule supported by an ‘interesting’ area: hurricane’s eye and lateral wings.


Indeed, Fig. 12 shows a rule covering a relatively small area corresponding to the eye of the hurricane, with two little backward-facing wings. This is one of the rules we extracted whose shape resembles the shape of the hurricane or one of its features. This rule relates pressure with temperature in this small area on the top of the hurricane.

It is well known that pressure always assumes its lowest value in the eye of the hurricane; however, in this rule we can observe a hidden pattern, showing a ‘normal’ value for the pressure that implies a very low temperature. The eye of the hurricane is very important when predictions have to be made regarding its strength and speed. Therefore, rules featuring unusual, oriented or hurricane-shaped areas, even if they include only a few locations, may be essential to discover new behaviors.

All variables represent natural events, each with a different, and often not well defined, distribution. Fig. 13 illustrates an example: the distribution of the wind variable appears almost everywhere, although with very low support.

When using photographs to analyze a hurricane, all wind components are invisible; they can only be guessed by the user and thus they are not exploited in a ‘human’ study. On the other hand, a normal mining process can detect such a presence but is incapable of localizing it or taking into account its ‘density’ with respect to space. In this situation, the method developed during this work becomes very effective: it can detect only those areas reporting a sufficient number of points having large values for that phenomenon (see the consequent panel in Fig. 13). Actually, this process of selection applies a light form of compression to the data describing the event, which is much faster than punctual data mining (i.e. rules referring to single points instead of small locations, which is unfeasible in most problems).

Fig. 13. (W = 4 → P = 3) in ts = 48. W is the wind’s East–West component. Only the area supporting this rule is shown below.

The main objective in studying hurricanes is that of predicting what they are going to do in the immediate future. To this aim, rules providing hints about the direction or strength of the hurricane are really useful. A meaningful example is given by the rule presented in Fig. 14. Such a rule is supported everywhere near the sea (and the land), all over the space, but a narrow air flow (transversal wind component) penetrating the cloud reveals an interesting behavior: the direction of this stream is exactly perpendicular to that of the hurricane, which at this timestep is steering north to make its landfall and proceed toward Canada. This rule presents a directional infiltration from the south-west reaching exactly the eye of the hurricane (the small hole on the left). The antecedent panel provides a top view where the points representing TC, which are quite uniform over the entire area, are not displayed, to better relate the ‘intruder’ to the land mass.

Fig. 14. Rule (TC = 8 → V = 4) in ts = 24.

As meteorological phenomena are usually highly influenced by the morphology of the area they cover, we have analyzed the results of the mining process using our Google Earth-based interface. Since hurricane Isabel struck a coastal area lacking significant mountain chains, it did not encounter any strong barriers. It is, however, interesting to analyze the effect of the landfall on the themes of the data set. We used the GE-based visualization tool to gain insight into these relationships.

By analyzing the results, we found that Cloud Water and Water Vapour were the parameters most affected by the seashore. This is clearly visible in Figs. 15 and 16. In particular, in Fig. 15 the data points displayed in yellow represent the locations of the data set where the Cloud Water has a low level. Points in blue correspond to the eye of the hurricane. From this figure it is possible to notice how the altitude at which Cloud Water has a low level suddenly drops as soon as the phenomenon reaches the dry land.

Fig. 15. Effects of the dry land on the Cloud Water.

A similar phenomenon is visualized in Fig. 16. Yellow points represent the locations with a high percentage of water vapour. It is evident that there is a cluster of points over the sea, but when the hurricane meets the land, these points almost disappear.

Fig. 16. Effects of the dry land on the Water Vapour.

6. Conclusions and future work

In this paper we have described the system for exploratory spatio-temporal data mining that we have developed. This system includes a mining engine based on an adapted version of the well-known Apriori algorithm. Since the results of a mining algorithm require interpretation, we have focused on visual techniques.

To this aim we have developed two independent visualization tools for viewing and interacting with the results of the mining process, meant for domain experts and data-mining experts, respectively. The Google Earth application makes it possible to relate the phenomenon being studied to the specific geographic area and associated features. The second visualization tool presents more sophisticated interactivity. It makes it possible not only to view the outcome of the mining process, but also to quickly provide different views for efficient comparison. We allow users to compare the data with the shape of the rules extracted during the mining process. This has proven useful to highlight patterns that might otherwise be missed.

One of the advantages offered by these tools is the fact that data are displayed using the best practice of information visualization [42], while users can interact in a visual (and thus more natural) fashion, without having to master a query language or understand the underlying structure of the data set. The right presentation makes it easy to organize and understand the information. As a result, this complementary data visualization facilitates the extraction of insight from the phenomena being analyzed, while offering a better understanding of the structure and relationships within the data set.

Our system has been tested on a large real-world data set and has produced interesting results. However, we plan to perform more extensive testing with domain experts.

The system offers much scope for enhancements and further developments. For example, to provide more flexibility in the visual analysis, we plan to add an indication of the strength of a rule in each location, and to allow the support to be changed dynamically (while viewing the results) to obtain ‘larger’ or more exact rules (i.e. being able to see the shape of the same rule, if possible, using results from mining tasks with different support). We also intend to integrate the two visualization tools, making it possible to switch between them in a continuous fashion while maintaining the same perspective.

We will investigate modeling association rules involving different areas for the antecedent and consequent. As far as the pre-processing step is concerned, we will analyze the use of a new technique for narrowing the variables’ domains into non-uniform intervals. We would also like to consider additional derived values (like ‘distance from the eye of the hurricane’, ‘total wind speed’, etc.) for a more interesting analysis.

References

[1] R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, in:

ACM SIGMOD Conference, 1993.

[2] U.M. Fayyad, G.G. Grinstein, Introduction, in: Information Visualization in Data Mining and Knowledge

Discovery, Morgan Kaufmann, Los Altos, CA, 2001, pp. 1–17.


[3] Y. Bedard, T. Merrett, J. Han, Fundaments of spatial data warehousing for geographic knowledge discovery,

Geographic Data Mining and Knowledge Discovery, Taylor & Francis, London, 2001, pp. 53–73.

[4] M.F. Costabile, D. Malerba (Eds.), Special issue on visual data mining, Journal of Visual Languages and

Computing 14 (2003) 499–501.

[5] W.L. Johnston, Model visualization, in: Information Visualization in Data Mining and Knowledge

Discovery, Morgan Kaufmann, Los Altos, CA, 2001, pp. 223–227.

[6] I. Kopanakis, B. Theodoulidis, Visual data mining modeling techniques for the visualization of mining

outcomes, Journal of Visual Languages and Computing 14 (6) (2003) 543–589.

[7] N. Andrienko, G. Andrienko, P. Gatalsky, Exploratory spatio-temporal visualization: an analytical review,

Journal of Visual Languages and Computing, special issue on Visual Data Mining 14 (6) (2003) 503–541.

[8] Y. Bedard, Spatial OLAP, 2eme Forum Annuel sur la R-D, Geomatique VI: Un Monde Accessible,

Montreal, CA, 13–14 Novembre 1997.

[9] B. Shneiderman, Inventing discovery tools: combining information visualization with data mining,

Information Visualization 1 (1) (2002) 5–12 (ISSN:1473-8716).

[10] Google Earth, available at http://earth.google.com/.

[11] Java3D web site, http://java.sun.com/products/java-media/3D/.

[12] National Hurricane Center, Tropical cyclone report: hurricane Isabel,

http://www.tpc.ncep.noaa.gov/2003isabel.shtml, 2003.

[13] A. Hinneburg, D.A. Keim, An efficient approach to clustering in large multimedia databases with noise, in:

KDD, 1998.

[14] J.F. Roddick, B.G. Lees, Paradigms for spatial and spatio-temporal data mining, in: H.G. Miller, J. Han

(Eds.), Geographic Data Mining and Knowledge Discovery, Taylor & Francis, London, 2001.

[15] S. Shekhar, Y. Huang, W. Wu, C.T. Lu, S. Chawla, What’s spatial about spatial data mining: three case

studies, in: R. Grossman, C. Kamath, V. Kumar, R. Namburu (Eds.), Data Mining for Scientific and

Engineering Applications, Kluwer Academic Publishers, Dordrecht, 2001.

[16] M. Spenke, C. Beilken, Visual, interactive data mining with infozoom—the financial data set, in: Third

European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD’99, Prague,

Czech Republic, 15–18 September, 1999.

[17] D.A. Keim, C. Panse, M. Sips, Visual data mining in large geospatial point sets, IEEE Computer Graphics

and Applications 24 (5) (2004) 36–44.

[18] Y. Yang, G.I. Webb, A comparative study of discretization methods for Naïve-Bayes classifiers, in:

Proceedings of the 2002 Pacific Rim Knowledge Acquisition Workshop, Japan, pp. 159–173.

[19] K. Koperski, J. Adhikary, J. Han, Spatial data mining: progress and challenges, in: Proceedings of the ACM

SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, Montreal, Canada,

1996, pp. 55–70.

[20] R. Agrawal, R. Srikant, Fast algorithms for mining association rules, in: Proceedings of the International

Conference on VLDB, Santiago, Chile, September, 1994.

[21] J.-Y. Pan, C. Faloutsos, GeoPlot: spatial data mining on video libraries, in: Proceedings of the 11th

International Conference on Information and Knowledge Management (CIKM’02), VA, USA, 4–9

November, 2002.

[22] B. Zhang, M. Hsu, U. Dayal, K-harmonic means—a spatial clustering algorithm with boosting, in: TSDM,

2000.

[23] I.S. Dhillon, D.S. Modha, Concept decompositions for large sparse text data using clustering, Machine

Learning 42 (2001).

[24] T. Zhang, R. Ramakrishnan, M. Livny, BIRCH: an efficient data clustering method for very large databases,

in: SIGMOD, 1996.

[25] D.J. Peuquet, It’s about time: a conceptual framework for the representation of temporal dynamics in

geographic information systems, Annals of the Association of American Geographers 84 (3) (1994) 441–461.

[26] G. Hamerly, C. Elkan, Learning the k in k-means, in: NIPS, 2003.

[27] G. Karypis, E.-H. Han, V. Kumar, Chameleon: hierarchical clustering using dynamic modeling, IEEE

Computer 32 (8) (1999).

[28] W. Lu, J. Han, B.C. Ooi, Discovery of general knowledge in large spatial databases, in: Proceedings of the

Fareast Workshop on Geographic Information Systems, 1993.

[29] T.M. Mitchell, Machine Learning, WCB, McGraw-Hill, New York, 1997.

[30] H. Toivonen, Sampling large databases for association rules, in: Proceedings of the International Very Large

Database Conference, 1996, pp. 134–145.

[31] J. Mennis, J.W. Liu, Mining association rules in spatio-temporal data, in: Proceedings of the Seventh

International Conference on GeoComputation, 2003.

[32] Z. Zhang, Y. Lu, B. Zhang, An effective partitioning–combining algorithm for discovering quantitative

association rules, in: Proceedings of the First Pacific Asia Conference on Knowledge Discovery and Data

Mining, 1997, pp. 241–251.

[33] N. Andrienko, G. Andrienko, Exploratory Analysis of Spatial and Temporal Data—A Systematic

Approach, Springer, Berlin, 2006.

[34] P. Buono, M.F. Costabile, F.A. Lisi, Supporting data analysis through visualizations, in: Proceedings of the

Workshop on Visual Data Mining, Freiburg, Germany, 4 September 2001, pp. 67–78.

[35] M.H. Dunham, Data Mining: Introductory and Advanced Topics, Prentice-Hall, Englewood Cliffs, NJ,

2003.

[36] K. Wang, S.H.W. Tay, B. Liu, Interestingness-based interval merger for numeric association rules, in:

Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, 1998,

pp. 121–127.

[37] S. Orlando, P. Palmerini, R. Perego, Enhancing the apriori algorithm for frequent set counting, in:

Proceedings of the Third International Conference on Data Warehousing and Knowledge Discovery

(DaWaK 01)—Munich, Germany, Lecture Notes in Computer Science, vol. 2114, Springer, Berlin, 2001,

pp. 71–82.

[38] KML Specifications, available at <http://earth.google.com/kml/kml_intro.html>.

[39] OpenGL libraries, available at <http://www.opengl.org>.

[40] Java3D community, <http://www.j3d.org>.

[41] IEEE Computer Society, IEEE visualization 2004 contest, <http://vis.computer.org/vis2004contest/>, 2004.

[42] S. Rivest, Y. Bedard, P. Marchand, Toward better support for spatial decision making: defining the

characteristics of spatial on-line analytical processing (SOLAP), Geomatica 55 (4) (2001) 539–555.