18
Dust & Magnet: multivariate information visualization using a magnet metaphor Ji Soo Yi 1 Rachel Melton 2 John Stasko 2 Julie A. Jacko 1 1 Laboratory for Human–Computer Interaction and Health Care Informatics, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, U.S.A.; 2 College of Computing/GVU Center, Georgia Institute of Technology, Atlanta, GA, U.S.A. Correspondence: Ji Soo Yi, Laboratory for Human–Computer Interaction and Health Care Informatics, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0205, U.S.A. Tel: þ 1 404 385 2545; Fax: þ 1 404 385 6115. E-mail: [email protected] Received: 2 November 2004 Revised: 5 April 2005 Accepted: 11 April 2005 Abstract The use of multivariate information visualization techniques is intrinsically difficult because the multidimensional nature of data cannot be effectively presented and understood on real-world displays, which have limited dimensionalities. However, the necessity to use these techniques in daily life is increasing as the amount and complexity of data grows explosively in the information age. Thus, multivariate information visualization techniques that are easier to understand and more accessible are needed for the general population. In order to meet this need, the present paper proposes Dust & Magnet, a multivariate information visualization technique using a magnet metaphor and various interactive techniques. The intuitive magnet metaphor and subsequent interactions facilitate the ease of learning this multivariate information visualization technique. A visualization tool such as Dust & Magnet has the potential to increase the acceptance of and utility for multivariate information by a broader population of users who are not necessarily knowledgeable about multivariate information visualization techniques. Information Visualization advance online publication, 23 June 2005; doi:10.1057/palgrave.ivs.9500099 Keywords: Multivariate information visualization; magnet; metaphor; interaction Introduction Most of the data from engineering, science, and business are multivariate, containing more than three attributes. To analyze such data, statistical methodologies can be used. However, such methodologies tend to summarize and compress the amount of information, so decisions made based on statistical results can be misleading due to the loss of the overall context of data. Instead, transforming the raw data into a more under- standable visual pattern with less compression or summarization can yield a more reliable and effective overall analysis and interpretation of the data. 1 This visual representation of data is called ‘information visualiza- tion.’ Various information visualization techniques have been developed to support individuals who analyze data. Among them, multivariate infor- mation visualization techniques specifically deal with multivariate data. In spite of their usefulness, multivariate information visualization techniques are not widespread. This is not unexpected because the major sources of multivariate data are specialized domain areas, which involve high levels of complexity, such as bioinformatics. 2–4 Likewise, multivariate information visualization techniques that deal with these specific data sets tend to operate under the assumption that the users have sufficient levels of training in analyzing data and using the tools. However, many individuals, who are not necessarily trained in these techniques, face common life decisions that involve some form of multivariate data (e.g., investments decisions, college selections, home purchases, and health care Information Visualization (2005), 1–18 & 2005 Palgrave Macmillan Ltd. All rights reserved 1473-8716 $30.00 www.palgrave-journals.com/ivs

Dust & Magnet: multivariate information visualization using a …stasko/papers/iv05-dnm.pdf · 2007. 12. 24. · dimensions, which makes presentation of data less intuitive. Star

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Dust & Magnet: multivariate information visualization using a …stasko/papers/iv05-dnm.pdf · 2007. 12. 24. · dimensions, which makes presentation of data less intuitive. Star

Dust & Magnet: multivariate information

visualization using a magnet metaphor

Ji Soo Yi1

Rachel Melton2

John Stasko2

Julie A. Jacko1

1Laboratory for Human–Computer Interaction

and Health Care Informatics, School of Industrial

and Systems Engineering, Georgia Institute ofTechnology, Atlanta, GA, U.S.A.; 2College of

Computing/GVU Center, Georgia Institute of

Technology, Atlanta, GA, U.S.A.

Correspondence: Ji Soo Yi, Laboratory forHuman–Computer Interaction and HealthCare Informatics, School of Industrial andSystems Engineering, Georgia Institute ofTechnology, Atlanta, GA 30332-0205, U.S.A.Tel: þ1 404 385 2545;Fax: þ1 404 385 6115.E-mail: [email protected]

Received: 2 November 2004Revised: 5 April 2005Accepted: 11 April 2005

AbstractThe use of multivariate information visualization techniques is intrinsically

difficult because the multidimensional nature of data cannot be effectively

presented and understood on real-world displays, which have limiteddimensionalities. However, the necessity to use these techniques in daily life

is increasing as the amount and complexity of data grows explosively in the

information age. Thus, multivariate information visualization techniques thatare easier to understand and more accessible are needed for the general

population. In order to meet this need, the present paper proposes Dust &

Magnet, a multivariate information visualization technique using a magnetmetaphor and various interactive techniques. The intuitive magnet metaphor

and subsequent interactions facilitate the ease of learning this multivariate

information visualization technique. A visualization tool such as Dust & Magnet

has the potential to increase the acceptance of and utility for multivariateinformation by a broader population of users who are not necessarily

knowledgeable about multivariate information visualization techniques.

Information Visualization advance online publication, 23 June 2005;doi:10.1057/palgrave.ivs.9500099

Keywords: Multivariate information visualization; magnet; metaphor; interaction

IntroductionMost of the data from engineering, science, and business are multivariate,containing more than three attributes. To analyze such data, statisticalmethodologies can be used. However, such methodologies tend tosummarize and compress the amount of information, so decisions madebased on statistical results can be misleading due to the loss of the overallcontext of data. Instead, transforming the raw data into a more under-standable visual pattern with less compression or summarization can yielda more reliable and effective overall analysis and interpretation of thedata.1 This visual representation of data is called ‘information visualiza-tion.’ Various information visualization techniques have been developedto support individuals who analyze data. Among them, multivariate infor-mation visualization techniques specifically deal with multivariate data.

In spite of their usefulness, multivariate information visualizationtechniques are not widespread. This is not unexpected because the majorsources of multivariate data are specialized domain areas, which involvehigh levels of complexity, such as bioinformatics.2–4 Likewise, multivariateinformation visualization techniques that deal with these specific data setstend to operate under the assumption that the users have sufficient levelsof training in analyzing data and using the tools. However, manyindividuals, who are not necessarily trained in these techniques, facecommon life decisions that involve some form of multivariate data (e.g.,investments decisions, college selections, home purchases, and health care

Information Visualization (2005), 1–18

& 2005 Palgrave Macmillan Ltd. All rights reserved 1473-8716 $30.00

www.palgrave-journals.com/ivs

Page 2: Dust & Magnet: multivariate information visualization using a …stasko/papers/iv05-dnm.pdf · 2007. 12. 24. · dimensions, which makes presentation of data less intuitive. Star

choices). Furthermore, in the current information age,the availability of information and data relevant to thesedecisions has virtually exploded. Multivariate informa-tion visualization techniques offer powerful insights forthe decision-making process. However, the general (a.k.a.untrained) population cannot be expected to invest thetime and practice necessary to effectively leverage thetools to their potential. There is a clear, growing need tomake multivariate information visualization techniquesmore usable and useful to the general population.

Based on this motivation, the authors propose Dust &Magnet (DnM) as a new multivariate data visualizationtechnique. The main goal of DnM is to visualize multi-variate data in an easy-to-learn and easy-to-use way,enabling the general public to deal with multivariate dataeffectively and efficiently with more satisfactory out-comes.

To deliver the results of the study, this paper isorganized in the following way. The first section of thispaper is a brief introduction of existing multivariateinformation visualization techniques, along with theirintrinsic problems, which will serve as the foundation ofour discussion. Some possible approaches to resolve theseintrinsic problems follow as well. Second, a scenario ofusing DnM with a simple data set is presented. Because ofthe interactive nature of DnM, the authors believe thatthe scenario-based explanation is the most suitable wayto introduce DnM. Third, the underlying interactiontechniques, additional features, and technical details ofDnM are presented. Fourth, the results of a user evalua-tion are presented and interpreted. Finally, a more in-depth discussion about the strengths and weaknesses ofDnM is presented.

Background

Multivariate information visualization techniquesA plethora of multivariate information visualizationtechniques have been invented.5–11 Each has its ownpurposes and uses, so reviewing and comparing them ona feature-by-feature basis would be cumbersome. How-ever, most of these techniques have the same underlyinggoal: to display complex, multidimensional informationusing a lower (e.g., two- or three-) dimensional space.

This common goal induces a problem called ‘the curseof dimensionality reduction,’12 which is called ‘curse’because it is essentially an unavoidable problem incurrent multivariate information visualization. Depend-ing on the visualization technique, dimensionality can bereduced in various ways. Some techniques compoundseveral dimensions into a small number of dimensions(e.g., Principal Component Analysis8 and Multidimen-sional Scaling (MDS)13). Other techniques do not pre-serve orthogonality between dimensions (e.g., ParallelCoordinates,14 Star Coordinates,11 Star Plot,9 and Chern-off’s Face10). And yet others display the relationship of aportion of dimensions (e.g., Scatter Plot Matrix,8 Trellisplot,5 and Worlds within Worlds6). These techniques

make it possible to portray multidimensional informa-tion on displays that can, pragmatically, only afford one-,two-, or three-dimensional illustrations of information.However, this dimensionality distortion confuses usersbecause the distorted dimensions are not necessarilyanalogous to the properties of real-world phenomena.Training is therefore necessary for users to overcomethis disconnection and to take advantage of thesemultivariate information technologies. Reviewing someexemplary multivariate information visualization tech-niques will provide a better understanding of thisissue.

One of the most salient examples to demonstratedimensionality distortion is MDS. MDS is a visualizationtechnique employed to envision n-dimensional data on atwo-dimensional space, while preserving the distancerelationship of highly dimensional data points as muchas possible. In other words, if two points are near to eachother in the original n-dimensional space, MDS tries topreserve the proximity of the two points in the projectedtwo-dimensional space. Thus, in MDS, an increase ofdimensionality does not require more space, and multi-ple dimensions can be considered simultaneously. Eventhough MDS rather clearly shows clusters of data sets, itmight be difficult to extract the meaning from thevisualization because the original n-dimensional dataare reduced to two dimensions. Without a proper level oftraining and background knowledge, it is generally hardto interpret the output of MDS.13

Another interesting technique is parallel coordinates.This technique contains a set of parallel lines, eachrepresenting a different variable. The values of thevariables lie along these lines. Another set of lines, eachrepresenting a record of the data set, runs across theparallel lines crossing the parallel lines at the values ofthe variables for each record. Figure 1 is a visualizationexample of parallel coordinates using XmdvTool,15,which is a public-domain tool for multivariate data visualexploration developed at Worcester Polytechnic Institute.The data set used for Figure 1 is a subset of a car data setfrom the Committee on Statistical Graphics of theAmerican Statistical Association (ASA) Second Expositionof Statistical Graphics Technology.16 As the figure shows,parallel coordinates is useful for detecting outliers andtrends in a data set, although following individual cases isquite difficult, especially with large data sets. Having aninteractive tool to assist with following the individualrecords or color coding particular cases aids with trackingspecific data.14,17,18 One might say that this techniquehas less dimensionality distortion because all the dimen-sions can be displayed without omission. However, thisadvantage is provided by removing orthogonality amongdimensions, which makes presentation of data lessintuitive.

Star coordinates is another exemplary technique thatcategorically falls between MDS and parallel coordinates.The star coordinates technique arranges multiple coordi-nate axes on a two-dimensional plane, made possible

Visualization using a magnet metaphor Ji Soo Yi et al

2

Information Visualization

Page 3: Dust & Magnet: multivariate information visualization using a …stasko/papers/iv05-dnm.pdf · 2007. 12. 24. · dimensions, which makes presentation of data less intuitive. Star

(as with parallel coordinates) by partially ignoringorthogonality (the term ‘star’ was derived from the radialshape of coordinate axes). Data points are plotted relativeto these radial coordinate axes, so each point’s locationaccounts for the entire set of multidimensional attri-butes. Ambiguities can arise when two points withdifferent attributes are coincidently located at the sameposition.11 Yet, the star coordinates technique canrepresent a greater number of dimensional relationshipsthan the parallel coordinates technique, and with lessdimensionality distortion than MDS.

The final example technique for review is scatter plotmatrix. A scatter plot matrix is composed of a collectionof simple, two-dimensional scatter plots. A scatter plotshows correlations between two variables by plotting datapoints on two orthogonal axes. A scatter plot matrixshows the correlations between pairs of variables amongmultiple variables by integrating multiple scatter plots.Figure 2, which uses the same data and tool as Figure 1does, shows an example of a scatter plot matrix. As shownin Figure 2, each scatter plot only visualizes two variables,so there is no severe dimensionality distortion. Therefore,it can be argued that a scatter plot matrix is easy tounderstand and does not introduce any dimensionalitydistortion. However, a scatter plot matrix does notfacilitate the extraction of relationships between morethan two variables. For this reason, Hand et al.8 call thescatter plot matrix approach a multiple ‘bivariate’

information visualization technique instead of a multi-variate information visualization technique.

From these examples, it can be understood thatmultivariate information visualization techniques inher-ently have a dimensionality distortion problem. How-ever, the severity of dimensionality distortion is not thesame among the techniques described. Instead, a trade-off between the severity of dimensionality distortion andthe number of dimensional relationships is also identi-fied. Thus, trying to resolve the dimensionality distortionproblem completely might be futile. Instead, it might bewiser to try to strike a balance by relieving the complexityand difficulties that are involved in multivariate informa-tion visualization. This is the question addressed in thispaper.

MetaphorOne possible approach to make visualization techniqueseasier is by using a metaphor. Dix et al.19 suggest that,‘Metaphors are used quite successfully to teach newconcepts in terms of ones which are already understood’(p. 148). Given that multivariate information visualiza-tion is a relatively new concept, finding a propermetaphor for the visualization and delivering this newconcept to general users is of primary importance.

However, a metaphor is not a panacea. At the initialstage of adopting a new concept, a metaphor can workwell to bridge between a traditional object and the new

Figure 1 Visualization of a car data set using parallel coordinates.

Visualization using a magnet metaphor Ji Soo Yi et al

3

Information Visualization

Page 4: Dust & Magnet: multivariate information visualization using a …stasko/papers/iv05-dnm.pdf · 2007. 12. 24. · dimensions, which makes presentation of data less intuitive. Star

concept, but it also can block exploration of newmechanisms and limit possible interfaces.20 For example,the graphical user interface (GUI) metaphors of file,window, and desktop are considered passe by someresearchers because these metaphors are bounded toreal-world objects, so designers cannot leverage themetaphors to fully express all of the functionalities andinteractions of a current user interface. Thus, using ametaphor based on a real-world object, which constrainspossible interactions, is not appropriate. Therefore, acarefully chosen metaphor should be used.

Among several information visualization techniquesusing metaphors, one document visualization technique,named Visualization By Example (VIBE), was inspira-tional in the development of DnM. The VIBE system usesan interesting concept, called ‘Point of Interest (POI),’ toorganize multiple documents. Each POI can representkeywords or sample documents. Documents are scoredbased on the similarity between the document and POI.The positions of documents are determined based on thescores. If a document is similar to a POI, the documenthas a higher score for the POI and is located closer to thatPOI than others with a lower score. If there aredocuments related to multiple POIs, the documents arelocated somewhere between those POIs. In other words,the distance between each document and each POIrepresents the similarity between the document and thePOI. Olsen et al.21 explained the metaphor of VIBE asorganized stacks of documents in an office environment.As similar documents tend to be stacked at similarlocations, similar documents in the VIBE system tend tobe located at similar locations.

The concept of POI is evolved in WebVIBE, a descen-dant of VIBE.22 Instead of relying on a document-stackmetaphor, WebVIBE uses a magnet metaphor to representPOIs. As real-world magnets attract only ferrous (e.g.,iron) particles among a mixture of iron and sand, themagnets on WebVIBE attract related documents to eachmagnet (POI). Because this phenomenon between mag-nets and ferrous particles is very familiar to the generalpopulation, this explanation of WebVIBE using a magnetmetaphor is more intuitive than the explanation of VIBEusing a POI and a document-stack metaphor.

The authors believe that a magnet metaphor can beused not only in document visualization, but also inmultivariate information visualization. Specifically, ap-plying a magnet metaphor in star coordinates is possible.As discussed, star coordinates uses multiple axes, inwhich each axis can be substituted with a magnet. As adata point is located along the axis according to its value,a data point can be located closer to or farther from amagnet according to its value. This substitution mightchange the way to understand the underlying logic of starcoordinates. Originally, the concept of a vector summa-tion of each data point’s value on each dimension wasused to explain the underlying logic of star coordinates.11

Such an explanation can be too daunting for users whodo not have a related mathematical background. How-ever, using a magnet metaphor, the underlying logic ofsummation can be translated more easily. Each magnetrepresents a dimension (attribute or variable), and a datapoint with a high value in the given dimension isattracted to the magnet (axis end point) more stronglythan a data point with a low value in the dimension. This

Figure 2 Visualization of a car data set using a scatter plot matrix.

Visualization using a magnet metaphor Ji Soo Yi et al

4

Information Visualization

Page 5: Dust & Magnet: multivariate information visualization using a …stasko/papers/iv05-dnm.pdf · 2007. 12. 24. · dimensions, which makes presentation of data less intuitive. Star

explanation leverages the familiarity of a magnet todeliver the complicated concept of multidimensionality.

Fortunately, a magnet metaphor avoids the commonproblem of metaphors discussed earlier. While a magnetis a real-world object, it does not necessarily have astrictly defined interaction with the exception of itsunique physical characteristic – attraction. Thus, it ispossible to say that a magnet metaphor is open to otherpossible interactions.

Animation and interactionMany information visualization techniques emphasizethe importance of an interactive user interface (e.g.,Dynamic Queries,23 Star Coordinates,11 VIBE,21 Worldswithin Worlds,6 and Table Lens7). Because interactive anddynamic visualization can provide more opportunities toconvey various aspects of complex data compared with astatic visualization, interaction with the data and thevisualization tool is naturally emphasized in the domainof multivariate information visualization.

However, in many cases, interactive user interfacescome into play after the initial visualization of given datais finished. They do not show how the visualization oroutput is constructed. For example, both star coordinatesand the VIBE system support an interactive user interface,but they only allow users to manipulate the data outputafter it is constructed. In other words, when a data set isimported to the software of Star Coordinates, the datapoints are instantly shown according to the values of datapoints and orientations of axes. Then, the user canchange the directions and sizes of axes, which in turnchanges the locations of the data points. The drawback ofthis approach is that users do not have a chance to seehow the initial output is constructed, which may beuseful in helping individuals better understand thenature of the data, itself, or even the functioning of thevisualization tool or technique. The same thing happensin the VIBE system.

One more interesting point is that even though VIBEand WebVIBE utilize a magnet metaphor, they operatedifferently from how a real-world magnet does. As shownin the left side of Figure 3, a data point, which has a valueof 2 for attribute ‘A’ and 1 for attribute ‘B,’ is locatedbetween the POI ‘A’ and POI ‘B.’ Even though the datapoint is twice as close to POI ‘A’ than POI ‘B,’ the datapoint stays between the two POIs. The physical phenom-

enon that real-world magnets show differs from what theleft side of Figure 3 describes. If there is a slight differencein the magnitudes of attraction between two attractors,or magnets, the ferrous particle moves toward a strongattractor, or a stronger magnet. Thus, the right side ofFigure 3 might depict a more realistic or intuitiveconception.

However, the right side of Figure 3 has an importantproblem in this context: if every data point ends up beingattracted and attached to one POI, how can a user seedifferences among data points?

Both of these problems can be resolved by animatedvisualization as Figure 4 depicts. There are two datapoints: one (red; the one on top) has a value of 3 forattribute ‘A’ and 1 for attribute ‘B’; the other (blue; theone on bottom) has 2 for attribute ‘A’ and 1 for attribute‘B.’ Instead of showing the stable and static visuals,animated visualization shows that the two data points aremoving toward POI ‘A’ because both of the data pointshave bigger values for attribute ‘A’ than attribute ‘B.’However, the speeds of the two data points are different.The red (top) data point approaches POI ‘A’ faster thanthe blue (bottom) data point. The animation of the twodata points shows that they have different values for theattributes. This dynamic visual is more similar to the real-world phenomenon of magnetic attraction, so it can bemore intuitively understood than the output of the VIBEsystem. Additionally, this can give users another chanceto understand the underlying logic of this visualization.

An interactive user interface also can play an importantrole here. Both data points end up attached to POI ‘A’ inthe end if the animation is not stoppable. If a user canstop the animation at the proper time, the user canpreserve a screenshot that shows a distinction betweenthe two data points. Additionally, if a user can movearound a POI or magnet and observes how data pointsfollow the POI or magnet, then the user can have morechances to build intuition about the data set. It is akin toplaying with a magnet and seeing how iron particles are

Figure 3 The attraction as is described for POIs of VIBE system

(left) and a more realistic attraction (right).

Figure 4 The animated visualization of attraction. (Notice that

the red (top) dot is moving toward A faster than the blue

(bottom) dot.)

Visualization using a magnet metaphor Ji Soo Yi et al

5

Information Visualization

Page 6: Dust & Magnet: multivariate information visualization using a …stasko/papers/iv05-dnm.pdf · 2007. 12. 24. · dimensions, which makes presentation of data less intuitive. Star

attracted to the magnet. Actually, this is how mostindividuals learn about magnets and their physicalcharacteristics. This sparkling moment in our childhoodshould be more fully utilized.

Dust & Magnet

OverviewDnM is a new way to represent and manipulate multi-variate information. As the name implies, it vigorouslyuses a magnet metaphor. Most of us have had experienceplaying with magnets when we were children. A magnetcan be used to separate ferrous particles from a mixture offerrous and non-ferrous particles. The functionality ofthis interesting tool (and toy) is quite familiar, so to usethe magnet metaphor in this case yields the possibility ofleveraging its familiarity to enable users to understandnew concepts of our multivariate information visualiza-tion technique, DnM. As a child plays with a mixture ofparticles and magnets, DnM lets a user play with data andmagnets. Similarly, interesting information is oftenhidden within complex data, and DnM can help userspull out this interesting information selectively. Figure 5shows a full screen shot of DnM.

DnM consists of three different views. In Figure 5, thelargest view on the left side of the display (labeled ‘Dust &Magnet’) is the main view, on which most of the userinteractions occur. The top right frame is the controlview, which contains the options and parameters thatusers can adjust. The bottom right frame is the detailview, which contains detailed information about theselected data point(s). As shown in the picture, each datapoint (called a ‘Dust’ particle) in the main view isrepresented as a small black circle. Each variable (calleda ‘Magnet’) in the main view is represented as a blacksquare with a yellow label. The authors intentionallyrepresent the Dust and Magnets with circles and squares,respectively, in order to make them distinguishable.

Scenario: choosing a cerealDue to the interactive nature of DnM, it is difficult toexhaustively explain its features in a narrative way.Instead, an illustrative scenario will be introduced andsome features of DnM will be explained within it.Consider the following scenario:

The data set for this scenario consists of a full list of 77cereals each consisting of 12 attributes including brand

Figure 5 A full screenshot of DnM. On the left is the main window, called ‘Dust & Magnet,’ where most of the activity takes place;

the top right window, called ‘Control,’ is where the user can adjust the values of various variables; and the bottom right window,

called ‘Detail,’ is where the user can view the details of selected Dust particles.

Visualization using a magnet metaphor Ji Soo Yi et al

6

Information Visualization

Page 7: Dust & Magnet: multivariate information visualization using a …stasko/papers/iv05-dnm.pdf · 2007. 12. 24. · dimensions, which makes presentation of data less intuitive. Star

name, manufacturer, type (cold or hot), calories, protein,fat, sodium, fiber, carbohydrates, sugars, potassium, andvitamins. Many people want their cereal to be low in fatand sugar, but high in protein and vitamins. Reviewingthe contents of 77 cereals, however, is not an easy task.The following describes how to use DnM to aid in thiscomplex multivariate decision-making task.

After importing the cereal data into DnM, all the piecesof Dust are located at the center of the main view andocclude each other. When the user selects the Magnetmenu, all of the quantitative or ordinal attributes of thedata are listed as possible Magnets. By clicking ‘Magnet(Sugar (g))’ under the Magnet menu, the user places the‘Sugar’ Magnet on the main view. The user can then dragthe Magnet using a mouse. As the Magnet is beingdragged, the Dust particles with higher sugar levels areattracted toward the ‘Sugar’ Magnet faster than the otherswith lower sugar levels. In other words, based on thedifferent sugar levels, the pieces of Dust start to separate.It is necessary to note that when the user drags a Magnetacross the Dust particles, the particles do not stick to theMagnet, as true iron particles attach to a magnet.Dragging the ‘Sugar’ Magnet results in the representationshown in Figure 6.

In order to find out more about other factors, such asprotein, fat, and vitamins, the user can select all of therelated Magnets under the Magnet menu. To look foritems that have a high level of protein and vitamins, theuser can place the ‘Protein’ and ‘Vitamin’ Magnetstoward the top of the main view. If the user also seekslow sugar and fat content, the ‘Sugar’ and ‘Fat’ Magnets

can be placed toward the bottom and then the user candrag one of the Magnets in order to attract the Dust. Eventhough only one of the Magnets is moved, all of theMagnets attract the Dust. In other words, all of the Dustparticles are attracted toward all of the Magnets simulta-neously according to the values of the attributes of theDust. Some of the pieces of Dust are attracted to thebottom area. Others are attracted to the top area, and stillothers remain around the center area. The cereals nearthe top meet the user’s hypothetical criteria of highprotein/vitamin and low sugar/fat. In order to determinethe brand names of the strong candidates, the user canclick on the different pieces of Dust on the top area whileholding down the ‘CTRL’ key on the keyboard. Theselected Dust particles are then marked with brand namesin red (see Figure 7).

From the visualization displayed in Figure 7, one is ableto see that ‘Special K’ has relatively high levels of protein,while ‘Product 19’ and ‘Total Whole Grain’ possess highlevels of vitamins. ‘Total Whole Grain’ is displayedslightly below ‘Product 19.’ ‘Total Whole Grain’ containssome fat in it, unlike ‘Product 19’ and ‘Special-K,’ whichhave none, as shown in the detail view in Figure 7. UsingDnM, one can decide quickly which cereal to choosebased on those attributes deemed important.

In addition, like many other information visualizationtechniques, DnM allows users to encode extra informa-tion by allowing the user to change the size or color ofDust particles. For example, if the user also decides tocompare the calories, instead of doing the arrangementand attraction all over again, one can encode the amountof calories per serving to the size of the Dust as inFigure 8. The items with the fewest calories (50, in thiscase) have a diameter of 3 pixels, while those with themost calories (160, in this case) have a diameter of 20pixels. Interestingly, the size encoding shows that ‘SpecialK,’ ‘Product 19,’ and ‘Total Whole Grain’ have averagelevels of calories per serving. Instead, ‘All-Bran with ExtraFiber,’ which is a newly emerged candidate, might be abetter cereal choice for this user because it has a relativelysmall number of calories as represented by the size of theDust.

In order to see which manufacturers make the cereals,the user can encode the manufacturer information by thecolor of the Dust particles. On the Color tab, the user canselect ‘Manufacturer’ from the drop down menu, whichthen produces a list of the possible values. Default colorsare automatically assigned for the list, so the user cansimply click the Apply button. Or, in order to change anyof the colors, the user clicks on the box for that variableand selects the desired color on a color palette that popsup. Figure 9 shows the result. The color-coding showsthat Kellogg’s (green: ‘Product 19,’ ‘Special K,’ and ‘All-Bran with Extra Fiber’) or General Mills (blue: ‘TotalWhole Grain’) manufactures the strong candidates thatwe already identified.

Now suppose that the user believes that ‘All-Bran withExtra Fiber’ has an unappealing taste. DnM allows the

Figure 6 A screenshot of a possible result after dragging the

Sugar Magnet on the main view.

Visualization using a magnet metaphor Ji Soo Yi et al

7

Information Visualization

Page 8: Dust & Magnet: multivariate information visualization using a …stasko/papers/iv05-dnm.pdf · 2007. 12. 24. · dimensions, which makes presentation of data less intuitive. Star

user to change criteria easily. This time, the user persistswith the belief that some sugar is necessary, but too muchsugar is still unacceptable. In order to apply this criterion,the user moves the Sugar Magnet from the bottom of themain view to the top. Then, the user goes to the Magnettab and selects Sugar from the drop-down menu where aslider labeled ‘Magnitude’ is located. Dragging the sliderto the left can decrease the magnitude or influence of the‘Sugar’ Magnet as shown in Figure 10. The size of the‘Sugar’ Magnet also becomes smaller than that of the

other Magnets as the magnitude of attraction becomeslower, which visually represents that the attraction of the‘Sugar’ Magnet is weaker than those of the other Magnets.Thus, the user can identify easily which Magnets have astronger pull and which have a weaker pull. The resultingview, in this case, shows that ‘All-Bran with Extra Fiber’has moved elsewhere. ‘Total Raisin Bran’ is probably thebest choice if both nutrition and taste are considered.

From this scenario, DnM demonstrates its potentialeffectiveness in representing multivariate information in

Figure 7 A screenshot of the view after manipulating the four Magnets.

Figure 8 The resulting view after changing the size of Dust to reflect the calorie information.

Visualization using a magnet metaphor Ji Soo Yi et al

8

Information Visualization

Page 9: Dust & Magnet: multivariate information visualization using a …stasko/papers/iv05-dnm.pdf · 2007. 12. 24. · dimensions, which makes presentation of data less intuitive. Star

a user-friendly and exploratory manner. Users can sortthrough a data set using a single criterion or multiplecriteria without losing the context of the whole data set.After sorting, detailed information can be reviewed bysimply clicking on a particle of Dust and viewing itsinformation in the detail view. In addition, the user canselect multiple particles in order to compare the details.Encoding certain variables using the size and the color ofthe Dust is also effective to reveal additional informationand trends. Each criterion can have a different weight (ormagnitude, in magnet terms) by changing the size of theMagnet. Most importantly, the authors believe that thescenario demonstrates that those various tasks can bedone easily and naturally using the intuitive metaphor ofa magnet.

Primary interactionsFrom the scenario, some features of DnM are demon-strated briefly. In this section, two primary interactions(‘attraction’ and ‘adjustment’) of DnM are discussed indepth. The attraction between Dust and Magnets is theprimary interaction of DnM. Whenever a user drags aMagnet, each Magnet attracts Dust. The more relevant aDust particle, the faster the Dust particle is attracted toeach Magnet. Because the levels of attraction are differentaccording to various factors, Dust is clustered and sorted,

so the layout of Dust might become a more meaningfulpattern. However, the attraction itself is not enough tomake DnM beneficial. Like other information visualiza-tion techniques, occlusion among data points happens,so proper ways to avoid occlusion (e.g., ‘Shake Dust’)have been developed and will be discussed later in thispaper. Additionally, DnM has a high degree of freedom,which makes DnM more interactive, although this alsomakes regenerating the same presentation difficult. Thus,ways to ‘adjust’ these problems compose the otherprimary interaction technique.

Attraction In order to simplify the explanation ofattraction, suppose only one Dust particle and oneMagnet are located on the main view, and the magnetis dragged to generate attraction between the two. Themagnitude of attraction is determined by the followingfour factors:

� the assigned attribute of the Magnet,� the value of the matched attribute of the Dust particle,� the magnitude (strength) of the Magnet,� the repellent threshold of the Magnet.

The first two factors should be considered together. EachMagnet is assigned to an attribute. Among all of theattributes of the Dust, only those values of the matched

Figure 10 The resulting view after changing the magnitude of the ‘Sugar’ Magnet and moving it near the top of the view.

Figure 9 The resulting view after changing the color of the Dust to reflect the manufacturer information.

Visualization using a magnet metaphor Ji Soo Yi et al

9

Information Visualization

Page 10: Dust & Magnet: multivariate information visualization using a …stasko/papers/iv05-dnm.pdf · 2007. 12. 24. · dimensions, which makes presentation of data less intuitive. Star

attribute of the Dust affect the attraction. For example, ifthe Magnet is assigned to attribute ‘A,’ only the value ofattribute ‘A’ of a Dust particle affects the magnitude ofattraction. Taking advantage of these factors, a user canattract Dust particles selectively.

However, the value of the matched attribute of theDust particle cannot solely determine the level ofattraction. Two more pieces of information should beconsidered: the type of the attribute and the range ofvalues of the attribute. The range of values can varyaccording to the attribute. For example, suppose thathuman age can be between 0 and 120 years and humanweight can be between 0 and 500 lb (about 226 kg). Thus,the numerical value of 100 in the attribute of human agediffers in meaning from 100 in the attribute of humanweight. The former corresponds to a relatively high value,corresponding to a high level of attraction. However, thelatter is relatively low, so it should produce a low level ofattraction. In order to normalize this difference of scales,min–max normalization is used. Min–max normalizationmaps the minimum value to 0 and the maximum valueto 1. If the min–max normalization is applied in theexample of human age and weight, the numerical valueof 100 can be mapped to 0.83 and 0.20, respectively. Theother information that should be considered is the typeof the matched attribute, which is one of three attributetypes: nominal, ordinal, and quantitative. Nominal datacannot be quantified, meaning that the level of attractionfor the Magnet cannot be calculated. Thus, nominalattributes are not listed as Magnet options. However,nominal data are still useful to provide additionalinformation about data, such as name, title, and descrip-tion. The other two types of data, quantitative andordinal, are quantifiable, making these types of attributesviable Magnet options. Strictly speaking, ordinal attri-butes should not be listed as Magnets as well becausevalues of ordinal data do not carry precise numericalvalues. However, sequential integers are assigned toordinal data based on the order of the values, and itturns out that the attraction based on these assignedsequential integers can be effective to show clusters andtrends. Thus, depending on the type of the matchedattribute, the value of each Dust particle has a differentmeaning. If it is nominal, it does not affect themagnitude of attraction at all. If it is quantitative orordinal, it affects the magnitude of attraction, but theway to calculate the level of attraction in the quantitativetype is slightly different from that of the ordinal type.More details about the algorithm used will be explainedwhen the repellent feature is introduced.

The magnitude of a Magnet is another factor. Figure 10shows how to change the magnitude of a Magnet. Asshown in the figure, the magnitude of a Magnet can beadjusted using a slide bar, and the size of a Magnet in themain view also changes according to the slide bar, whichmakes the change of magnitude more visually obvious tousers. The default magnitude of Magnets is 10, and it canbe adjusted between 0 and 20. If the magnitude of a

Magnet is zero, it means the Magnet does not affect anyDust. However, the size of the Magnet never becomeszero, which would make the Magnet invisible. Instead,the Magnet becomes small and grays out to show that theMagnet does not have attraction.

A user can also adjust the repellent threshold. Usingthis factor, a Magnet not only attracts Dust, but can alsorepel Dust. Suppose the assigned attribute of a Magnet isquantitative. If the repellent threshold is zero (thedefault), there is only attraction and no repulsion. If thethreshold is adjusted to a value (x), Dust particles whosevalue of the attribute is less than the repellent threshold(ox) are repelled and Dust particles with higher valuesthan the threshold (4x) are attracted. For example, in theleft screen shot of Figure 11, the repellent threshold is setto 2, so Dust particles that have a value of less than 2 forthe Protein attribute would be repelled from the ‘Protein’Magnet. In contrast, Dust particles that have a valuegreater than 2 in the ‘Protein’ attribute would beattracted to the ‘Protein’ Magnet. In the case that thematched attribute is ordinal, a user can select values torepel by selecting checkboxes. Dust particles that havethe selected target value for the selected attribute (i.e., theattribute values with the checks) are repelled. Forexample, in the right screenshot of Figure 11, ‘Quaker’and ‘Ralston’ are selected, so Dust particles that have‘Quaker’ or ‘Ralston’ as their manufacturer attributewould be repelled from the ‘Manufacturer’ Magnet, butother Dust particles would be attracted to the Magnet.This factor is very useful to separate Dust particleseffectively because the attraction and repulsion of Dustparticles are totally opposite. Fortunately, the possibilityof occlusion can also decrease.

Thus, the magnitude of attraction between a Dustparticle and a Magnet can be described by the followingequations.

In the case that the jth attribute is quantitative,

attraction ðMj;DiÞ ¼MMjðDV

ji � RTjÞ

maxðfDVjkgk¼1;...;dÞ � minðfDV

jkgk¼1;...;dÞ

; ð1Þ

where Mj is the Magnet with the jth attribute, Di the ithDust particle, MMj the magnitude of the jth Magnet A[0,20], DVi

j the jth attribute of the ith Dust particle, RTj therepellent threshold of the jth variable, and d the totalnumber of Dust particles.

In the case that the jth variable is ordinal,

attractionðMj;DiÞ ¼

MMjð�1ÞRjðDVj

iÞ½DV

ji � minðfDV

jkgk¼1;...;dÞ

maxðfDVjkgk¼1;...;dÞ � minðfDV

jkgk¼1;...;dÞ

; ð2Þ

where Mj is the Magnet with the jth attribute, Di the ithDust particle, MMj the magnitude of the jth Magnet A[0,20], DVi

j the jth attribute of the ith Dust particle, Rj(x)¼0if x is not a repellent value for the jth variable and ¼1 if xis a repellent value for the jth variable, and d the totalnumber of Dust particles.

Visualization using a magnet metaphor Ji Soo Yi et al

10

Information Visualization

Page 11: Dust & Magnet: multivariate information visualization using a …stasko/papers/iv05-dnm.pdf · 2007. 12. 24. · dimensions, which makes presentation of data less intuitive. Star

In summary, the magnitude of attraction is propor-tional to the magnitude of a Magnet and the matchedvalue for a given piece of Dust. As previously discussed,the range of the matched value can vary according to thedimensions of the data, so min–max normalization isused to normalize any difference in scales. Also, if thevariable is quantitative, a certain Magnet might have arepellent threshold, or, if the variable is ordinal, somevalues of variables are marked as repellent. In those cases,the value of attraction can be negative, which means thata Magnet can repel Dust instead of attracting it. Thisrepulsion feature can be useful in order to diminish theamount of overlapping and to make clusters moredistinctive. As mentioned previously, the magnitude ofMagnets and the repellent threshold can be adjusted onthe Magnet tab of the control view as shown in Figure 11.

A Dust particle is attracted toward or repelled from aMagnet while a user drags the Magnet. The speed ofattraction/repulsion is determined by the magnitude ofattraction. Because the Dust particle is attracted orrepelled only while the Magnet is being dragged, a usercan control how far the animation of Dust progresses.Additionally, it yields a comparable experience toencountering magnets for the first time in childhood.While a user drags a Magnet on the screen, Dust is pulledtoward or pushed away from the Magnet. By observingthis, the user can hopefully learn intuitively how DnMworks. As shown in the scenario, initially all of the Dust islocated in the center of the main view, so the user isforced to bring up a magnet and drag it around the Dust.The movements of Dust particles are different based onthe values of each Dust particle for the given attribute.The speed and direction of each Dust particle give extraclues about the data set and the underlying logic of DnM.

If there are multiple Magnets on the main view, all ofthem attract/repel Dust simultaneously. Thus, the direc-tion and magnitude of attraction is calculated by vectorsummation of each attraction between a Dust particleand every Magnet. This logic of vector summation issimilar to that of Star Coordinates. However, users shouldbe able to understand the underlying logic of DnM moreeasily because of the familiarity of the magnet metaphor.One might think that it is less intuitive to have allMagnets affect Dust simultaneously whenever one of theMagnets is dragged. However, this type of attraction waschosen because it was assumed to be more helpful toaccomplish tasks involving multiple attributes thansingle attraction, where the Magnet being dragged wouldhave solely affected Dust. If necessary, the single attrac-tion can be simulated by adjusting the magnitudes of allMagnets to zero except for the desired Magnet, or byremoving the other Magnets in the view.

Adjustment As briefly mentioned before, the interactiontechnique of attraction alone is not sufficient to renderDnM usable because two major problems occur: occlu-sion and a lack of reproducibility. DnM allows occlusionof Dust because it is natural to be located at the same orsimilar location if the attributes of the particles of Dustare the same or similar to each other, especially in thecase of large data sets. However, it is difficult to see eachpiece of Dust, and trends of data sets can be misinter-preted when the Dust specks occlude each other. Theother interaction problem is a lack of reproducibility.Because users can manipulate locations of Magnets freely,reproducing the same output from a data set is challen-ging. These two problems are unfavorable side effects of

Figure 11 Screenshots for how to apply the ‘Repellent’ feature: a slider for a quantitative attribute (left); checkboxes for ordinal

attributes (right).

Visualization using a magnet metaphor Ji Soo Yi et al

11

Information Visualization

Page 12: Dust & Magnet: multivariate information visualization using a …stasko/papers/iv05-dnm.pdf · 2007. 12. 24. · dimensions, which makes presentation of data less intuitive. Star

the design principles of DnM, which permit high levels ofinteraction.

To avoid occlusion, Dust particles in the initial versionof DnM were supposed to repel each other while theywere moving. However, this repelling feature among theDust introduced another problem. If Dust particles withlow values surrounded a piece of Dust with a high value,the Dust particle with the high value could not escapefrom the surroundings because the adjacent particles ofDust blocked the internal ones. Consequently, thelocations of the Dust failed to reflect the attractionamong pieces of Dust and the Magnets correctly. There-fore, the ‘Shake Dust’ feature was implemented, instead.Whenever a user wants to see all of the Dust withoutocclusion, ‘Shake Dust’ can be used through the menuuser interface or a shortcut key.

The primary concern in implementing this feature wasthat the automatic ‘Shake Dust’ feature might confuseusers. It is possible to align Dust specks without occlusioninstantly. If it happens, however, users can lose track ofthe Dust. Thus, smooth and gradual spreading-out ispreferred.

The underlying idea of the ‘Shake Dust’ feature issimple. If a Dust particle occludes other Dust particles, itmoves to avoid occlusion. Every Dust particle in turndodges other Dust particles. In order to find the over-lapped Dust particles, DnM maintains an adjacencymatrix internally that contains the distance betweenevery two Dust particles. By looking up this adjacencymatrix, DnM does not need to calculate the distancebetween every two Dust particles exhaustively. Whenevera Dust particle moves, DnM updates the adjacency matrixpartially instead of thoroughly.

The pseudo-algorithm of ‘Shake Dust’ dictates thateach piece of Dust is processed according to steps 1–3.

1. Update the adjacency matrix for a Dust particle (e.g.,Dust particle ‘A’).

2. Find other Dust (e.g., any neighboring Dust particles)that occludes Dust ‘A’.

3. If occlusion is found, do (a)–(c) for each neighboringDust.

(a) Calculate the relative location of a neighbor Dustto Dust ‘A’.

(b) Move Dust ‘A’ away from the neighbor Dust in

one unit, which is corresponding to approxi-

mately one pixel without zooming in/out.(c) Update the adjacency matrix for Dust ‘A’.

As mentioned previously, the purpose of the ‘Shake Dust’algorithm is not to align all the Dust without occlusionon the first try. Instead, the ‘Shake Dust’ feature graduallyspreads out Dust as shown in Figure 12. Thus, eachtime the user clicks on the ‘Shake Dust’ menuitem, DnM iterates through the above-mentioned stepsonce. The distance each particle moves is determinedby how occluded it is by other particles, but the move-ment is normally small, allowing the user to trackthe movement. This gradual spreading-out might betoo subtle, so using the shortcut key is easier becausefeeding multiple keystrokes is possible by holdingdown the shortcut key. This allows the user to choosethe proper amount of spreading. Usually, when thegrains of Dust are no longer moving, it is time to stopspreading.

Even with this help for removing occlusion, the lack ofreproducibility still remains a problem. As it is aninherent result of the interactive, exploratory nature ofDnM, the lack of reproduction seems difficult to resolvecompletely. Therefore, two additional features wereimplemented to help alleviate the problem. The first isthe ‘Center Dust’ feature that returns all of the Dust tothe original position, the center of the main view. Thissimple feature is useful when users want to clean up theview and run the attraction over again. If users leave thesettings and orientation of the Magnets constant, thenusing the ‘Center Dust’ feature can allow users torepeatedly generate highly similar reproductions of theoriginal (or previous) output. The second feature toalleviate the reproducibility problem is ‘Attract Dust.’Instead of dragging a Magnet in the main view, the usercan use the ‘Attract Dust’ command to attract theparticles of Dust to all of the Magnets. Like ‘Shake Dust,’each time ‘Attract Dust’ is used the Dust particles move asmall distance: more precisely, two units at the most. Thisfunction is also accessible through a shortcut key, whichcan be more convenient for executing ‘Attract Dust’multiple times. This method is less interactive, so users

Figure 12 A series of screenshots demonstrating the results of the Shake Dust feature.

Visualization using a magnet metaphor Ji Soo Yi et al

12

Information Visualization

Page 13: Dust & Magnet: multivariate information visualization using a …stasko/papers/iv05-dnm.pdf · 2007. 12. 24. · dimensions, which makes presentation of data less intuitive. Star

may miss some of the benefits of DnM. However, if theprecise locations of the Magnets are important, the‘Attract Dust’ feature helps to generate reproducibleoutputs from a data set.

Supporting featuresBesides the two primary interactions, DnM has severalsupporting features, including well-known informationvisualization techniques such as zooming/panning, filter-ing, color/size encoding, detail view, and marking. Thesefeatures are meant to enhance the usability of DnM andalso to improve the overall effectiveness of the primaryinteraction techniques of DnM.

Zooming and panning DnM supports smooth zoomingand panning features. The zooming-out feature is helpfulwhen the number of Dust particles or Magnets over-whelms the allotted screen real estate. The zooming-infeature is helpful when a user wants to see a certainregion more clearly, which can also serve as a simplifiedworkaround to the occlusion of closely coupled datapoints. Consequently, panning is also necessary when auser wants to navigate the main display while the currentview is smaller than the whole view (e.g., when zoomedin).

Filtering In some cases, certain Dust particles are notinteresting and can even disrupt users in making sense ofdata sets, thus DnM provides a filtering capability.Depending on the nature of the dimensions, the filterhas two appearances. If the variable is quantitative, adynamic query function23 is provided. Users can dyna-mically adjust the filter option while observing thechanges on the main window instantly. If the variableis ordinal, the checkbox type filter is presented. Similarly,users can exclude or include certain values of thevariable.

Color and size encoding Even though the most salientfeature of DnM is tracking the locations of Dust particles,additional encoding on the Dust is helpful for under-standing the trends and nature of the data. As shownearlier in the scenario, assigning different colors or sizesfor the values of certain attributes might be useful attimes.

Detail view and marking Although users are able toencode additional information besides the Dust particles’locations by using color and size, one might need to seethe detailed information of several pieces of Dust.Especially, when the scope of candidates is sufficientlynarrow, the attribute-by-attribute comparison can be veryuseful in making a solid decision or distinction.

In order to see the detailed information in the detailview, the users simply have to click on the Dust particle(s)of interest. For a comparison, multiple particles should beselected, which can be done by selecting multiple Dustparticles while holding down the ‘CTRL’ key. Then,

selected Dust particles are labeled with the primaryattribute (the name of the cereal in the case of theprevious scenario) in red, and at the same time, the detailview shows the values of all of the attributes for theselected Dust particles. When another piece of Dust isselected without holding the ‘CTRL’ key, the previousselection is canceled out. Figure 7 is an example of thesefeatures. The marking of Dust can also be useful fortracking the movement of interesting Dust particles (i.e.,data cases) while they are moving throughout the view.

ImplementationThis application is written using Java2SE platform andbased on Piccolo 1.0.24 Piccolo is a Java toolkit that assistswith creating zoomable user interfaces (ZUIs). Piccolo washelpful in the incorporation of the many featuresavailable in DnM: for example, the capability to zoomin/out as well as pan.

Evaluation

ParticipantsIn order to assess the usability and effectiveness of DnMfor information visualization tasks, a user evaluation wasconducted. Six participants (mean age¼24.6) wererecruited from the graduate student population at theGeorgia Institute of Technology. Three students wererecruited from the School of Industrial and SystemsEngineering. These individuals represented experiencedcomputer users who had no prior background in, orknowledge of, information visualization theories ortechniques (the ‘non-InfoVis group’). The remainingthree participants were students recruited from a gradu-ate-level information visualization class taught in theCollege of Computing. These three individuals repre-sented the information visualization ‘experts’ withsufficient knowledge of the domain (the ‘InfoVis group’).By involving the two different user groups, the authorssought to capture not only the perspective of novice usersof information visualization systems, but also theperspective of experts who are familiar with the domainand other information visualization techniques. How-ever, none of the six participants had been exposed toDnM previously. Each participant was compensated with10 U.S. dollars for his or her voluntary participation.

ProcedureFollowing a background questionnaire, participants weregiven a comprehensive tutorial introducing the variousfeatures of DnM. Although participants were givendetailed explanations of how each feature works andeven encouraged to ask any questions during the session,the tutorial was designed to give participants minimalinformation about how to use DnM to accomplishspecific tasks. As one of the key purposes of this userevaluation was to assess how easily individuals couldlearn to use DnM, the authors wanted to examine

Visualization using a magnet metaphor Ji Soo Yi et al

13

Information Visualization

Page 14: Dust & Magnet: multivariate information visualization using a …stasko/papers/iv05-dnm.pdf · 2007. 12. 24. · dimensions, which makes presentation of data less intuitive. Star

whether users could accomplish various informationvisualization tasks without specific instruction on taskperformance using the tool.

After the tutorial, the experimental session started. Thescreen activities and participants’ facial expressions/comments were video- and audio-taped for later review.The cereal data set, used in the scenario section of thispaper, was presented to each participant. Five tasks werethen given to each participant one by one. The five taskswere as follows:

� Task 1: Identify the cereal that contains the mostsodium among Kellogg’s cereals.

� Task 2: Is there a correlation between carbohydratesand sugar content?

� Task 3: Does Post make more cereals than othercompanies?

� Task 4: Please categorize cereals into two types: oneconsists of healthy cereals (more vitamins and fiber)and the other consists of unhealthy cereals (more sugarand calories). Which category has more cereals?

� Task 5: Write down the conditions for your ideal cerealand identify it using Dust & Magnet.

The first four tasks were chosen to exemplify diffe-rent task types (‘identify,’ ‘correlate,’ ‘compare,’ and‘cluster’) according to the common taxonomies ofvisualization techniques by Wehrend25 and Roth andMattis.26 The last question (Task 5) was designed togive the participant a chance to use the features of DnMmore freely.

After performing the tasks, participants were asked tocomplete a post-task questionnaire consisting of multi-ple-choice, Likert scale, and open-ended questions. Theexperiment concluded with an interview designed toelicit more personal, subjective views based upon theobservations during the actual experiment. As a result,some of the questions asked in the interview varied fromparticipant to participant.

Results

Task-by-task observation The first task was to ‘Identifythe cereal that contains the most sodium amongKellogg’s cereals.’ All participants responded correctlywithout any major problems, and they rated this task asthe easiest among the five tasks. Common usage patternsand strategies were as follows:

1. applying a filter to screen out other manufacturersexcept for ‘Kellogg’s,’

2. attracting with the ‘Sodium’ Magnet,3. identifying the fastest Dust particle using the detail

view.

The second task required a response to the question ‘Isthere a correlation between carbohydrates and sugarcontent?’ According to a question in the post-taskquestionnaire, it was perceived as the most difficult outof the five tasks. All participants reported difficulty

interpreting the meaning of the distribution of Dustafter performing attraction with the ‘Sugar’ and ‘Carbo-hydrates’ Magnets. Thus, only three participants re-sponded in a manner we would consider correct, thatis, that there is a (weak) negative correlation. This isprobably partly because of the relatively weak negativecorrelation between sugar and carbohydrates (Pearsoncoefficient¼�0.332). Another cause might be the com-plexity and dynamic nature of multiple Magnets. As thistask shows, finding a correlation between two variables isnot trivial using DnM. Actually, one of participants whosucceeded in this task did not rely on the attractionfeature, but relied on the color encoding feature tocorrectly derive the answer.

The third task required a response to the question,‘Does Post make more cereals than other companies?’Because DnM does not have an explicit feature thatcounts the number of Dust particles, the participantsreported having a difficult time accomplishing a task ofthis nature. Eventually, all participants ended up usingattraction with single Magnet with color encoding to sortthe Dust particles in the order of manufacturer. They alsoused ‘Shake Dust’ to see the Dust particles withoutocclusion for counting Dust particles correctly. Five outof six participants answered the question correctly.However, the participant who answered incorrectly alsoemployed the same usage pattern.

The fourth task was to answer the question, ‘Pleasecategorize cereals into two types: one consists ofhealthy cereals (more vitamins and fiber) and the otherconsists of unhealthy cereals (more sugar and calories).Which category has more cereals?’ Interestingly, allparticipants accomplished this task using similar strate-gies (see Figure 13), even though the tutorial did notcontain any specific instruction about how to accomplishthis type of task. As shown in the figure, all participantsput two Magnets corresponding to healthy variables(vitamin and fiber) on one end and the other twoMagnets corresponding to unhealthy variables (sugarand calories) on the other end. Except for the subtledifference in the arrangement of Magnets, all of theparticipants accomplished the task in the same way,providing evidence that the basic interaction of DnMseems to support this kind of clustering task in anintuitive way.

All participants responded correctly and rated this taskeasier than Task 2, which appeared to be a simpler taskbecause it involved only two variables. Thus, this showsthat the number of magnets used in a task does notnecessarily determine the task’s complexity.

The fifth and last task was ‘Write down the conditionsfor your ideal cereal and identify it using Dust & Magnet.’Figure 14 shows views from the displays of all participantswhile performing this task. Each participant set up his orher own criteria; some chose relatively complex criteria(e.g., involving eight variables as shown in the bottommiddle screen shot of Figure 14). All participantssuccessfully identified their own ideal cereal without

Visualization using a magnet metaphor Ji Soo Yi et al

14

Information Visualization

Page 15: Dust & Magnet: multivariate information visualization using a …stasko/papers/iv05-dnm.pdf · 2007. 12. 24. · dimensions, which makes presentation of data less intuitive. Star

Figure 13 Screenshots of DnM when all participants accomplished Task 4.

Figure 14 Screenshots of DnM when all participants accomplished Task 5.

Visualization using a magnet metaphor Ji Soo Yi et al

15

Information Visualization

Page 16: Dust & Magnet: multivariate information visualization using a …stasko/papers/iv05-dnm.pdf · 2007. 12. 24. · dimensions, which makes presentation of data less intuitive. Star

reporting any major problems. The general usage patternwas as follows:

1. attracting Dust particles with multiple Magnets,2. narrowing down to a couple of candidates using the

filter or color encoding features,3. making the final decision after comparing the candi-

dates using the detail view.

While accomplishing Task 5, participants demonstratedsome advanced and creative usages of DnM. One partici-pant stacked several Magnets (some were repelling, andsome were attracting) on top of each other (see the topleft screenshot in Figure 14). Another participant keptadjusting the magnitudes of Magnets to see the subtledifferences between the final candidates (see the topmiddle screenshot in Figure 14). Another participantsolely relied on the filter feature to trim out unwantedDust particles (see the bottom right screenshot inFigure 14). All of these variations represent creativeapplications of the tool’s features.

These varying usages during task performance demon-strated that DnM does not have limited usage patterns,which one might infer from the unanimous usagepatterns demonstrated in accomplishing Task 4. More-over, the results show that the magnet metaphor, whichDnM relies on, does not suffer from the same limitationsas the metaphors discussed in the Background section ofthis paper. Instead, the six participants who participatedin this user evaluation seemed to understand the magnetmetaphor (as demonstrated in the results of Tasks 1, 3,and 4) and took advantage of the interaction beyond theinstruction (as demonstrated in the results of Task 5).

Generally, as described for each task, the participantsused DnM effectively for accomplishing most of the tasks(finding a correlation was the most challenging). Forother tasks involving multiple variables, DnM could beused easily and also creatively.

Overall impression The overall impression that users hadof DnM as a tool for information visualization was also ofinterest in this evaluation. Thus, the post-task question-naire asked users to rate Dust & Magnet using a 5-levelLikert scale.

Overall, the participants rated DnM between ‘Good’and ‘Great’ (4.00–4.83, on a scale from 1 to 5 with 1 being‘Terrible’ and 5 being ‘Great’) in terms of being ‘easy-to-learn,’ ‘easy-to-use,’ ‘interesting-to-use,’ and ‘helpful.’Specifically, five out of six participants rated DnM as‘Great’ for ‘interesting-to-use.’ For the ‘easy-to-use’ and‘helpful’ criteria, DnM was rated slightly lower, althoughthere were no negative assessments, such as ‘Bad’ or‘Terrible.’

Additionally, subjects were asked, ‘Can you describeyour overall impression of the software using one or twoadjectives?’ during the interview session. The followingverbal responses were given, with V1, V2, and V3referring to the three participants from the InfoVis group

and NV1, NV2, and NV3 referring to the three partici-pants from the non-InfoVis group.

� NV1: ‘Really cool. I want to have a copy.’� NV2: ‘It’s potential. More potential to be great. I am

really into information, so I might be a little bit biasedsubject.’

� NV3: ‘It has a lot of potential. However, it’s personaltool rather than professional tool. For example, SPSS.’

� V1: ‘I like it. I like the metaphor.’� V2: ‘It’s easy-to-use, so intuitive. Easy-to-pick up.’� V3: ’I like the metaphor. Clever. Actually insightful.’

All responses were more positive and encouraging thanexpected. Because DnM is a new user interface for all ofthe participants, and because the given tasks forced themto deal with multivariate data sets, the tasks could havegenerated some frustration among users. However, thatwas not the case. As two participants mentioned, theyliked the magnet metaphor used in DnM and thoughtthat DnM was interesting, intuitive, and easy-to-use.These responses show that the major objective in thispaper was accomplished at least for the six participantsinvolved in the evaluation.

DiscussionAs the results of the study suggest, one of the strengths ofDnM might be that it is easy-to-learn and easy-to-use,especially for users without much knowledge of multi-variate information visualization. As the participants’responses suggest, this strength most likely relies on themagnet metaphor and the interactive implementation ofthe magnetism. Due to the magnet metaphor, the logic ofvector summation can be smoothly and implicitlydelivered to the users. Compared to other multivariateinformation visualization techniques that do not usea metaphor (e.g., Star Coordinates), DnM can be moreeasily understood because the comprehension of com-plex mathematical algorithms is not necessary. Theeffectiveness of the magnet metaphor is also bolsteredby the animated interaction of DnM. A magnet metaphoritself is already used in explaining the concept of POI ofWebVIBE, but the authors believe that DnM differs fromWebVIBE from the perspective of taking advantage ofinteraction to support the magnet metaphor. Animatedinteraction used in DnM piques the curiosity of users andprovides clues about the underlying logic of the tool.Additionally, the interaction in DnM is more similar tothe real physical phenomenon between a magnet andferrous particles. Magnets in DnM attract Dust particlesuntil the particles end up sticking to the Magnets. Incontrast, data points in the WebVIBE system reside on thescreen based on the proximity between POIs, or magnets,which is less realistic and might confuse users.

DnM can allow users to undertake sophisticated tasks,but in a simplified manner. As shown in the scenario andevaluation sections, adjustment of criteria can simply beperformed by placing multiple Magnets and adjusting themagnitudes of the Magnets. Magnets can be positioned

Visualization using a magnet metaphor Ji Soo Yi et al

16

Information Visualization

Page 17: Dust & Magnet: multivariate information visualization using a …stasko/papers/iv05-dnm.pdf · 2007. 12. 24. · dimensions, which makes presentation of data less intuitive. Star

on opposite sides of the Dust, in order to representpositive criteria at one end and negative criteria at theother. Assigning repellent thresholds is another way toadjust criteria. The sizes of Magnets also reflect themagnitudes or strengths of the Magnets, so users can getvisual feedback as well. When comparing the DnMapproach to accomplishing these kinds of tasks to theuse of a spreadsheet application, it is easy to recognizehow much DnM can improve the procedure of analyzingcomplex, multivariate data.

The supporting features, such as zooming/panning,filtering, color/size encoding, providing the detail view,and data marking, can also help the process of makingsense of data. Even though these features are verycommon in modern information visualization tools,their use in DnM can greatly enhance its more uniqueinteraction techniques as shown in some screen shotsfrom the user evaluation.

Despite its benefits, DnM has several limitations. Aboveall, DnM has a lack of reproducibility. BecauseDnM allows users a high degree of freedom in adjustingand manipulating dimensions, it is challenging to expli-citly document certain clustering schemes. Yet, if the userkeeps applying the ‘Center Dust’ feature, it is possibleto regenerate similar clustering. However, the dimensionsof DnM can move all over the main view, so DnM doesnot have the reproducibility that other visualizationtechniques have. This interactive characteristic alsointroduces a lack of precision. Currently, the authorsare working on a ‘Saving’ feature to save the arrange-ment of the main view, so this problem will be alle-viated. Logging exhaustive movements of all theMagnets might generate copious amounts of data, butit can be a comprehensive solution for this problem.However, the problem of reproducibility is really atrade-off with the heightened ability for users to freelyexplore and manipulate the data with a high degreeof freedom.

Occlusion was one of the biggest problems in the earlystages of development. Now, several features are imple-mented to deal with this problem, including repellentoptions for Magnets and the ‘Shake Dust’ feature. Neitheroperation is perfect, although they help to remedyconfusion resulting from occlusion.

Scalability, a common problem with other visualizationtools and techniques, is another issue with DnM. Despitethe enhancements to Java technology and Piccolo, thereare still limitations in processing large data sets. Based onour test trials, conditions in which data sets with morethan 700 data points are analyzed begin to slow down theinteraction noticeably. However, this is primarily true forcertain hardware profiles, including our test conditions:Dell Inspiron 8200 notebook with 2.2 GHz MobilePentium 4 and 256 MB of RAM running MicrosoftWindows XP Professional, Service Pack 1. Additionally,the number of Magnets and Dust can be limited by thescreen size of the main view or the interpretability ofusers. If too many Magnets are placed on the main view,

it may be difficult to explain the movement of the Dustclearly because too many factors are involved. However,the number of attributes inherent in the data set does notseverely affect performance. As long as a given attributeMagnet is not placed on the screen, the algorithm is notbothered by the existence of this attribute or additionalunused attributes.

The repulsion feature sometimes repels Dust particlesoutside of the current window of the main view,potentially causing users to lose track of the Dustparticles. Putting a boundary around the main view wasconsidered, but not implemented. Because a user mightwant to have a large space in order to make sense of his orher data, it is appropriate to give users full freedom to usethe possible space. This problem only happens withMagnets with a repellent threshold and has not beenobserved during the user evaluation. If users apply thisfeature carefully, the problem is easily avoidable andwhen it happens, the ‘Center Dust’ feature allows the userto retrieve the missing particles.

Additionally, the participants of the user evalua-tion suggested several features that make DnM moreusable. The most frequently requested feature was areset feature, which would reset all settings, such as colorand size encoding, filter, the magnitude and repellentthreshold of Magnets, and zooming rate, to the originalvalues. Another suggested feature was a summary statis-tics feature that would show descriptive statisticsof selected Dust particles. The current version of DnMcan show the detailed information of each Dust particle,but it does not have a feature to show summaryinformation, such as count, mean, or standard devia-tion, of selected Dust particles. In addition, moresophisticated color encoding or filter features wererequested. Fortunately, the suggested features do notconflict with the essential part of DnM, so the authorsbelieve that these features can be incorporated into DnMrelatively easily.

Finally, one might question whether DnM can beeffective for people who do not know how to articulatea question. This is a fair question because DnM doesnot supply any systematic approaches to producingthis query initially. Therefore, the user should posethe question in advance in order to find answers usingDnM effectively. Conversely, as shown in the userevaluation, the fact that DnM is easy and interesting touse might encourage users to explore data sets morevigorously, which smoothly lead them to pose properquestions.

ConclusionsFrom the scenario and detailed introduction of interac-tion techniques, the effectiveness and ease-of-use of DnMis demonstrated. A user evaluation with six participantsalso partially verified its effectiveness. By introducing themetaphor of attraction, the complex multivariate infor-mation visualization seems to be explained intuitivelyand easily. Even though some dimensionality distortion

Visualization using a magnet metaphor Ji Soo Yi et al

17

Information Visualization

Page 18: Dust & Magnet: multivariate information visualization using a …stasko/papers/iv05-dnm.pdf · 2007. 12. 24. · dimensions, which makes presentation of data less intuitive. Star

can happen in DnM, the interactive movement ofDust might be helpful for general users, who probablydo not have a background in multivariate informationvisualization techniques, to understand the underlyinglogic.

Additional optimization of DnM is still necessary,along with consideration of any possible usabilityproblems. The authors also hope that DnM will encou-rage the development and use of other simple, butelegant metaphors for information visualization tools.

Acknowledgments

We are indebted to Leon Barnard, Paula Edwards, V.Kathlene Leonard, Thitima Kongnakorn, Kevin Moloney,

and Young Sang Choi for their crucial assistance and

support. Ji Soo Yi’s participation in this research wassupported by National Science Foundation ITR funding

awarded to Dr. Julie Jacko, under Grant No. IIS-0121570.

Any opinions, findings, and conclusions or recommendations

expressed in this material are ours and do not necessarilyreflect the views of the NSF.

References1 Fayyad U, Piatetsky-Shapiro G, Smyth P. KDD process for extracting

useful knowledge from volumes of data. Communications of the ACM1996; 39: 27–34.

2 Saraiya P, North C, Duca K. Evaluation of microarray visualizationtools for biological insight. The IEEE Symposium on InformationVisualization. 2004 (Austin, TX, USA); 2004; 1–8.

3 Bolshakova N. Microarray software catalogue [WWW document]http://www.cs.tcd.ie/Nadia.Bolshakova/softwaretotal.html (accessed27 October 2004).

4 Leung YF. Functional genomics [WWW document] http://genomicshome.com (accessed 27 October 2004).

5 Cleveland WS. Visualizing Data. AT&T Bell Laboratories; Hobart Press:Murray Hill, NJ, Summit, NJ, 1993; 360pp.

6 Feiner SK, Beshers C. Worlds within worlds: metaphors for exploringn-dimensional virtual worlds. The Third Annual ACM SIGGRAPHSymposium on User Interface Software and Technology 1990 (Snow-bird, UT, USA), ACM Press: New York, NY, 1990.

7 Pirolli P, Rao R. Table lens as a tool for making sense of data. TheWorkshop on Advanced Visual Interfaces 1996 (Gubbio, Italy), ACMPress: New York, NY, 1996; 67–80.

8 Hand D, Mannila H, Smyth P. Principles of Data Mining. MIT Press:Cambridge, MA, 2001.

9 Chambers JM. Graphical Methods for Data Analysis, Vol. xiv. TheWadsworth Statistics/Probability Series, Wadsworth InternationalGroup. Duxbury Press: Belmont, CA, Boston, 1983; 395pp.

10 Chernoff H. The use of faces to represent points in k-dimensionalspace graphically. Journal of American Statistical Association 1973;v68: 361–368.

11 Kandogan E. Visualizing multi-dimensional clusters, trends, andoutliers using Star Coordinates. Proceedings of the Seventh ACMSIGKDD International Conference on Knowledge Discovery and DataMining (KDD-2001) 2001 (San Francisco, CA, USA), Association forComputing Machinery, 2001; 107–116.

12 Bellman R. Adaptive Control Processes. Princeton University Press:Princeton, NJ, 1961.

13 Cox TF, Cox MAA. Multidimensional Scaling, 2nd edn. Monographson Statistics and Applied Probability. Chapman & Hall/CRC: BocaRaton, 2001; 308pp.

14 Inselberg A, Dimsdale B. Parallel coordinates: a tool for visualizingmulti-dimensional geometry. Proceedings of the First IEEE Conference onVisualization. Visualization ’90 (Cat. No. 90CH2914-0). (San Francisco,CA, USA), IEEE Computer Soc. Press: Silver Spring, MD, 1990; 361–378.

15 XmdvTool home page [WWW document] http://davis.wpi.edu/Bxmdv (accessed 27 October 2004).

16 Donoho D, Ramos E. PRIMDATA: data sets for use with PRIM-H.American Statistical Association (ASA) Second Exposition of StatisticalGraphics Technology 1983 (Toronto, Canada), 1983.

17 Artero AO, Oliveira MCFd, Levkowitz H. Uncovering clustersin crowded parallel coordinates visualizations. The IEEE Sympo-sium on Information Visualization 2004 (Austin, TX, USA), 2004;81–88.

18 Peng W, Ward MO, Rundensteiner EA. Clutter reduction in multi-dimensional data visualization using dimension reordering. TheIEEE Symposium on Information Visualization 2004 (Austin, TX, USA),2004; 89–96.

19 Dix A, Finlay J, Abowd G, Beale R. Human–Computer Interaction, 2ndedn. Prentice-Hall: Englewood Cliffs, NJ, 1997; 638pp.

20 Bederson BB, Hollan JD. Pad++: a zooming graphical interfacefor exploring alternate interface physics. Proceedings of theACM Symposium on User Interface Software and Technology1994 (Marina del Rey, CA, USA), ACM: New York, NY, USA,1994; 17.

21 Olsen KA, Korfhage RR, Sochats KM, Spring MB, Williams JG.Visualization of a document collection: The VIBE system. InformationProcessing and Management 1993; 29: 69.

22 Morse EL, Lewis M. Why information retrieval visualizations some-times fail. Proceedings of the 1997 IEEE International Conference onSystems, Man, and Cybernetics. Part 2 (of 5) 1997 (Orlando, FL, USA),IEEE: Piscataway, NJ, USA, 1997; 1680–1683.

23 Shneiderman B. Dynamic queries for visual information seeking. IEEESoftware 1994; 11: 70–77.

24 Bederson BB. A structured 2D graphics framework [WWW document]http://www.cs.umd.edu/hcil/piccolo/index.shtml (accessed 28 August2004).

25 Wehrend SC. A categorization of scientific visualization techniques.M.S. thesis, Department of Computer Science, University ofColorado, Boulder, CO 1990.

26 Roth SF, Mattis J. Data characterization for intelligent graphicspresentation. The SIGCHI Conference on Human Factors in ComputingSystems: Empowering People 1990 (Seattle, WA, USA), ACM Press:New York, NY, USA, 1990; 193–200.

27 Wehrend S, Lewis C. A problem-oriented classification of visualizationtechniques. The First Conference on Visualization 1990 (San Francisco,CA, USA), IEEE: Piscataway, NJ, USA, 1990; 139–143.

Visualization using a magnet metaphor Ji Soo Yi et al

18

Information Visualization