Multidimensional Data Analysis
IS 247 Information Visualization and Presentation
22 February 2002
James Reffell
Moryma Aydelott
Jean-Anne Fitzpatrick
Problem Statement
• How to effectively present more than 3 dimensions of information in a visual display with 2 (to 3) dimensions?
• How to effectively visualize “inherently abstract” data?
• How to effectively visualize very large, often complex data sets?
• How to effectively display results – when you don’t know what those results will be?
Key Goals
• More than 3 dimensions of data simultaneously
• Support “fuzziness” (similarity queries, vector space, tolerance ranges)
• Support exploratory, opportunistic, “what-if” queries
• Allow identification of interesting data properties through pattern recognition
• Explore various dimensions without losing overview
Another Statement of Goals
Visualization of multidimensional data
• Without loss of information
• With:
– Minimal complexity
– Any number of dimensions
– Variables treated uniformly
– Objects remain recognizable across transformations
– Easy / intuitive conveyance of information
– Mathematically / algorithmically rigorous
(Adapted from Inselberg)
Purposes / Uses
• Find clusters of similar data
• Find “hot spots” (exceptional items in otherwise homogeneous regions)
• Show relationships between multiple variables
• Similarity retrieval rather than boolean matching, show near misses
“Searching for patterns in the big picture and fluidly investigating interesting details without losing framing context” (Rao & Card)
Characteristics
• “Data-dense displays” (large number of dimensions and/or values)
– Often combine color with position / proximity representing relevance “distance”
– Often provide multiple views
• Build on concepts from previous weeks:
– Retinal properties of marks
– Gestalt concepts, e.g., grouping
– Direct manipulation / interactive queries
– Incremental construction of queries
– Dynamic feedback
• Some require specialized input devices or unique gesture vocabulary
Influence Explorer / Prosection Matrix
(Tweedie et al.)
• We saw the video!
• Abstract one-way mathematical models: multiple parameters, multiple variables.
• Data for visualization comes from sampling
• Visualization of non-obvious underlying structures in models
• Color coding, attention to near misses
Influence Explorer / Prosection Matrix
(Tweedie et al.)
• Use the sliders to set performance limits.
• Color coding gives immediate feedback on the effects of changes, both for ‘perfect’ scores and for near-misses.
• Can also highlight individual values across histograms, show parallel coordinates.
• Interactive querying!
Influence Explorer / Prosection Matrix
(Tweedie et al.)
• In this view we can shift parameter ranges in addition to performance limits.
• Red is still a perfect score; black points miss one parameter limit, blue points miss one or two performance limits.
• Does this color scheme make sense? Would another work better?
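The color-coding rule on this slide can be sketched in a few lines. This is only an illustration, not the Influence Explorer implementation: the function names, the dict-based sample format, and the "grey" fallback for points that miss more limits are all assumptions.

```python
def count_failures(values, limits):
    """Count how many values fall outside their (low, high) limits."""
    return sum(
        1 for name, (low, high) in limits.items()
        if not (low <= values[name] <= high)
    )


def classify(sample, param_limits, perf_limits):
    """Return a color for one sampled design, following the slide's scheme."""
    param_fail = count_failures(sample, param_limits)
    perf_fail = count_failures(sample, perf_limits)
    if param_fail == 0 and perf_fail == 0:
        return "red"      # meets every limit: a 'perfect' score
    if param_fail == 1 and perf_fail == 0:
        return "black"    # misses exactly one parameter limit
    if param_fail == 0 and perf_fail in (1, 2):
        return "blue"     # misses one or two performance limits
    return "grey"         # assumption: everything else is de-emphasized


# Hypothetical limits and one sampled design.
param_limits = {"p1": (0.0, 1.0), "p2": (10.0, 20.0)}
perf_limits = {"brightness": (50.0, 100.0), "cost": (0.0, 5.0)}
sample = {"p1": 0.5, "p2": 15.0, "brightness": 40.0, "cost": 3.0}
print(classify(sample, param_limits, perf_limits))  # prints "blue"
```

Moving a slider amounts to changing one entry in `param_limits` or `perf_limits` and re-running `classify` over every sample, which is what makes the feedback immediate.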
Influence Explorer / Prosection Matrix
(Tweedie et al.)
• Prosection matrix (on right) = scatter plots for pairs of parameters.
• Color coding matches histograms.
• Fitting tolerance region (yellow box) to acceptability (red region) gives high yield for minimum cost
• Or: Make the red bit as big as possible!
• This aspect closely tuned to task at hand: manufacturing and similar.
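To make "yield" concrete: it can be read as the fraction of designs inside the tolerance box that also land in the acceptable region. The sketch below estimates it by sampling. Uniform sampling, the disc-shaped acceptability region, and all names are assumptions made for illustration, not details from Tweedie et al.

```python
import random


def estimate_yield(tolerance_box, is_acceptable, n_samples=10_000, seed=0):
    """Estimate the fraction of designs drawn uniformly from the tolerance
    box that also satisfy every performance limit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        point = {name: rng.uniform(low, high)
                 for name, (low, high) in tolerance_box.items()}
        if is_acceptable(point):
            hits += 1
    return hits / n_samples


# Hypothetical acceptability region: a disc of radius 1 around the origin.
def inside_unit_disc(point):
    return point["p1"] ** 2 + point["p2"] ** 2 <= 1.0


# A box that fits inside the disc yields only acceptable designs; a box
# larger than the disc spends part of its area on unacceptable designs.
tight_box = {"p1": (-0.5, 0.5), "p2": (-0.5, 0.5)}
loose_box = {"p1": (-1.5, 1.5), "p2": (-1.5, 1.5)}
print(estimate_yield(tight_box, inside_unit_disc))
print(estimate_yield(loose_box, inside_unit_disc))
```

The trade-off the slide points at is that the tight box gives full yield but demands tighter (typically more expensive) tolerances, so the goal is the largest box that still sits inside the red region.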
The Table Lens (Rao and Card)
• Tools: zoom, adjust, slide
• Works best for case / variable data
• Cell contents coded by color (nominal) or bar length (interval)
• Special mouse gesture vocabulary
• Search / browse (spotlighting)
• Create groups by dragging columns
The Table Lens (Rao and Card)
• Focus + context for large datasets while retaining access to all data
• Flexible, suitable for many domains
• Good example of direct interaction
• Inxight = silly name
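The focus + context idea can be sketched as a row-height allocation: a few focal rows get a readable height, and every other row shares whatever is left. This is a simplified stand-in for the Table Lens distortion function, and `row_heights` and its parameters are hypothetical names.

```python
def row_heights(n_rows, focus_rows, total_height, focus_height=12):
    """Split total_height pixels between focal rows and context rows.

    Focal rows get a fixed, readable height; the remaining pixels are shared
    equally among all other rows, which may end up thinner than a text line.
    """
    focus = set(focus_rows)
    n_context = n_rows - len(focus)
    remaining = total_height - focus_height * len(focus)
    if remaining < 0:
        raise ValueError("focus rows do not fit in the available height")
    context_height = remaining / n_context if n_context else 0.0
    return [focus_height if i in focus else context_height
            for i in range(n_rows)]


# 1000 rows in a 600-pixel-tall table, with three rows in focus.
heights = row_heights(n_rows=1000, focus_rows=[10, 11, 12], total_height=600)
print(heights[10], heights[0])
```

With sub-pixel context rows, cell values can only be shown as color or bar length, which matches the cell coding described on the previous slide.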
Parallel Coordinates (Inselberg)
• Translation of multiple graphs by using parallel axes.
• Useful for recognizing patterns between the axes: add or remove parts of the data to see general patterns or to examine particular interactions more closely.
• Articles offer suggestions on how to most effectively use this system.
Parallel Coordinates (Inselberg)
[Figures: dataset in a Cartesian graph; same dataset in parallel coordinates]
Parallel Coordinates applet - http://csgrad.cs.vt.edu/~agoel/parallel_coordinates/
Like a normal graph, but different…
Parallel Coordinates (Inselberg)
Strengths –
• Works for any N
• Clearly displays characteristics of the data (without needing beaucoup explanations)
• Easy to adjust or focus displays / queries
• Testing showed that it revealed problems missed by other forms of process control
• Can be used in decision support when used as a visual modeling tool (to see how adjusting one parameter affects others).
Weaknesses –
• Formation of complex queries can be tricky (if you want to get results that are useful and easy to interpret).
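The mapping behind parallel coordinates is small enough to sketch: each N-dimensional record becomes a polyline with one vertex per axis. Per-axis min-max scaling and the name `to_polylines` are assumptions here; the sketch only computes vertices and does no drawing.

```python
def to_polylines(records, axis_spacing=1.0):
    """Map each N-dimensional record to a polyline across N parallel axes.

    Axis k is the vertical line x = k * axis_spacing. Each value is min-max
    normalized per dimension so every axis spans 0..1.
    """
    n_dims = len(records[0])
    lows = [min(r[k] for r in records) for k in range(n_dims)]
    highs = [max(r[k] for r in records) for k in range(n_dims)]

    def scale(value, k):
        span = highs[k] - lows[k]
        return 0.5 if span == 0 else (value - lows[k]) / span

    return [[(k * axis_spacing, scale(r[k], k)) for k in range(n_dims)]
            for r in records]


# Three hypothetical four-dimensional records.
records = [(1.0, 200.0, 3.0, 0.1), (2.0, 100.0, 3.5, 0.4), (3.0, 150.0, 4.0, 0.2)]
for line in to_polylines(records):
    print(line)
```

Because adding a dimension only adds one more axis and one more vertex per line, the same code works for any N, which is the first strength listed above.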
Polaris (Stolte and Hanrahan)
• Extends pivot tables to generate graphic (not just table) displays
• Multiple graphs on one screen
• Designed to “combine statistical analysis and visualization”
Polaris (Stolte and Hanrahan)
Four-step process: selection, then partitioning, then grouping / aggregation, then composing / rendering / displaying
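The four steps can be walked through on a toy dataset. This is a sketch in plain Python, not the Polaris table algebra; the `sales` records and field names are made up for illustration.

```python
from collections import defaultdict

sales = [  # hypothetical records
    {"region": "East", "product": "coffee", "year": 2001, "profit": 30},
    {"region": "East", "product": "tea", "year": 2001, "profit": 10},
    {"region": "West", "product": "coffee", "year": 2001, "profit": 20},
    {"region": "West", "product": "coffee", "year": 2000, "profit": 5},
    {"region": "West", "product": "tea", "year": 2001, "profit": 15},
]

# 1. Selection: keep only the records of interest.
selected = [r for r in sales if r["year"] == 2001]

# 2. Partitioning: split records into table panes, one per region.
panes = defaultdict(list)
for r in selected:
    panes[r["region"]].append(r)

# 3. Grouping / aggregation: total profit per product inside each pane.
totals = {}
for region, rows in panes.items():
    by_product = defaultdict(int)
    for r in rows:
        by_product[r["product"]] += r["profit"]
    totals[region] = dict(by_product)

# 4. Rendering: draw each pane as a crude text bar chart.
for region in sorted(totals):
    print(region)
    for product, profit in sorted(totals[region].items()):
        print(f"  {product:<7}{'#' * (profit // 5)}")
```

Steps 1 to 3 correspond roughly to a WHERE clause followed by a GROUP BY, which is why this kind of pipeline can be pushed down to an existing database.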
Polaris (Stolte and Hanrahan)
• Table algebra automatically generated via drag and drop.
• Graphics generated using this algebra. Suitable graphic types are selected by the system based on query/result data types and their combinations. (These include tables, bar charts, dot plots, Gantt charts, matrices of scatterplots, and maps.)
• Users can select marks (marks differ by shape, size, orientation and color).
Polaris (Stolte and Hanrahan)
Thought went into the choice of display types and graphs: shapes as recommended by Cleveland, size and orientation as recommended by Kosslyn, color as recommended by Travis.
No mention of user testing, though.
Polaris (Stolte and Hanrahan)
• Data mapped into “layers”
• Linking and brushing capabilities, combined with automatic determination of the “best” graph type, allow easy drilling down.
Polaris (Stolte and Hanrahan)
Strengths –
• Can be used with existing DB systems
• Data transformations can be converted to SQL
• Direct manipulation: linking and brushing, drag and drop supported
• Users can play with appearance of display
• Does maps, charts, images; not limited to one display type.
Weaknesses –
• User only sees aggregated (not original) data
• System performs a number of functions automatically (conversion of variables, aggregation); the user may not know, or be able to control, how their data is changed.
Worlds Within Worlds (Feiner and Beshers)
• Basic approach: graph 3 dimensions, while holding “extra” dimensions constant
• Visually represent “extra” dimensions as space within which graph(s) are placed
– Position of “inner world” graph axis zero point equals set of constant values in “outer world”
• Tools:
– Dipstick
– Waterline
– Magnifying box
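The basic approach can be sketched as a function evaluation: the inner world's position in the outer world fixes the "extra" variables, and the inner graph plots the remaining two inputs against the result. The five-variable function `f` and the name `inner_world` are hypothetical, and no 3D rendering is attempted.

```python
def inner_world(f, outer_position, x_values, y_values):
    """Sample the 3D surface z = f(x, y, *outer_position).

    outer_position holds the constant values of the "extra" dimensions,
    given by where the inner world's origin sits in the outer world.
    """
    return [[f(x, y, *outer_position) for x in x_values] for y in y_values]


# Hypothetical function of five variables.
def f(x, y, u, v, w):
    return x * y + u - 2 * v + w ** 2


grid = [0.0, 0.5, 1.0]
surface_a = inner_world(f, (1.0, 0.0, 0.0), grid, grid)
surface_b = inner_world(f, (1.0, 0.0, 2.0), grid, grid)  # moved along w
print(surface_a[2][2], surface_b[2][2])  # prints "2.0 6.0"
```

Dragging the inner world through the outer one corresponds to calling `inner_world` again with a new `outer_position`, which is why suspending recalculation during movement matters for performance.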
The following images from: http://www-courses.cs.uiuc.edu/~cs419/multidim.ppt
Worlds Within Worlds
• Constraints:
– Uses special input device (“Data Glove”) and output device (liquid crystal stereo glasses); use without these devices is less than optimal
• Technical details:
– Suspend calculation of “child” details during movement
– Algorithm for prioritizing overlapping objects
– Need to “turn off” gesture recognition to allow normal use of hand
Techniques for plotting multivariate functions
(Mihalisin et al.)
• Multiples showing component dimensions, color codes for dimensions applied across multiples
• Or, for categorical data, select mth category from nth dimension
• Or, plot nested boxes, step values of independent variables and color-coding dependent variable
Techniques for plotting multivariate functions
(Mihalisin et al.)
• Tools:
– General zoom: look at smaller range of data in same amount of space
– Subspace zoom: select view of particular dimension’s input to function
– Decimate tool: sample fewer values within range
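Two of these tools, general zoom and decimate, reduce to simple operations on the sampled values of one independent variable. The sketch below is an interpretation of the slide's one-line descriptions, with hypothetical function names; subspace zoom is not covered.

```python
def general_zoom(values, low, high):
    """Keep only the values inside [low, high], so the same screen space
    then shows a smaller range of the data."""
    return [v for v in values if low <= v <= high]


def decimate(values, step):
    """Sample fewer values within the range by keeping every step-th one."""
    return values[::step]


xs = [i / 10 for i in range(101)]      # 0.0, 0.1, ..., 10.0
zoomed = general_zoom(xs, 2.0, 4.0)
coarse = decimate(zoomed, 5)
print(len(xs), len(zoomed), len(coarse))  # prints "101 21 5"
```

Zooming trades range for detail, while decimating trades detail for speed and screen space; the two are often combined when the full grid of sampled values is too large to draw.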
VisDB (Keim & Kriegel)
• Mapping entries from relational database to pixels on the screen
• Include “approximate” answers, with placement and color-coding based on relevance
• Data points laid out in:
– Rectangular spiral
– Or, with axes representing positive/negative values for two selected dimensions
– Or, group dimensions together (easier to interpret than very large number of dimensions)
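The rectangular spiral layout can be sketched as a generator of grid cells, with items placed in order of decreasing relevance. Putting the most relevant item at the center is an assumption made here, and `spiral_positions` and `layout` are hypothetical names rather than VisDB functions.

```python
def spiral_positions(n):
    """Yield n (x, y) grid cells walking a rectangular spiral out from (0, 0)."""
    x = y = 0
    dx, dy = 1, 0          # start by moving along the x axis
    run_length = 1         # cells to walk before the next turn
    produced = 0
    while produced < n:
        for _ in range(2):             # two runs share each run length
            for _ in range(run_length):
                if produced == n:
                    return
                yield (x, y)
                produced += 1
                x, y = x + dx, y + dy
            dx, dy = -dy, dx           # turn 90 degrees
        run_length += 1


def layout(items_with_relevance):
    """Assign each item a pixel cell, most relevant first (assumed: center)."""
    ranked = sorted(items_with_relevance, key=lambda pair: pair[1], reverse=True)
    cells = spiral_positions(len(ranked))
    return {item: cell for (item, _), cell in zip(ranked, cells)}


# Hypothetical items with relevance scores in 0..1.
print(layout([("a", 0.2), ("b", 0.9), ("c", 0.5)]))
```

Each item occupies one pixel, colored by its relevance, so near misses form rings around the exact answers instead of being dropped from the result.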
VisDB - Relevance
• Relevance calculation based on “distance” of each variable from query specification
• Distance calculation depends on data type
– Numeric: mathematical
– String: character/substring matching, lexical, phonetic?, syntactic?
– Nominal: predefined distance matrix
– Possibly other “domain-specific” distance metrics
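A per-type distance calculation might look like the sketch below. The specific choices are assumptions, not the VisDB definitions: numeric distance is scaled by the attribute's range, string distance uses the standard library's `difflib` similarity ratio as a stand-in for character matching, and relevance is a weighted average of the per-attribute distances.

```python
import difflib


def numeric_distance(value, target, value_range):
    """Absolute difference, scaled to 0..1 by the attribute's range."""
    return min(abs(value - target) / value_range, 1.0)


def string_distance(value, target):
    """1 minus difflib's similarity ratio (a stand-in for character matching)."""
    return 1.0 - difflib.SequenceMatcher(None, value, target).ratio()


def nominal_distance(value, target, matrix):
    """Look up a predefined distance between two category labels."""
    return 0.0 if value == target else matrix[frozenset((value, target))]


def relevance(distances, weights=None):
    """Combine per-attribute distances into one relevance score in 0..1.

    Assumption: a weighted average; other combinations are possible.
    """
    weights = weights or [1.0] * len(distances)
    combined = sum(w * d for w, d in zip(weights, distances)) / sum(weights)
    return 1.0 - combined


# Hypothetical query: price near 100, name like "Smith", color "red".
color_matrix = {frozenset(("red", "orange")): 0.2,
                frozenset(("red", "blue")): 0.9}
distances = [
    numeric_distance(105, 100, value_range=50),
    string_distance("Smyth", "Smith"),
    nominal_distance("orange", "red", color_matrix),
]
print(relevance(distances))
```

A record that matches the query exactly has distance 0 on every attribute and relevance 1, while near misses score slightly lower and can still be displayed.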
VisDB – Screen Resolution
• Stated screen resolution seems reasonable by today’s standards: 19-inch display, 1024 x 1280 pixels = 1.3 million data points
• However, controls take up a lot of space!
VisDB – Implementation
• Requires features not available in commercial databases:
– Partial query results
– Incremental changes to queries
– Speed? (1994 vs today)
Limitations and Issues
• Complexity
• Abstract data
– These visualizations are oriented toward abstract data
– For “naturally” two- or three-dimensional data (things that vary over time or space, e.g., geographic data), visualizations which exploit those properties may exist and be more effective
User Testing?
• Many of these systems seem only appropriate for expert use
• Minimal evidence of user testing in most cases
Future Work
• Save query parameters for reference / sharing results
• Automated query generation or filtering – Intelligent agents?