View
224
Download
0
Tags:
Embed Size (px)
Citation preview
UC Berkeley, 09/19/00
An Introduction to Multivariate Data Visualization and XmdvTool
Matthew O. WardComputer Science DepartmentWorcester Polytechnic Institute
This work was supported under NSF Grant IIS-9732897
UC Berkeley, 09/19/00
What is Multivariate Data?
Each data point has N variables or observations
Each observation can be: nominal or ordinal discrete or continuous scalar, vector, or tensor
May or may not have spatial, temporal, or other connectivity attribute
UC Berkeley, 09/19/00
Characteristics of a Variable
Order: grades have an order, brand names do not. Distance metric: for income, distance equals
difference. For rankings, difference is not a distance metric.
Absolute zero: temperature has an absolute zero, bank account balances do not.
A variable can be classified by these three attributes, called Scale.
Effective visualizations attempt to match the scale of the data dimension with the graphical attribute conveying it.
UC Berkeley, 09/19/00
Sources of Multivariate Data
Sensors (e.g., images, gauges)SimulationsCensus or other surveysCommerce (e.g., stock market)Communication systemsSpreadsheets and databases
UC Berkeley, 09/19/00
Issues in Visualizing Multivariate Data
How many variables? How many records? Types of variables? User task (exploration, confirmation,
presentation) Data feature of interest (clusters, anomalies,
trends, patterns, ….) Background of user (domain expert, visualization
specialist, decision-maker, ….)
UC Berkeley, 09/19/00
Methods for Visualizing Multivariate Data
Dimensional SubsettingDimensional ReorganizationDimensional EmbeddingDimensional Reduction
UC Berkeley, 09/19/00
Dimensional Subsetting
Scatterplot matrix displays all pairwise plots
Selection allows linkage between views
Clusters, trends, and correlations readily discerned between pairs of dimensions
UC Berkeley, 09/19/00
Dimensional Reorganization
Parallel Coordinates creates parallel, rather than orthogonal, dimensions.
Data point corresponds to polyline across axes
Clusters, trends, and anomalies discernable as groupings or outliers, based on intercepts and slopes
UC Berkeley, 09/19/00
Dimensional Reorganization (2)
Glyphs map data dimensions to graphical attributes
Size, color, shape, and orientation are commonly used
Similarities/differences in features give insights into relations
UC Berkeley, 09/19/00
Dimensional Embedding
Dimensional stacking divides data space into bins
Each N-D bin has a unique 2-D screen bin
Screen space recursively divided based on bin count for each dimension
Clusters and trends manifested as repeated patterns
UC Berkeley, 09/19/00
Dimensional Reduction
Map N-D locations to M-D display space while best preserving N-D relations
Approaches include MDS, PCA, and Kohonen Self Organizing Maps
Relationships conveyed by position, links, color, shape, size, etc.
UC Berkeley, 09/19/00
The Role of Selection
User needs to interact with display, examine interesting patterns or anomalies, validate hypotheses
Selection allows isolation of subset of data for highlighting, deleting, focussed analysis
Direct (clicking on displayed items ) vs. indirect (range sliders)
Screen space (2-D) vs. data space (N-D)
UC Berkeley, 09/19/00
Problems with Large Data Sets
Most techniques are effective with small to moderate sized data sets
Large sets (> 50K records) are increasingly common
When traditional visualizations used, occlusion and clutter make interpretation difficult
UC Berkeley, 09/19/00
One Potential Solution
Multiresolution displays with aggregation Explicit clustering
Break dimensions into bins Aggregate in a particular order (datacubes)
Implicit clustering Hierarchical clustering (proximity-based merging) Hierarchical partitioning (proximity-based splits)
Problem: many ways to cluster, each revealing different aspects of data
UC Berkeley, 09/19/00
Display Options
For each cluster, show Center Extents for each dimension Population Other descriptors (e.g., quartiles)
Color clusters such that siblings have similar color to parents
UC Berkeley, 09/19/00
Hierarchical Parallel Coordinates
Bands show cluster extents in each dimension
Opacity conveys cluster population
Color similarity indicates proximity in hierarchy
UC Berkeley, 09/19/00
Hierarchical Scatterplots
Clusters displayed as rectangles, showing extents in 2 dimensions
Color/opacity consistently used for relational and population info
UC Berkeley, 09/19/00
Hierarchical Glyphs
Star glyph with bands
Analogous to parallel coordinates, with radial rather than parallel dimensions
Glyph position critical for conveying relational info
UC Berkeley, 09/19/00
Hierarchical Dimensional Stacking
Clusters occupy multiple bins
Overlaps can be reduced by increasing number of bins
Cell colors can be blended, or display last cluster mapped to space
UC Berkeley, 09/19/00
Hierarchical Star Fields
Dimensional reduction techniques commonly displayed with starfields
Each cluster becomes circle/sphere in field
Alternatively, can show glyph at cluster location
UC Berkeley, 09/19/00
Navigating Hierarchies
Drill-down, roll-up operations for more or less detail
Need selection operation to identify subtrees for exploration, pruning
Need indications of where you are in hierarchy, and where you’ve been during exploration process
UC Berkeley, 09/19/00
Structure-Based Brushing
Enhancement to screen-based and data-based methods
Specify focus, extents, and level of detailIntuitive - wedge of tree and depth of
interestImplemented by labeling/numbering
terminals and propagating ranges to parents
UC Berkeley, 09/19/00
Structure-Based Brush
White contour links terminal nodes
Red wedge is extents selection
Color curve is depth specification
Color bar maps location in tree to unique color
Direct and indirect manipulation of brush
UC Berkeley, 09/19/00
Auxiliary Tools
Extent scaling to reduce occlusion of bandsDimensional zooming - fill display with
selected subspace (N-D distortion)Dynamic masking to fade out selected or
unselected dataSaving selected subsetsEnabling/disabling dimensionsUnivariate displays (Tukey box plots, tree
maps)
UC Berkeley, 09/19/00
Summary
Many ways to map multivariate data to images, each with strengths and weaknesses
Linking between and within displays with brushing enhances static displays and combines their strengths.
Hierarchies and aggregations allow visualization of large data sets
Intuitive navigation, filtering, and focus critical to exploration process
Each basic multivariate visualization method is readily extensible to displaying cluster information
UC Berkeley, 09/19/00
Problems and Future Work
Many data characteristics not currently supported (e.g., text fields, records with missing entries, data quality)
Navigation tool for hierarchies assumes linear order of nodes Looking at tools for dynamic reorganization Exploring 2-D or higher navigation interfaces
Very large hierarchies are difficult to focus on a narrow subset Developing multiresolution interface Investigating distortion techniques for navigation
UC Berkeley, 09/19/00
Other Future Work
Projection pursuit or view recommender
Linking structure and data brushingUser studiesCustomization based on domainQuery optimization via caching and
prefetching
UC Berkeley, 09/19/00
For More Information
XmdvTool available to the public domainBoth Unix and Windows supportNext release will have Oracle interface,
with query optimization to support exploratory operations
http://davis.wpi.edu/~xmdvPapers in Vis ‘94, ‘95, ‘99, Infovis ‘99,
IEEE TVCG Vol. 6, No. 2, 2000