Visualization Techniques for Multivariate Discrete and Continuous Data March 4, 2005 Rachael Brady



  • Slide 1

Visualization Techniques for Multivariate Discrete and Continuous Data
March 4, 2005
Rachael Brady

  • Slide 2

Multivariate Data Types
In general, each point has many attributes and/or measurements.
Type 1: measurements are continuous in nature, and combining dimensions might make sense.
    Weather data - for each x, y, z location we have water density (scalar), temperature (scalar), wind velocity (vector), air pressure (scalar).
Type 2: data is discrete, more like an attribute list, and cannot in general be combined.
    Baseball statistics - for each player we have at bats, walks, hits, doubles, home runs, RBIs.
    Populations - eye color of residents in NC, income level, voting record.

  • Slide 3

Approaches
Dimensional Reduction
    Principal Component Analysis
    Independent Component Analysis
    Kohonen Self Organizing Map http://davis.wpi.edu/~matt/courses/soms/
Dimensional Subsetting
Dimensional Organization
Dimensional Embedding
Source: Matt Ward, Multivariate Vis talk Sept 2000

  • Slide 4

Dimensional Subsetting - Scatter Plots
Invoke the concept of small multiples
Show all pairwise dimensions in a matrix
Easily see clusters, trends and correlations
Problem: How do you see a trend that requires 2 or more dependent variables?
Source: Matt Ward, Multivariate Vis talk Sept 2000

  • Slide 5

Dimensional Organization
Show each variable with an explicit visual representation
    Spatial
    Shape
    Color
    Size
    Orientation
    Texture
The combination of these visual variables can produce information that pops out, but it is not additive
Images: Chris Healey

  • Slide 6

Dimensional Organization - Glyphs
(show star glyph demo)
Image: Matt Ward, Multivariate Vis talk Sept 2000

  • Slide 7

Dimensional Organization - Parallel Coords
Parallel Coordinates creates parallel, rather than orthogonal, dimensions.
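To make the parallel-axes idea concrete, here is a minimal sketch of how one record could be turned into polyline vertices. It is an illustration only: the function name `to_polylines`, the sample records, and the choice to rescale each dimension to [0, 1] by its own minimum and maximum are assumptions, not something taken from the slides or the demo.

```python
def to_polylines(rows):
    """Map each N-dimensional record to a polyline across N parallel axes.

    Axis i is placed at x = i. The y position is the value rescaled to
    [0, 1] using that dimension's own min and max (an assumed, common choice).
    """
    dims = list(zip(*rows))                            # column-wise view
    lows = [min(d) for d in dims]
    spans = [(max(d) - min(d)) or 1.0 for d in dims]   # guard constant columns
    return [
        [(i, (v - lows[i]) / spans[i]) for i, v in enumerate(row)]
        for row in rows
    ]


# Hypothetical records: (temperature, pressure, humidity)
records = [(10.0, 1000.0, 30.0), (20.0, 1010.0, 60.0), (30.0, 1020.0, 90.0)]
lines = to_polylines(records)
for line in lines:
    print(line)
```

Each inner list holds one (axis index, height) vertex per dimension, so drawing line segments between consecutive vertices gives the polyline for that record.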
Data point corresponds to polyline across axes
Clusters, trends, and anomalies discernible as groupings or outliers, based on intercepts and slopes
Source: Matt Ward, Multivariate Vis talk Sept 2000
Show Parallel Coords Demo

  • Slide 8

Parallel Coords - Useful?
Source: http://www.ccs.neu.edu/home/mattsp/

  • Slide 9

Parallel Coords - Useful?

  • Slide 10

Parallel Coords - Extended
Visualizing Hierarchical clusters, Fua et al. 1999

  • Slide 11

Approaches
Dimensional Reduction
    Principal Component Analysis
    Independent Component Analysis
    Kohonen Self Organizing Map http://davis.wpi.edu/~matt/courses/soms/
Dimensional Subsetting
Dimensional Organization
Dimensional Embedding
Source: Matt Ward, Multivariate Vis talk Sept 2000

  • Slide 12

Dimensional Embedding
Dimensional stacking divides data space into bins
Each N-D bin has a unique 2-D screen bin
Screen space recursively divided based on bin count for each dimension
Clusters and trends manifested as repeated patterns
Source: Matt Ward, Multivariate Vis talk Sept 2000

  • Slide 13

Dimensional Embedding - not so easy
What dimensions do you choose at what hierarchy?
How do you keep coordinates consistent?
How do you lay out tiles on the page with consistency?
Can we do this automatically?
Producing a good plot is hard
Trellis - an attempt by Rick Becker and Bill Cleveland
Incorporated into the S/S-PLUS statistical package

  • Slide 14

A Digression into Plot Design

  • Slide 15

Effective use of space
Which graph is better?
Government payrolls in 1931 [How to Lie with Statistics, Huff 93]
Slide Source: Maneesh Agrawala, Lecture Notes Fall 2005

  • Slide 16

Aspect Ratio - fill space with data
Yearly CO2 concentrations [Cleveland 85]
Don't worry about showing zero
Slide Source: Maneesh Agrawala, Lecture Notes Fall 2005

  • Slide 17

Banking to 45 Degrees
http://cm.bell-labs.com/cm/ms/departments/sia/project/trellis
Slide Source: Maneesh Agrawala, Lecture Notes Fall 2005

  • Slide 18

Clearly mark scale breaks
Slide Source: Maneesh Agrawala, Lecture Notes Fall 2005

  • Slide 19

Scale break vs. Log scale
Both increase visual resolution
Log scale allows easy comparisons of all data
Scale break is more difficult to compare across the break
Slide Source: Maneesh Agrawala, Lecture Notes Fall 2005

  • Slide 20

Transforming Data for Graphing
How well does the curve fit the data?
Plot vertical distance from best fit curve
Residual graph shows accuracy of fit
Slide Source: Maneesh Agrawala, Lecture Notes Fall 2005

  • Slide 21

A Trellis Example
Lead Concentration vs. Setback Distance Given Day-of-the-Week, Week, and Height

On the next slide is a trellis display of lead concentration against setback distance given day-of-the-week (thu-wed), week (1-3), and height (3 values). There are 63 panels arranged into 21 columns and 3 rows. Each row conditions on a different value of height; as we go from bottom to top, the heights increase. The panels in each row are in time order because the panels first cycle through the days of the week and then through the weeks. The display reveals much about the structure of the data. There is a strong interaction between height and setback distance. For the lowest height, lead decreases with setback. But for the middle value of height, lead typically first increases with setback and then decreases. For the highest height, lead occasionally has the increase-decrease pattern for about 1/3 of the days, most of them days with large concentrations, and is relatively stable for the remaining days.
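The panel ordering described above can be sketched in a few lines. This is only an illustration of the layout logic, not the Trellis implementation in S/S-PLUS: the height labels are placeholders (the text says only that there are 3 values), and the `panels` dictionary is a hypothetical structure. Seven days times three weeks gives 21 columns, and three heights give 3 rows, for 63 panels.

```python
from itertools import product

days = ["thu", "fri", "sat", "sun", "mon", "tue", "wed"]
weeks = [1, 2, 3]
heights = ["low", "middle", "high"]  # placeholder labels, lowest first

# Columns are in time order: days cycle fastest, then weeks.
columns = [(week, day) for week, day in product(weeks, days)]

# Each row conditions on one height; row 0 is the bottom row (lowest height).
panels = {
    (row, col): {"height": h, "week": w, "day": d}
    for row, h in enumerate(heights)
    for col, (w, d) in enumerate(columns)
}

print(len(columns), "columns x", len(heights), "rows =", len(panels), "panels")
print(panels[(0, 0)])    # bottom-left panel
print(panels[(2, 20)])   # top-right panel
```

Each panel would then hold a plot of lead concentration against setback distance for its own (height, week, day) subset, with shared axes so that panels can be compared directly.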
This behavior is consistent with air transport mechanisms. Lead is emitted at ground level from automobile tail pipes. The closest of the 9 monitors, the one with the lowest height and the closest setback, has the largest concentrations because it is close to the pollution source. From the source, the lead is carried laterally by the wind, spreading upward as it moves. This plume-like behavior can cause the concentrations to be relatively small at the higher monitors at the closest setback.
Source: http://cm.bell-labs.com/cm/ms/departments/sia/project/trellis/wwww.html

  • Slide 22

A Trellis Example
Source: http://cm.bell-labs.com/cm/ms/departments/sia/project/trellis/wwww.html

  • Slide 23

Tensor Visualization
High Dimensional Scientific Data Visualization
Not Today

  • Slide 24

Some Interesting Web Sites
The best and worst of statistical graphs
    http://www.math.yorku.ca/SCS/Gallery/
Chris Healey's Preattentive Vision Applet
    http://www.csc.ncsu.edu/faculty/healey/PP/index.html#Preattentive
OpenDX Gallery
    http://www.opendx.org/highlights.php
IVTK: An Information Visualization Toolkit
    Ivtk.sourceforge.net
Information Visualization Repository
    http://www.cs.umd.edu/hcil/InfovisRepository/index.shtml

  • Slide 25

Resources
Great sources for theory behind multivariate display and perception are
    Bertin 1983
    Cleveland 1993
    Tufte 1983, 1990
    Colin Ware, 2000
A couple of good papers are
    Shneiderman, The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations
    Marc Green, Toward a Perceptual Science of Multidimensional Data Visualization: Bertin and Beyond