CP SC 8810 Data Visualization, Clemson University

Joshua Levine

Lecture 15: Text and Sets

Oct. 14, 2014

Agenda

• Lab 02 Grades!

• Lab 03 due in 1 week

Lab 2 Summary

Preferences on x-axis label separation
[Bar chart: number of students preferring each separation: 5, 7, 10, 20]

10 could simply indicate laziness…

Preferences on Rotated vs. Normal Text for the y-axis label
[Bar chart: number of students choosing Rotated, Normal, or Both!]

Choice for the “Best” Summary View (Note: These OFTEN did not agree with the answer to Task 2)
[Bar chart: Dots, Connected Dots, Lines, Area, Multiple, User Selects]

Task 2 Answers for “Best”
[Bar chart: Dots, Connected Dots, Lines, Area, Not Specified]

“Not Specified” means argued for multiple or forgot to answer

Grade Distribution: Mean 8.84, StdDev 0.93

Great Job Overall!

How Did They Look?

[Screenshots of student submissions, captioned:]
• Good Colors, Effective Legend
• Good Colors, Interactive Legend
• Nice Legend, Interactive Mouseover
• Different x-axis separation
• Interactive Measuring Bar, Less Saturated Colors
• Effective Transparency, Interactive Legend
• Informative Mouseover
• Different Aspect Ratio, Reinforced Visual Encodings
• Task 1: Minimal Ink Histogram of y-values
• Combined Line Plots with Area Plots, Interactive Legend + Mode Selection
• Both Normal and Rotated Text for y-axis

Last Time: Trees and Graphs

Design Choices for Trees and Graphs


Node-Link Layouts
• Strengths:
  • Understandable visual mapping
  • Can show overall structure, clusters, paths
  • Flexible, many variations
• Limitations:
  • All but the most trivial layout algorithms cost more than O(N^2)
  • Not good for dense graphs: the "hairball" problem!
  • Small changes in the graph can cause dramatic changes to the layout
    • See Frishman and Tal. Online Dynamic Graph Drawing. Proc. EuroVis 2007
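To make the O(N^2) limitation concrete, here is a minimal Python sketch of a force-directed layout in the spirit of Fruchterman-Reingold. The choice of algorithm is an assumption, since the slides do not name one. The repulsion step compares every pair of nodes, so each iteration alone costs O(N^2). The name `force_layout` and the constants (ideal length `k`, starting temperature, 0.95 cooling factor) are illustrative, not tuned values.

```python
import math
import random


def force_layout(nodes, edges, iterations=100, width=1.0, seed=0):
    """Force-directed layout sketch (Fruchterman-Reingold style).

    The pairwise repulsion loop makes each iteration O(N^2), so the
    whole run is O(iterations * N^2) before the edges are even counted.
    """
    rng = random.Random(seed)
    pos = {v: [rng.random() * width, rng.random() * width] for v in nodes}
    k = width / math.sqrt(len(nodes))  # ideal edge length (illustrative)
    temp = width / 10  # largest step a node may take; cools every iteration

    def push(disp, u, v, dx, dy, d, f):
        # Apply force magnitude f along the unit vector pointing from v to u.
        disp[u][0] += dx / d * f
        disp[u][1] += dy / d * f
        disp[v][0] -= dx / d * f
        disp[v][1] -= dy / d * f

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of nodes: the O(N^2) step.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                push(disp, u, v, dx, dy, d, k * k / d)
        # Attraction along edges (a negative force pulls endpoints together).
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            push(disp, u, v, dx, dy, d, -d * d / k)
        # Move each node along its net force, capped by the temperature.
        for v in nodes:
            dx, dy = disp[v]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temp)
            pos[v][0] += dx / d * step
            pos[v][1] += dy / d * step
        temp *= 0.95
    return {v: tuple(p) for v, p in pos.items()}


nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
layout = force_layout(nodes, edges)
```

Doubling the number of nodes roughly quadruples the work in the repulsion loop. Practical tools approximate that loop (for example with a Barnes-Hut quadtree) to keep large graphs tractable.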

Matrix Layouts
• Instead of node-link diagrams, use the adjacency matrix to represent the graph
[Figure: a five-node graph (A, B, C, D, E) beside its adjacency matrix, with rows and columns labeled A to E]
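The Python below is a minimal sketch of the matrix idea. It builds an N x N 0/1 adjacency matrix from an edge list and prints it as text. The edge list is hypothetical, because the edges in the slide's A to E figure cannot be recovered from the text. `adjacency_matrix` and `print_matrix` are illustrative names.

```python
def adjacency_matrix(nodes, edges, directed=False):
    """Return an N x N matrix of 0/1; row and column order follow `nodes`."""
    index = {v: i for i, v in enumerate(nodes)}
    m = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        m[index[u]][index[v]] = 1
        if not directed:
            m[index[v]][index[u]] = 1  # undirected graphs give a symmetric matrix
    return m


def print_matrix(nodes, m):
    """Text rendering: '#' marks an edge, '.' marks no edge."""
    print("  " + " ".join(nodes))
    for v, row in zip(nodes, m):
        print(v + " " + " ".join("#" if x else "." for x in row))


nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]  # hypothetical
m = adjacency_matrix(nodes, edges)
print_matrix(nodes, m)
```

The matrix always uses N^2 cells no matter how many edges exist. That is why it scales visually to dense graphs, and also why paths are hard to follow: tracing A to D means hopping from row A to row C and then to row D.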


Matrix Representations
• Strengths:
  • Great for dense graphs
  • Visually scalable
  • Can spot clusters
• Limitations:
  • Abstract visualization
  • Hard to follow paths

Spotting Patterns in Matrices

Henry 2006
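Whether clusters show up in a matrix depends on the order of its rows and columns. The sketch below uses hypothetical data and names. It builds a graph with two interleaved groups, then sorts nodes by a group label so that a scattered pattern becomes two blocks on the diagonal. Real reordering methods compute the order from the graph itself. Here the group labels are simply assumed to be known.

```python
from itertools import combinations


def adjacency_matrix(nodes, edges):
    index = {v: i for i, v in enumerate(nodes)}
    m = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        m[index[u]][index[v]] = m[index[v]][index[u]] = 1
    return m


def reorder(nodes, m, order):
    """Permute rows and columns together so they follow `order`."""
    idx = [nodes.index(v) for v in order]
    return [[m[i][j] for j in idx] for i in idx]


def show(labels, m):
    for v, row in zip(labels, m):
        print(v, "".join("#" if x else "." for x in row))


nodes = list("ABCDEF")
group = {"A": 0, "B": 1, "C": 0, "D": 1, "E": 0, "F": 1}  # hypothetical clusters
edges = [(u, v) for u, v in combinations(nodes, 2) if group[u] == group[v]]
m = adjacency_matrix(nodes, edges)
order = sorted(nodes, key=lambda v: group[v])  # stable sort: A C E B D F
r = reorder(nodes, m, order)
show(nodes, m)  # scattered, checkerboard-like pattern
print()
show(order, r)  # two 3 x 3 blocks on the diagonal
```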

Adjacency Matrices

Attribute-Driven Layout

Paul Butler

http://www.facebook.com/notes/facebook-engineering/visualizing-friendships/469716398919

Attribute-Driven Layout
• Large node-link diagrams get messy!
• Are there additional structures we can exploit?
• Idea: use data attributes to perform layout
  • e.g., scatterplot based on node values
• Dynamic queries and/or brushing can be used to enhance perception of connectivity
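Here is a minimal sketch of attribute-driven layout with brushing. It assumes each node carries two numeric attributes; the names `age` and `posts` and all values are hypothetical. Positions come straight from the attributes, rescaled to [0, 1] as in a scatterplot. A brush over an x-range returns the selected nodes and the edges to highlight.

```python
def attribute_layout(attrs, x_key, y_key):
    """Place each node at (x_key, y_key), each axis rescaled to [0, 1]."""
    def scaler(key):
        vals = [a[key] for a in attrs.values()]
        lo, hi = min(vals), max(vals)
        return lambda x: 0.5 if hi == lo else (x - lo) / (hi - lo)

    sx, sy = scaler(x_key), scaler(y_key)
    return {v: (sx(a[x_key]), sy(a[y_key])) for v, a in attrs.items()}


def brush(pos, edges, x0, x1):
    """Select nodes with x in [x0, x1]; highlight edges touching them."""
    selected = {v for v, (x, _) in pos.items() if x0 <= x <= x1}
    highlighted = [(u, v) for u, v in edges if u in selected or v in selected]
    return selected, highlighted


attrs = {  # hypothetical node attributes
    "ann": {"age": 20, "posts": 5},
    "bob": {"age": 30, "posts": 50},
    "cy": {"age": 40, "posts": 20},
}
edges = [("ann", "bob"), ("bob", "cy")]
pos = attribute_layout(attrs, "age", "posts")
selected, highlighted = brush(pos, edges, 0.0, 0.25)
```

The layout ignores the graph's topology entirely. Connectivity is recovered only through the highlighted edges, which is where brushing earns its place.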

Cerebral

Barsky 2008

Visual Exploration of Multivariate Graphs
Martin Wattenberg
Visual Communication Lab, IBM Research, 1 Rogers St., Cambridge MA 02142

Figure 1. A PivotGraph visualization of a large graph rolled up onto two categorical dimensions.

ABSTRACT
This paper introduces PivotGraph, a software tool that uses a new technique for visualizing and analyzing graph structures. The technique is designed specifically for graphs that are “multivariate,” i.e., where each node is associated with several attributes. Unlike visualizations which emphasize global graph topology, PivotGraph uses a simple grid-based approach to focus on the relationship between node attributes and connections. The interaction technique is derived from an analogy with methods seen in spreadsheet pivot tables and in online analytical processing (OLAP). Finally, several examples are presented in which PivotGraph was applied to real-world data sets.

Author Keywords: information visualization, graph drawing

ACM Classification Keywords: H.5.2 [Information Interfaces]: User Interfaces. I.3.6 [Methodology and Techniques]: Interaction Techniques

INTRODUCTION
A common goal in exploratory data analysis is to form hypotheses about a graph. In contexts ranging from social networks to Markov chains, it can be important to understand the properties of a graph and make inferences about why it looks the way it does. Because of the complex structure and large size of many graphs, such an analysis can be difficult.

Visualization is one solution to this problem. In one popular approach, nodes of a graph are represented by a dot, and edges by lines or curves linking them. Such “node-and-link diagrams” have been explored intensively throughout this century (see [3, 10, 14, 15]). A second common approach is a matrix view, or density table, that displays a graph’s adjacency matrix overlaid on a grid [4].

Both node-and-link diagrams and matrix views emphasize link structure. In many cases, however, data about a graph

CHI 2006, April 22-28, 2006, Montréal, Québec, Canada.
Copyright 2006 ACM 1-59593-178-3/06/0004...$5.00.

CHI 2006 Proceedings • Visualization 1 April 22-27, 2006 • Montréal, Québec, Canada

PivotGraph
• Task abstraction
  • Show the relationship between node attributes and connections in a multi-attribute graph
• Data abstraction
  • Relational dataset
  • Nodes (and edges) have multiple discrete attributes
  • Rollup and selection transformations

Visual Encoding
• Line (1D) or grid (2D) layout
• Area subdivided by the number of values for an attribute
• Number of nodes determined by the number of attribute values, not the original graph's node count
• Size of nodes and edges related to the number of aggregated original nodes and edges
• Scalability through abstraction, not through the layout algorithm
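The rollup and selection transformations can be sketched as grouping. The Python below is a hypothetical illustration, not Wattenberg's implementation. Each aggregated node is a tuple of attribute values, and its size is the number of original nodes it absorbs. Each aggregated edge weight counts the original edges running between two groups. The sketch assumes an undirected graph (aggregated endpoints are stored in sorted order) and string-valued attributes; `gender` and `office` are made-up attribute names.

```python
from collections import Counter


def rollup(node_attrs, edges, dims, select=None):
    """PivotGraph-style rollup onto the attributes in `dims`.

    One dimension corresponds to a line layout, two to a grid. `select`
    is an optional predicate on a node's attributes (the selection step).
    Returns (node sizes, edge weights) as Counters keyed by value tuples.
    """
    keep = {v for v, a in node_attrs.items() if select is None or select(a)}
    key = {v: tuple(node_attrs[v][d] for d in dims) for v in keep}
    node_sizes = Counter(key.values())
    edge_weights = Counter()
    for u, v in edges:
        if u in keep and v in keep:
            # Undirected: store each aggregated edge with sorted endpoints.
            edge_weights[tuple(sorted((key[u], key[v])))] += 1
    return node_sizes, edge_weights


people = {  # hypothetical multivariate graph
    "p1": {"gender": "F", "office": "NY"},
    "p2": {"gender": "M", "office": "NY"},
    "p3": {"gender": "F", "office": "SF"},
    "p4": {"gender": "F", "office": "NY"},
}
links = [("p1", "p2"), ("p1", "p3"), ("p4", "p2"), ("p3", "p4")]

sizes, weights = rollup(people, links, ("gender", "office"))
ny_sizes, ny_weights = rollup(people, links, ("gender",),
                              select=lambda a: a["office"] == "NY")
```

The number of drawn nodes now depends on how many attribute-value combinations occur (three here), not on how many people are in the graph. That is the "scalability through abstraction" point above.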

Visual Encoding
[Figure: line layout for a 1D rollup, grid layout for the 2D case (Wattenberg 2006)]

Interaction
• Changing rollup/selection choices
• Animated transitions between states


PivotGraph
• In general, more compact than a matrix representation

Critique: What Do You Think?

Tree and Graph Vis Summary
• Trees:
  • Indentation: simple, effective for small trees
  • Node-link and layered: look good but need exponential space
  • Enclosure (treemaps): great for size-related tasks but suffer in structure-related tasks
• Graphs:
  • Node-link: familiar, but problematic for dense graphs
  • Adjacency matrices: abstract, hard to follow paths
  • Attribute-driven: not always possible

Takeaway: No Best Solution …or Graph visualization is still a great research area!

Text

Text Data

• No Numbers (implicitly)

• Characters (ASCII)

• Strings

Text Data
• Words
• Sentences, paragraphs, chapters
• Lines

As separate words: I / love / visualization.
As a single line: I love visualization.
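A few lines of Python can pull these levels (lines, sentences, words) out of raw text. The splitting rules here are deliberately naive assumptions. A sentence ends at ., ! or ? followed by whitespace. A word is a run of letters, digits and apostrophes, so punctuation such as the final period is dropped, unlike the slide's "visualization." token.

```python
import re


def split_lines(text):
    return text.splitlines()


def split_sentences(text):
    # Naive rule: split after ., ! or ? when whitespace follows.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(text):
    # Runs of letters, digits and apostrophes; punctuation is dropped.
    return re.findall(r"[A-Za-z0-9']+", text)


text = "I love visualization. It is fun!\nText is data too."
lines = split_lines(text)
sentences = split_sentences(text)
words = split_words(sentences[0])
```

Abbreviations such as "Proc." would wrongly end a sentence under this rule. Real text-analysis tools rely on more careful tokenizers.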


Text Data

http://galaxyonfire.wikia.com/wiki/File:Small-ascii-art.jpg


Text Data
• Documents
  • Books, papers, webpages, emails, Twitter posts
• Corpus: collection of documents


Text Visualization For Documents

Tag Clouds / Word Clouds

http://www.tagcrowd.com

http://www.wordle.com


http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/89ade5ae1a0122db011a0eb3ee7309fb/comments/ae0a4e14ad9e11ddb3a3000255111976
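Tag clouds and word clouds start from word frequencies. The sketch below covers only that first step. It uses a tiny hand-written stop-word list and maps counts linearly to font size, and both choices are assumptions. Tools such as Wordle and TagCrowd make their own choices and also handle word placement, which is not shown here.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "it"}  # illustrative


def word_counts(text):
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def font_sizes(counts, top=20, min_pt=10, max_pt=48):
    """Map the `top` most frequent words linearly onto [min_pt, max_pt]."""
    common = counts.most_common(top)
    if not common:
        return {}
    hi, lo = common[0][1], common[-1][1]

    def size(c):
        if hi == lo:
            return max_pt
        return min_pt + (c - lo) * (max_pt - min_pt) / (hi - lo)

    return {w: round(size(c)) for w, c in common}


counts = word_counts("The map is a map. The chart is a chart of a map.")
sizes = font_sizes(counts)
```

A word's area grows with the square of its font size, so a linear count-to-size mapping exaggerates frequent words. Scaling font size by the square root of the count is a common alternative.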

Text Arc

Wattenberg, Viegas 2008

DocuBurst

Collins, Carpendale, Penn 2008

Arc Diagrams

Analysis of the Characters from Les Misérables: http://mbostock.github.io/protovis/ex/arc.html

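An arc diagram places nodes along a single line and draws each edge as a half circle above it. The minimal sketch below emits SVG path strings. The spacing, baseline and function name are illustrative assumptions, and the Protovis example linked above remains the real reference.

```python
def arc_diagram(nodes, edges, spacing=20.0, baseline=100.0):
    """Return node x-positions and one SVG path string per edge.

    Each edge becomes a half circle whose diameter spans its two nodes.
    The SVG arc command is: A rx ry x-axis-rotation large-arc-flag sweep-flag x y.
    With SVG's y axis pointing down, sweep-flag 1 drawn left to right
    puts the arc above the baseline.
    """
    x = {v: i * spacing for i, v in enumerate(nodes)}
    paths = []
    for u, v in edges:
        x1, x2 = sorted((x[u], x[v]))
        r = (x2 - x1) / 2
        paths.append(f"M {x1} {baseline} A {r} {r} 0 0 1 {x2} {baseline}")
    return x, paths


nodes = ["A", "B", "C", "D"]
edges = [("A", "C"), ("B", "D"), ("C", "B")]
x, paths = arc_diagram(nodes, edges)
svg = ('<svg xmlns="http://www.w3.org/2000/svg" width="80" height="110">'
       + "".join(f'<path d="{p}" fill="none" stroke="black"/>' for p in paths)
       + "</svg>")
```

As with matrices, node order matters. Placing strongly connected nodes next to each other keeps most arcs short and the diagram readable.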

Rule-Based: Poetry

Abdul-Rahman et al. 2008

Text Visualization For Document Collections

https://www.youtube.com/watch?v=rL3Ga6xBgLw

Document Cards (small multiples)

Showing Temporal Relationships: ThemeRiver (Stream Graph)

Havre, Hetzler, Nowell 2000
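ThemeRiver stacks one layer per theme and centers the whole stack on a horizontal axis, so the river's total thickness shows overall volume at each time step. The sketch below implements only that symmetric stacking. The input format and names are assumptions, and the curve smoothing used by the real system is omitted.

```python
def theme_river(series):
    """Stack layers symmetrically about y = 0.

    series: list of layers; each layer lists non-negative values per time step.
    Returns one (bottom, top) pair of lists per layer.
    """
    n_t = len(series[0])
    # The baseline sits at minus half the total, so the stack is centered on 0.
    base = [-0.5 * sum(layer[t] for layer in series) for t in range(n_t)]
    bounds = []
    for layer in series:
        top = [b + v for b, v in zip(base, layer)]
        bounds.append((base, top))
        base = top  # the next layer starts where this one ends
    return bounds


bounds = theme_river([[1, 2], [3, 2]])
```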

Jigsaw: Many Linked Views

Stasko et al. 2008


Lec16 Required Reading


Arrange Spatial Data

Chapter 8

8.1 The Big Picture

For datasets with spatial semantics, the usual choice for arrange is to use the given spatial information to guide the layout. In this case, the choices of express, separate, order, and align do not apply because the position channel is not available for directly encoding attributes. The two main spatial data types are geometry, where shape information is directly conveyed by spatial elements that do not necessarily have associated attributes, and spatial fields, where attributes are associated with each cell in the field. (See Figure 8.1.) For scalar fields with one attribute at each field cell, the two main visual encoding idiom families are isocontours and direct volume rendering. For both vector and tensor fields, with multiple attributes at each cell, there are four families of encoding idioms: flow glyphs that show local information, geometric approaches that compute derived geometry from a sparse set of seed points, texture approaches that use a dense set of seeds, and feature approaches where data is derived with global computations using information from the entire spatial field.
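Isocontours, the first idiom family named for scalar fields, can be illustrated with a simplified marching-squares sketch. For every grid cell, it finds the edges whose endpoint values straddle the isovalue, places a crossing point by linear interpolation, and joins the crossings into segments. The grid convention (`grid[j][i]` is the value at x = i, y = j) and the fixed pairing used for ambiguous saddle cells are simplifying assumptions.

```python
def lerp_point(p, q, fp, fq, iso):
    """Point on segment p-q where the linearly interpolated value equals iso."""
    t = (iso - fp) / (fq - fp)
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))


def isocontour(grid, iso):
    """Return the isocontour of a 2D scalar grid as a list of line segments."""
    segments = []
    for j in range(len(grid) - 1):
        for i in range(len(grid[0]) - 1):
            # Cell corners in order around the cell.
            corners = [(i, j), (i + 1, j), (i + 1, j + 1), (i, j + 1)]
            vals = [grid[y][x] for x, y in corners]
            crossings = []
            for k in range(4):
                a, b = k, (k + 1) % 4
                # The contour crosses an edge whose endpoints lie on opposite sides.
                if (vals[a] < iso) != (vals[b] < iso):
                    crossings.append(
                        lerp_point(corners[a], corners[b], vals[a], vals[b], iso))
            # 2 crossings give one segment; 4 (a saddle) are paired in edge order.
            for k in range(0, len(crossings), 2):
                segments.append((crossings[k], crossings[k + 1]))
    return segments


ramp = isocontour([[0, 1], [0, 1]], 0.5)
peak = isocontour([[0, 0, 0], [0, 1, 0], [0, 0, 0]], 0.5)
```

On the ramp (values rising from left to right), the 0.5 contour is the vertical segment at x = 0.5. On the single peak, each of the four cells contributes one segment, and together they form a small diamond around the center.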

8.2 Why Use Given?

The common case with spatial data is that the given spatial position is the attribute of primary importance because the central tasks revolve around understanding spatial relationships. In these cases, the right visual encoding choice is to use the provided spatial position as the substrate for the visual layout, rather than to visually encode other attributes with marks using the spatial position channel. This choice may seem obvious from common sense alone. It also follows from the effectiveness principle, since the


8.1 - 8.3 ONLY!!