Information Visualization - SJTU · 2013-11-04

Information Visualization

outline

• Visual Perception • High-dimensional Data Visualization • Hierarchical(tree) Data Visualization • Graphs and Networks Visualization • Time Series Data Visualization • Text and Document Visualization • Geographical Data Visualization

Visual Perception

• Seek to better understand visual perception and visual information processing

– Multiple theories or models exist

– Need to understand physiology and cognitive psychology

One (simple) Model

• Two stage process

– Parallel extraction of low-level properties of scene

– Sequential goal-directed processing

Stage 1 - Low-level, Parallel

• Neurons in eye & brain responsible for different kinds of information

– Orientation, color, texture, movement, etc.

• Arrays of neurons work in parallel

• Occurs “automatically”

• Rapid

• Information is transitory, briefly held in iconic store

• Bottom-up data-driven model of processing

• Often called “pre-attentive” processing

Stage 2 - Sequential, Goal-Directed

• Splits into subsystems for object recognition and for interacting with environment

• Increasing evidence supports independence of systems for symbolic object manipulation and for locomotion & action

• First subsystem then interfaces to verbal linguistic portion of brain, second interfaces to motor systems that control muscle movements

Stage 2 Attributes

• Slow serial processing

• Involves working and long-term memory

• More emphasis on arbitrary aspects of symbols

• Top-down processing

Preattentive Processing

• How does human visual system analyze images?

– Some things seem to be done preattentively, without the need for focused attention

– Generally less than 200-250 ms (eye movements take 200 ms)

– Seems to be done in parallel by low-level vision system

How Many 3’s?

1281768756138976546984506985604982826762

9809858458224509856458945098450980943585

9091030209905959595772564675050678904567

8845789809821677654876364908560912949686

What Kinds of Tasks?

• Target detection

– Is something there?

• Boundary detection

– Can the elements be grouped?

• Counting

– How many elements of a certain type are present?

Example

• Determine if a blue circle is present

Hue

• Can be done rapidly (preattentively) by people

• Surrounding objects called “distractors”

Shape

• Can be done preattentively by people

Hue and Shape

• Cannot be done preattentively

• Must perform a sequential search

• Conjunction of features (shape and hue) causes it

Example

• Is there a boundary in the display?

Hue versus Shape

• Left: Boundary detected preattentively based on hue regardless of shape

• Right: Cannot do mixed color shapes preattentively

Gestalt Laws

• Background

– German psychologists, early 1900s

– Attempt to understand pattern perception

– Founded Gestalt school of psychology

– Provided clear descriptions of many basic perceptual phenomena

• Gestalt Laws of Pattern Perception

Gestalt Laws

• Proximity

– Things close together are perceptually grouped together

• Similarity

– Similar elements get grouped together

Gestalt Laws

• Figure & Ground

– Figure is foreground, ground is behind

• Continuity

– More likely to construct visual entities out of smooth, continuous visual elements

Gestalt Laws

• Symmetry

– Symmetrical patterns are perceived more as a whole

• Closure

– A closed contour is seen as an object

Visual Encoding

• Marks: geometric primitives

– Points

– Lines

– Areas

• Visual channels: control appearance of marks

– Position

– Color

– Size

– …

Variables of the Image

Visual Channel

• Visual Channel Types and Rankings

[Figure: Visual Channel Types and Rankings; the categorical and ordered channels are listed from more accurate to less accurate]

• Categorical (what/where): position, hue, pattern, shape

• Ordered (how much): position, length, angle, area, lightness/saturation, stipple density

• Relational grouping: containment, connection, similarity, proximity

outline

• Visual Perception • High-dimensional Data Visualization • Hierarchical(tree) Data Visualization • Graphs and Networks Visualization • Time Series Data Visualization • Text and Document Visualization • Geographical Data Visualization

How Many Variables?

• Data sets of dimensions 1, 2, 3 are common

• Number of variables per class

– 1 - Univariate data

– 2 - Bivariate data

– 3 - Trivariate data

– >3 - Hypervariate data

Representations

• Some standard ways for low-dimensional data

More Dimensions

• Fundamentally, we have 2 geometric (position) display dimensions

• For data sets with >2 variables, we must project data down to 2D

• Come up with visual mapping that locates each dimension into 2D plane

• Computer graphics: 3D->2D projections

Scatterplot Matrix

• Represent each possible pair of variables in their own 2-D scatterplot

• Generalized Sensitivity Scatterplot

Scatterplot

Parallel Coordinates

• Encode variables along a horizontal row

• Vertical line specifies different values that variable can take

• Data point represented as a polyline

Parallel Coords Example

Issue

• Different variables can have values taking on quite different ranges

• Must normalize all down (e.g., 0->1)
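
The normalization step can be sketched as follows. This is a minimal illustration that assumes plain min-max scaling per variable (the slides only say values are brought down to 0->1); the function name and the sample rows are hypothetical.

```python
def normalize_columns(rows):
    """Min-max scale each column of a list of equal-length rows to [0, 1]."""
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column has no spread; place it mid-axis (an arbitrary choice).
        scaled_columns.append([0.5 if span == 0 else (v - lo) / span for v in col])
    # Transpose back so each row is one polyline across the parallel axes.
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical data: cylinders, weight, miles per gallon.
cars = [[4, 2000.0, 30.0], [6, 3000.0, 22.0], [8, 4000.0, 14.0]]
polylines = normalize_columns(cars)
```

Each returned row gives the height at which the polyline crosses each vertical axis, so variables with very different ranges share the same axis length.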

Challenges

• Too much data

Dimensional Reordering

• Which dimensions are most like each other?

Same dimensions ordered according to similarity

Reducing Density

Star Plots

• Space out the n variables at equal angles around a circle

• Each “spoke” encodes a variable’s value

• Data point is now a “shape”
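
A small sketch of how one data point becomes a star-plot shape, assuming the values have already been scaled to [0, 1]. The function name and parameters are illustrative, not taken from any particular library.

```python
import math


def star_plot_vertices(values, center=(0.0, 0.0), radius=1.0):
    """Return the polygon vertices of a star plot for values scaled to [0, 1]."""
    n = len(values)
    cx, cy = center
    vertices = []
    for i, v in enumerate(values):
        angle = 2 * math.pi * i / n   # spokes sit at equal angles around the circle
        r = v * radius                # distance along the spoke encodes the value
        vertices.append((cx + r * math.cos(angle), cy + r * math.sin(angle)))
    return vertices


shape = star_plot_vertices([1.0, 0.5, 0.25, 0.75])
```

Connecting the vertices in order gives the "shape" for that data point.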

Star Plots

Star Coordinates

Dust & Magnet

• Altogether different metaphor

• Data cases represented as small bits of iron dust

• Different attributes given physical manifestation as magnets

• Interact with objects to explore data

Interface

Interaction

• Iron bits (data) are drawn toward magnets (attributes) proportional to that data element’s value in that attribute

– Higher values attracted more strongly

• All magnets present on display affect position of all dust

• Individual power of magnets can be changed

• Dust’s color and size can be connected to attributes as well
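
One way to picture the attraction rule is a single update step like the sketch below. It is only an assumption about how such a step might look (a linear pull toward each magnet); the data layout, the `rate` parameter, and the attribute name are hypothetical and not the original system's implementation.

```python
def step_dust(dust, magnets, rate=0.1):
    """Move every dust particle one step toward every magnet.

    dust:    list of dicts {"pos": (x, y), "values": {attr: value in [0, 1]}}
    magnets: list of dicts {"pos": (x, y), "attr": name, "power": float}
    """
    moved = []
    for d in dust:
        x, y = d["pos"]
        dx = dy = 0.0
        for m in magnets:
            # Pull is proportional to the particle's value for the magnet's attribute.
            pull = rate * m["power"] * d["values"].get(m["attr"], 0.0)
            dx += pull * (m["pos"][0] - x)
            dy += pull * (m["pos"][1] - y)
        moved.append({"pos": (x + dx, y + dy), "values": d["values"]})
    return moved


magnets = [{"pos": (10.0, 0.0), "attr": "price", "power": 1.0}]
dust = [
    {"pos": (0.0, 0.0), "values": {"price": 1.0}},
    {"pos": (0.0, 0.0), "values": {"price": 0.2}},
]
after = step_dust(dust, magnets)
```

After one step the high-valued particle has travelled farther toward the magnet than the low-valued one, which is the separation effect the technique relies on.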

Interaction

• Moving a magnet makes all the dust move

– Also command for shaking dust

• Different strategies for how to position magnets in order to explore the data

Dust & Magnet

Dense Pixel Display

• Represent data case or a variable as a pixel

• Million or more per display

• Rely on use of color

• Value ranges are mapped to a fixed color sequence of full color (hue) scale but monotonically decreasing brightness

• Data values belonging to one attribute are displayed in a separate view ‐ only one pixel per data value without need for a border

One Representation

• Grouping arrangement

• One pixel per variable

• Each data case has its own small rectangular icon

• Plot out variables for data point in that icon using a grid or spiral layout

Illustration

Pixel Bar Chart

• Make each pixel within a bar correspond to a data point in that group represented by the bar

• Can do millions that way

• Color the pixel to represent the value of one of the data point’s variables

Pixel Bar Chart

Product type is x-axis divider

Customers ordered by dollar amount (y-axis) and number of visits (x-axis)

Color is (a) dollar amount spent, (b) number of visits, (c) sales quantity

Stacked Graph

Small Multiples

Summary

• We’ve seen many general techniques for multivariate data

– Scatterplot matrix

– Parallel coordinates

– Dense Pixel Display

– Stacked Graph

– …

• Know strengths and limitations of each

• Know which ones are good for which circumstances

outline

• Visual Perception • High-dimensional Data Visualization • Hierarchical(tree) Data Visualization • Graphs and Networks Visualization • Time Series Data Visualization • Text and Document Visualization • Geographical Data Visualization

Hierarchies

• Definition

– Data repository in which cases are related to subcases

– Can be thought of as imposing an ordering in which cases are parents or ancestors of other cases

• Pervasive in the world

– Family histories, ancestries

– File/directory systems on computers

– Organization charts

– Object-oriented software classes

Trees

• Hierarchies often represented as trees

– Directed, acyclic graph

• Two main representation schemes

– Node-link

– Space-filling

Spatial Layout

• Primary concern of graph drawing is the spatial layout of nodes and edges

• Often (but not always) the goal is to effectively depict the graph structure

– connectivity, path-following

– network distance

– clustering

– ordering (e.g., hierarchy level)

Indentation

• place all items along vertically spaced rows

• indentation used to show parent/child relationships

• commonly used as a component in an interface

• breadth and depth contend for space

• often requires a great deal of scrolling

Node-Link Diagrams

• nodes are distributed in space, connected by straight or curved lines

• typical approach is to use 2D space to break apart breadth and depth

• often space is used to communicate hierarchical orientation

Node-Link Diagrams

• Root at top, leaves at bottom is very common

Node-Link Diagrams

• Root can be at center with levels growing outward too

Basic Algorithm

– Recursive algorithm

– Height on separate levels

– Width in unique columns

– Make room for subtrees upwards

Reingold-Tilford Algorithm

• goal

– make smarter use of space

– maximize density and symmetry

• design concerns

– clearly encode depth level

– no edge crossings

– isomorphic subtrees drawn identically

– compact

• approach

– bottom up recursive approach

– for each parent make sure every subtree is drawn

– pack subtrees as closely as possible

– center parent over subtrees
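
The bottom-up idea can be sketched in a few lines. Note that this is a deliberately simplified layout, not the full Reingold-Tilford algorithm: it gives every leaf its own column and centers each parent over its children, but it does not pack subtrees together by comparing their contours. The tree encoding (a dict from node to child list) is an assumption made for illustration.

```python
def layout_tree(tree, root):
    """Assign (x, depth) to each node; tree maps node -> list of children."""
    positions = {}
    next_leaf_x = 0

    def place(node, depth):
        nonlocal next_leaf_x
        children = tree.get(node, [])
        if not children:
            x = float(next_leaf_x)      # each leaf takes the next free column
            next_leaf_x += 1
        else:
            # Draw every subtree first (bottom up), then center the parent.
            xs = [place(child, depth + 1) for child in children]
            x = (xs[0] + xs[-1]) / 2
        positions[node] = (x, depth)
        return x

    place(root, 0)
    return positions


tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
pos = layout_tree(tree, "A")
```

Depth is encoded by the second coordinate and no edges cross; the compaction step is what a real Reingold-Tilford implementation adds on top.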

Radial Layout

• node-link diagram in polar coordinates

• radius encodes depth with root in center

• angular sectors assigned to subtrees

• Reingold-Tilford can be applied
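
A rough sketch of the radial idea, assuming each subtree receives an angular sector proportional to its number of leaves (one common choice; the slides do not specify how sectors are sized). Names and the dict-based tree encoding are illustrative.

```python
import math


def radial_layout(tree, root, ring_spacing=1.0):
    """Return {node: (x, y)} with radius = depth and one angular sector per subtree."""
    def leaf_count(node):
        children = tree.get(node, [])
        return 1 if not children else sum(leaf_count(c) for c in children)

    positions = {}

    def place(node, depth, start, end):
        angle = (start + end) / 2            # node sits in the middle of its sector
        r = depth * ring_spacing             # radius encodes depth, root at center
        positions[node] = (r * math.cos(angle), r * math.sin(angle))
        children = tree.get(node, [])
        total = sum(leaf_count(c) for c in children)
        cursor = start
        for child in children:
            # Each subtree gets a share of the parent's sector.
            width = (end - start) * leaf_count(child) / total
            place(child, depth + 1, cursor, cursor + width)
            cursor += width

    place(root, 0, 0.0, 2 * math.pi)
    return positions


tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
pos = radial_layout(tree, "A")
```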

Potential Problems

• For top-down, width of fan-out uses up horizontal real estate very quickly

– At level n, a binary tree has up to 2^n nodes

• Tree might grow a lot along one particular branch

– Hard to draw it well in view without knowing how it will branch

InfoVis Solutions

• Techniques developed in Information Visualization largely try to address these problems

• Alternatively, Information Visualization techniques attempt to show more attributes of data cases in hierarchy or focus on particular applications of trees

SpaceTree

• Uses conventional 2D layout techniques with some clever additions

Grosjean, Plaisant, Bederson InfoVis ‘02

Characteristics

• Vertical or horizontal

• Subtrees are triangles

– Size indicates depth

– Shading indicates number of nodes inside

• Navigate by clicking on nodes

– Strongly restrict zooming

SpaceTree

3D Approaches

• Add a third dimension into which layout can go

• Compromise of top-down and centered techniques mentioned earlier

• Children of a node are laid out in a cylinder “below” the parent

– Siblings live in one of the 2D planes

Cone Trees

• Pros

– More effective area to lay out tree

– Use of smooth animation to help person track updates

– Aesthetically pleasing

• Cons

– As in all 3D, occlusion obscures some nodes

– Non-trivial to implement and requires some graphics horsepower

Alternative Solutions

• Change the geometry

• Apply a hyperbolic transformation to the space

• Root is at center, subordinates around

• Apply idea recursively, distance decreases between parent and child as you move farther from center, children go in wedge rather than circle

Hyperbolic Browser

• Focus + Context Technique

– Detailed view blended with a global view

• First lay out the hierarchy on the hyperbolic plane

• Then map this plane to a disk

• Start with the tree’s root at the center

• Use animation to navigate along this representation of the plane

2D Hyperbolic Browser

• Approach: Lay out the hierarchy on the hyperbolic plane and map this plane onto a display region.

• Comparison

– A standard 2D browser: 100 nodes (w/3 character text strings)

– Hyperbolic browser: 1000 nodes, about 50 nearest the focus can show from 3 to dozens of characters

Hyperbolic Browser

Key Attributes

• Natural magnification (fisheye) in center

• Layout depends only on 2-3 generations from current node

• Smooth animation for change in focus

• Don’t draw objects when far enough from root (simplify rendering)

Problems

• Orientation

– Watching the view can be disorienting

– When a node is moved, its children don’t keep their relative orientation to it as in Euclidean plane, they rotate

– Not as symmetric and regular as Euclidean techniques, two important attributes in aesthetics

Space-Filling

• Each item occupies an area

• Children are “contained” under parent

Treemap

• Space-filling representation developed by Shneiderman and Johnson, Vis ‘91

• Children are drawn inside their parent

• Alternate horizontal and vertical slicing at each successive level

• Use area to encode other variable of data items
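
The alternating slicing can be sketched as a short recursion. This is a minimal slice-and-dice illustration; the nested `(name, size, children)` tuple format is an assumption chosen to keep the example self-contained.

```python
def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Lay out a nested (name, size, children) tuple; returns {name: (x, y, w, h)}."""
    if out is None:
        out = {}
    name, size, children = node
    out[name] = (x, y, w, h)
    total = sum(child[1] for child in children)
    offset = 0.0
    for child in children:
        share = child[1] / total          # area is proportional to the size variable
        if depth % 2 == 0:
            # Even depth: cut the parent into side-by-side columns.
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            # Odd depth: cut the parent into stacked rows.
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


root = ("root", 8, [("a", 4, [("a1", 1, []), ("a2", 3, [])]), ("b", 4, [])])
rects = slice_and_dice(root, 0.0, 0.0, 8.0, 4.0)
```

Here "a" and "b" split the root left and right, while "a1" and "a2" split "a" top and bottom with heights in the ratio 1:3.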

Treemap

• Example

Applications

• Can use Treemap idea for a variety of domains

– File/directory structures

– Basketball statistics

– Software diagrams

– …

Treemap Affordances

• Good representation of two attributes beyond node-link: color and area

• Not as good at representing structure

– What happens if it’s a perfectly balanced tree of items all the same size?

– Also can get long-thin aspect ratios

– Borders help on smaller trees, but take up too much area on large, deep ones

Treemap Variations

• Cluster Treemap

– Compromises treemap algorithm to avoid bad aspect ratios

– Basic algorithm (divide and conquer) with some hand tweaking

– Takes advantage of shallow hierarchy

• Squarified treemap

– Bruls, Huizing, van Wijk, EuroGraphics ‘00

– Alternate approach, similar results

– Small changes in data values can cause dramatic changes in layout

Treemap Variations

• Strip treemap

– Use strips to place items

– Put new rectangle into strip; if it makes the average aspect ratio of all rectangles in the strip go down, keep it there. If it makes the aspect ratio go up, take it back out and move to the next strip

Compare results

Showing Structure

• Regular borderless treemap makes it challenging to discern structure of hierarchy, particularly large ones

– Supplement Treemap view

– Change rectangles to other forms

Enclosure diagrams

Cushion Treemap

• Add shading and texture to help convey structure of hierarchy

Voronoi Treemaps

• Use polygons instead of rectangles

Voronoi Treemaps

Radial Space-Filling

• What if we used a radial rather than a rectangular space-filling technique?

– We saw node-link trees with root in center and growing outward already...

• Make pie-tree with root in center and children growing outward

– Radial angle now corresponds to a variable rather than area

SunBurst

• Root directory at center, each successive level drawn farther out from center

• Sweep angle of item corresponds to size

• Color maps to file type or age

• Interactive controls for moving deeper in hierarchy, changing the root, etc.

• Double-click on directory makes it new root

SunBurst

• Node-link diagrams or space-filling techniques?

• It depends on the properties of the data

– Node-link typically better at exposing the structure of the information

– Space-filling good for focusing on one or two additional variables of cases

More Alternatives

Circle Packing

Icicle Tree

• Similar to the node-link diagram

• The nodes are space-filling

Summary

• Node-link diagrams or space-filling techniques?

• It depends on the properties of the data

– Node-link typically better at exposing the structure of the information

– Space-filling good for focusing on one or two additional variables of cases

outline

• Visual Perception • High-dimensional Data Visualization • Hierarchical(tree) Data Visualization • Graphs and Networks Visualization • Time Series Data Visualization • Text and Document Visualization • Geographical Data Visualization

What is a Graph

• Vertices (nodes) connected by Edges (links)

• Graph edges can be directed or undirected

• Graph edges can have values (weights)

Graph Uses

• In information visualization, any number of data sets can be modeled as a graph

– Telephone system

– World Wide Web

– Distribution network for on-line retailer

– Call graph of a large software system

– Semantic map in an AI algorithm

– Set of connected friends

• Graph/network visualization is one of the oldest and most studied areas of InfoVis

Graph Visualization

www.nytimes.com/interactive/2008/05/05/science/20080506_DISEASE.html

Graph Visualization Challenges

• Graph layout and positioning

– Make a concrete rendering of abstract graph

• Navigation/Interaction

– How to support user changing focus and moving around the graph

• Scale

– Above two issues not too bad for small graphs, but large ones are much tougher

Aesthetic Considerations

• Crossings -- minimize towards planar

• Total Edge Length -- minimize towards proper scale

• Area -- minimize towards efficiency

• Maximum Edge Length -- minimize longest edge

• Uniform Edge Lengths -- minimize variances

• Total Bends -- minimize orthogonal towards straight-line

Layout Algorithms

• Entire research community’s focus

• Common Layout Techniques

– Hierarchical

– Force-directed

– Circular

– Geographic-based

– Clustered

– Attribute-based

– Matrix

Circular Layout

• Ultra-simple

• May not look so great

• Space vertices out around circle

• Draw lines to connect vertices
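
The two steps above fit in a few lines. A minimal sketch, with illustrative names; it only computes coordinates and line segments and leaves the actual drawing to whatever graphics layer is in use.

```python
import math


def circular_layout(vertices, radius=1.0):
    """Space the vertices evenly around a circle; returns {vertex: (x, y)}."""
    n = len(vertices)
    return {
        v: (radius * math.cos(2 * math.pi * i / n),
            radius * math.sin(2 * math.pi * i / n))
        for i, v in enumerate(vertices)
    }


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
positions = circular_layout(["a", "b", "c", "d"])
# Each edge becomes a straight segment between the two vertex positions.
segments = [(positions[u], positions[v]) for u, v in edges]
```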

Circular Layout

Arc Diagram Layout

Force-directed Layout

• Example of constraint-based layout technique

• Impose constraints (objectives) on layout

– Shorten edges

– Minimize crossings

– …

• Define through equations

• Create optimization algorithm that attempts to best satisfy those equations

Force-directed Layout

• Spring model (common)

– Edges : Springs (gravity attraction)

– Vertices : Charged particles (repulsion)

• Equations for forces

• Iteratively recalculate to update positions of vertices

• Seeking local minimum of energy

– Sum of forces on each node is zero

Force-directed Layout

Variants

• Spring layout

– Simple force-directed spring embedder

• Fruchterman-Reingold Algorithm

– Add global temperature

– If hot, nodes move farther each step

– If cool, smaller movements

– Generally cools over time

• Kamada-Kawai algorithm

– Examines derivatives of force equations

– Brought to zero for minimum energy
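
The temperature idea can be illustrated with a simplified Fruchterman-Reingold-style loop. This is a sketch under stated assumptions, not a reference implementation: it uses the usual force magnitudes (repulsion k²/d, attraction d²/k), a geometric cooling factor of 0.95 chosen arbitrarily, and it omits clamping positions to a drawing frame.

```python
import math
import random


def fruchterman_reingold(nodes, edges, iterations=200, size=1.0, seed=0):
    """Simplified force-directed layout; nodes is a list, returns {node: (x, y)}."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, size), rng.uniform(0, size)] for v in nodes}
    k = size / math.sqrt(len(nodes))      # ideal distance between vertices
    temperature = size / 10               # caps how far a vertex may move per step

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}

        def apply(u, v, magnitude_of):
            """Add a force on u along the v->u direction and the opposite on v."""
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            fx = dx / dist * magnitude_of(dist)
            fy = dy / dist * magnitude_of(dist)
            disp[u][0] += fx
            disp[u][1] += fy
            disp[v][0] -= fx
            disp[v][1] -= fy

        for i, u in enumerate(nodes):     # repulsion between every pair: O(n^2)
            for v in nodes[i + 1:]:
                apply(u, v, lambda d: k * k / d)
        for u, v in edges:                # attraction along edges only
            apply(u, v, lambda d: -d * d / k)

        for v in nodes:
            length = math.hypot(*disp[v]) or 1e-9
            step = min(length, temperature)   # hot: large moves, cool: small moves
            pos[v][0] += disp[v][0] / length * step
            pos[v][1] += disp[v][1] / length * step
        temperature *= 0.95               # generally cools over time

    return {v: tuple(p) for v, p in pos.items()}


layout = fruchterman_reingold(["a", "b", "c", "d"],
                              [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])
```

For a single connected pair the two forces balance when the distance equals k, which is why k is usually described as the ideal edge length.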

Force-directed Layout

• very flexible, aesthetic layouts on many types of graphs

• can add custom forces

• relatively easy to implement

• repulsion loop is O(N^2) per iteration

– can speed up to O(N log N) using quadtree or k-d tree

• prone to local minima

– can use simulated annealing

Node-link Layout

• understandable visual mapping

• can show overall structure, clusters, paths

• flexible, many variations

• all but the most trivial algorithms are > O(N^2)

• not good for dense graphs

– hairball problem!

Matrix Representations

• There has been renewed interest in matrix representations of graphs recently

• The regularity, symmetry, and structure of a matrix are a win

– people understand them well

• But they don’t scale up really well

Adjacency Matrix

• Great for dense graphs

• Can spot clusters
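
A small sketch of building and printing an adjacency matrix for an undirected graph. The vertex names and edges are made up for illustration; with a vertex ordering that keeps related vertices adjacent, clusters show up as filled blocks along the diagonal.

```python
def adjacency_matrix(vertices, edges):
    """Return a symmetric 0/1 matrix for an undirected graph."""
    index = {v: i for i, v in enumerate(vertices)}
    matrix = [[0] * len(vertices) for _ in vertices]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1   # undirected: mirror across the diagonal
    return matrix


vertices = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "e")]
matrix = adjacency_matrix(vertices, edges)
for row in matrix:
    # One text row per vertex; '#' marks an edge, '.' marks no edge.
    print(" ".join("#" if cell else "." for cell in row))
```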

Hybrid Layout

• NodeTrix

– Hybrid of matrix and node-link

Really Big Graphs

• May be difficult to keep all in memory

• Often visualized as “hairballs”

• Smart visualizations do structural clustering, so you see a high-level overview of topology

Hierarchical Edge Bundles

• Bundle edges that go from/to similar nodes together

– Like wires in a house

• Uses B-spline curves for edges

• Reduces the clutter from many edges

Hierarchical Edge Bundles

Summary

• Graph Visualization needs to consider

– layout

– simplification

– interaction

– Scale

• Facilitate understanding of complex socioeconomic patterns

outline

• Visual Perception • High-dimensional Data Visualization • Hierarchical(tree) Data Visualization • Graphs and Networks Visualization • Time Series Data Visualization • Text and Document Visualization • Geographical Data Visualization

Time Series Data

• Fundamental chronological component to the data set

• Each data case is likely an event of some kind

• One of the variables can be the date and time of the event

• Examples:

– sunspot activity

– medicines taken

– cities visited

– stock prices

Time Series User Tasks

• Examples

– When was something greatest/least?

– Is there a pattern?

– Are two series similar?

– Do any of the series match a pattern?

– Provide simpler, faster access to the series

Classification

• Discrete points vs. interval points

• Linear time vs. cyclic time

• Ordinal time vs. continuous time

• Ordered time vs. branching time vs. time with multiple perspectives

Fundamental Tradeoff

• Is the visualization time-dependent, i.e., changing over time (beyond just being interactive)?

– Static

• Shows history, multiple perspectives, allows comparison

– Dynamic (animation)

• Gives feel for process & changes over time, has more space to work with

Standard Presentation

• Present time data as a 2D line graph with time on x-axis and some other variable on y-axis

Periodic Visualization

• Visualizations can be very good at helping us spot patterns in data

• Often, these patterns are periodic, most often repeating in time

Calendars

Dow Jones Industrial Average 2006 - 2009

Heat Maps

• each dot shows the time of one of the third of a million emails Stephen Wolfram has sent since 1989

Spirals

• Standard x-y timeline or tabular display is problematic for periodic data

– It has endpoints

• Use spiral to help display data

– One loop corresponds to one period

• Scale to large data sets

• Support identification of periodic structures in the data

• Compare multiple datasets
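
The "one loop per period" mapping can be sketched as below, assuming an Archimedean spiral whose radius grows by a fixed gap per loop (the slides do not name a specific spiral). The function and parameter names are illustrative.

```python
import math


def spiral_position(t, period, ring_gap=1.0):
    """Map time t to (x, y) on a spiral that completes one loop per period."""
    turns = t / period
    angle = 2 * math.pi * turns
    radius = ring_gap * turns            # radius grows by ring_gap with every loop
    return (radius * math.cos(angle), radius * math.sin(angle))


# Monthly samples over three years with a 12-month period:
# the same month of each year lands on the same ray from the center.
points = [spiral_position(month, period=12) for month in range(36)]
```

Because equal phases line up radially, a pattern that repeats every period appears as a radial streak, which is what makes periodic structure easy to spot.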

Spirals

• One year per loop

• Same month on radial bars

• Quantity represented by size of blob

ThemeRiver

• Background: a user is less interested in documents themselves than in theme changes within the whole collection over time

• River height (thickness) encodes relative frequency of themes

• ThemeRiver provides users a macro-view of thematic changes

• Helps users identify time-related patterns, trends, and relationships across a large collection of documents

ThemeRiver

ThemeRiver

Summary

• Think about the data

– What characteristics?

– Can InfoVis help?

• Think about the visualization techniques

outline

• Visual Perception • High-dimensional Data Visualization • Hierarchical(tree) Data Visualization • Graphs and Networks Visualization • Time Series Data Visualization • Text and Document Visualization • Geographical Data Visualization

Text is Everywhere

• We use documents as primary information artifact in our lives

• Our access to documents has grown tremendously in recent years due to networking infrastructure

– WWW

– Digital libraries

Big Question

• What can information visualization provide to help users in understanding and gathering information from text and document collections?

• Related Topic - Information Retrieval

• InfoVis seems to be most useful when

– Perhaps not sure precisely what you’re looking for

– More of a browsing task than a search one

Challenge

• Text is nominal data

– Does not seem to map to geometric/graphical presentation as easily as ordinal and quantitative data

• The “Raw data --> Data Table” mapping now becomes more important

One Text Visualization

Uses: Layout Font Style Color

Word Counts

Tag/Word Clouds

• Currently very “hot” in research community

• Have proven to be very popular on web

• Idea is to show word/concept importance through visual means

– Tags: User-specified metadata (descriptors) about something

– Sometimes generalized to just reflect word frequencies

Tag/Word Clouds

Tag/Word Clouds

Problems

• Actually not a great visualization

– Hard to find a particular word

– Long words get increased visual emphasis

– Font sizes are hard to compare

– Alphabetical ordering not ideal for many tasks

• Why So Popular?

– Serve as social signifiers that provide a friendly atmosphere and a point of entry into a complex site

– Act as individual and group mirrors

– Fun, not business-like

Wordle

www.wordle.net

Wordle

• Tightly packed words, sometimes vertical or diagonal

• Word size is linearly correlated with frequency (typically square root in cloud)

• Multiple color palettes

• User gets some control

Layout Algorithm

• Idea:

– sort words by weight, decreasing order

for each word w:

    w.position := makeInitialPosition(w);

    while w intersects other words:

        updatePosition(w);

– Initial position randomly chosen according to distribution for target shape

– Update position moves out radially
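
A runnable sketch of the greedy loop above, with several simplifying assumptions that are not part of the original algorithm description: words are approximated by axis-aligned boxes (no real font metrics), every word starts at the origin instead of a randomly drawn position, and "moving out" follows a fixed spiral. All names and parameters are illustrative.

```python
import math


def intersects(a, b):
    """Overlap test for axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def layout_words(weights, char_width=0.6, spiral_step=0.1):
    """Place word boxes greedily, heaviest first; returns {word: (x, y, w, h)}."""
    placed = {}
    for word, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
        height = float(weight)                     # size linear in weight
        width = char_width * height * len(word)    # crude box, not real font metrics
        theta = 0.0
        while True:
            # Candidate position: a point on the spiral, box centered on it.
            r = spiral_step * theta
            box = (r * math.cos(theta) - width / 2,
                   r * math.sin(theta) - height / 2, width, height)
            if not any(intersects(box, other) for other in placed.values()):
                break
            theta += 0.1                           # updatePosition: step outward
        placed[word] = box
    return placed


boxes = layout_words({"data": 4, "visual": 3, "graph": 2, "tree": 2})
```

Placing the heaviest words first lets them claim the center, and smaller words fill the gaps farther out.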

Beyond Individual Words

• Can we show combinations of words, phrases, and sentences?

Word Tree

Word Tree

• Shows context of a word or words

– Follow word with all the phrases that follow it

• Font size shows frequency of appearance

• Continue branch until hitting unique phrase

• Clicking on phrase makes it the focus

• Ordered alphabetically, by frequency, or by first appearance

Interaction

Phrase Nets

• Examine unstructured text documents

• Presents pairs of terms from phrases such as

– X and Y

– X’s Y

– X at Y

– X (is | are | was | were) Y

• Uses special graph layout algorithm with compression and simplification

Examples

Overviews of Documents

• Can we provide a quick browsing, overview UI, maybe especially useful for small screens?

Document Cards

• Compact visual representation of a document

• Show key terms and important images

Document Cards

• Layout algorithm searches for empty space rectangles to put things

Interaction

• Hover over non-image space shows abstract in tooltip

• Hover over image and see caption as tooltip

• Click on page number to get full page

• Click on image goes to page containing it

• Clicking on a term highlights it in overview and all tooltips

Text Themes

• Look for sets of regions in a document (or sets of documents) that all have common theme

– Closely related to each other, but different from rest

• Need to run clustering process

Themescapes

• Self-organizing maps didn’t reflect density of regions all that well -- Can we improve?

• Use 3D representation, and have height represent density or number of documents in region

Themescapes

WebTheme

Topic Modeling

• Hot topic in text analysis and visualization

• Latent Dirichlet Allocation

• Unsupervised learning

• Produces “topics” evident throughout doc collection, each modeled by sets of words/terms

• Describes how each document contributes to each topic

TIARA

• Keeps basic ThemeRiver metaphor

• Embed word clouds into bands to tell more about what is in each

• Magnifier lens for getting more details

• Uses Latent Dirichlet Allocation to do text analysis and summarization

Representation

Features

TextFlow

• Showing how topics merge and split

ParallelTopics

outline

• Visual Perception • High-dimensional Data Visualization • Hierarchical(tree) Data Visualization • Graphs and Networks Visualization • Time Series Data Visualization • Text and Document Visualization • Geographical Data Visualization

Geographical Data

• Fundamentally different from other kinds of data since they are inherently spatially structured in two or three dimensions

Dot Map

• Visualizing specific points

• Nominal data

• Arranged by Latitude/Longitude

Dot Map

sanfrancisco.crimespotting.org

Dot Map

• Clustering, e.g. k-means algorithm

Choropleth Map

• Areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map

Choropleth

Problems

• Easy to slant data to suit the cartographer’s purpose (by adjusting the slicing values)

• Create the illusion of rapid breaks whereas data varies continuously and gradually in the real world

• Allow small areas (like major cities) to overwhelm the data of large regions (like states)
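
The first problem is easy to demonstrate: the same values fall into very different shading classes depending on how the breaks are chosen. The sketch below compares two common schemes, equal-interval and quantile breaks; the data values and function names are made up for illustration.

```python
def equal_interval_breaks(values, classes):
    """Split the value range into equally wide classes; returns the inner breaks."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / classes
    return [lo + step * i for i in range(1, classes)]


def quantile_breaks(values, classes):
    """Choose breaks so each class holds roughly the same number of regions."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // classes] for i in range(1, classes)]


def classify(value, breaks):
    """Class index = number of breaks at or below the value."""
    return sum(value >= b for b in breaks)


rates = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # one outlier region
by_interval = [classify(v, equal_interval_breaks(rates, 3)) for v in rates]
by_quantile = [classify(v, quantile_breaks(rates, 3)) for v in rates]
```

With equal intervals the outlier pushes nine of the ten regions into the lowest class, while quantile breaks spread the same regions across all three classes, so the map tells a different story.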

Cartogram

• A cartogram is a diagram which uses the form of a map to present numeric information while maintaining some degree of geographic accuracy.

Cartogram

Flow map

• A mix of maps and flow charts that shows the movement of objects from one location to another

Flow Bundle