Upload
nguyenphuc
View
214
Download
0
Embed Size (px)
Citation preview
Information Visualization
outline
• Visual Perception • High-dimensional Data Visualization • Hierarchical(tree) Data Visualization • Graphs and Networks Visualization • Time Series Data Visualization • Text and Document Visualization • Geographical Data Visualization
outline
• Visual Perception • High-dimensional Data Visualization • Hierarchical(tree) Data Visualization • Graphs and Networks Visualization • Time Series Data Visualization • Text and Document Visualization • Geographical Data Visualization
Visual Perception • Seek to better understand visual
perception and visual information processing – Multiple theories or models exist
– Need to understand physiology and cognitive psychology
One (simple) Model • Two stage process
– Parallel extraction of low-level properties of scene
– Sequential goal-directed processing
Stage 1 - Low-level, Parallel • Neurons in eye & brain responsible for different
kinds of information – Orientation, color, texture, movement, etc.
• Arrays of neurons work in parallel
• Occurs “automatically”
• Rapid
• Information is transitory, briefly held in iconic store
• Bottom-up data-driven model of processing
• Often called “pre-attentive” processing
Stage 2 - Sequential, Goal-Directed
• Splits into subsystems for object recognition and for interacting with environment
• Increasing evidence supports independence of systems for symbolic object manipulation and for locomotion & action
• First subsystem then interfaces to verbal linguistic portion of brain, second interfaces to motor systems that control muscle movements
Stage 2 Attributes • Slow serial processing
• Involves working and long-term memory
• More emphasis on arbitrary aspects of symbols
• Top-down processing
Preattentive Processing • How does human visual system analyze
images? – Some things seem to be done preattentively,
without the need for focused attention
– Generally less than 200-250 ms (eye movements take 200 ms)
– Seems to be done in parallel by low-level vision system
How Many 3’s? 1281768756138976546984506985604982826762
9809858458224509856458945098450980943585
9091030209905959595772564675050678904567
8845789809821677654876364908560912949686
How Many 3’s? 1281768756138976546984506985604982826762
9809858458224509856458945098450980943585
9091030209905959595772564675050678904567
8845789809821677654876364908560912949686
What Kinds of Tasks? • Target detection
– Is something there?
• Boundary detection – Can the elements be grouped?
• Counting – How many elements of a certain type are
present?
Example
• Determine if a blue circle is present
Hue
• Can be done rapidly (preattentively) by people
• Surrounding objects called “distractors”
Shape
• Can be done preattentively by people
Hue and Shape
• Cannot be done preattentively
• Must perform a sequential search
• Conjuction of features (shape and hue) causes it
Example
• Is there a boundary in the display?
Hue versus Shape
• Left: Boundary detected preattentively based on hue regardless of shape
• Right: Cannot do mixed color shapes preattentively
Gestalt Laws • Background
– German psychologists, early 1900’s
– Attempt to understand pattern perception
– Founded Gestalt school of psychology
– Provided clear descriptions of many basic perceptual phenomena • Gestalt Laws of Pattern Perception
Gestalt Laws • Proximity
– Things close together are perceptually grouped together
• Similarity – Similar elements get grouped
together
Gestalt Laws • Figure & Ground
– Figure is foreground, ground is behind
• Continuity – More likely to construct
visual entities out of smooth, continuous visual elements
Gestalt Laws • Symmetry
– Symmetrical patterns are perceived more as a whole
• Closure – A closed contour is seen as an
object
Visual Encoding • Marks: geometric primitives
– Points
– Lines
– Areas
• Visual channels: control appearance of marks – Position
– Color
– Size
– …
Variables of the Image
Visual Channel • Visual Channel Types and Rankings
Visual Channel Types and Rankings
containment
similarity proximity
connection
relational grouping
position
hue
pattern
shape
categorical
what/where
position
length
angle
area
lightness/saturation
Stipple density
ordered
how much
more accurate
Visual Channel Types and Rankings
outline
• Visual Perception • High-dimensional Data Visualization • Hierarchical(tree) Data Visualization • Graphs and Networks Visualization • Time Series Data Visualization • Text and Document Visualization • Geographical Data Visualization
How Many Variables? • Data sets of dimensions 1, 2, 3 are
common
• Number of variables per class – 1 - Univariate data
– 2 - Bivariate data
– 3 - Trivariate data
– >3 - Hypervariate data
Representations • Some standard ways for low-dimensional
data
More Dimensions • Fundamentally, we have 2 geometric
(position) display dimensions
• For data sets with >2 variables, we must project data down to 2D
• Come up with visual mapping that locates each dimension into 2D plane
• Computer graphics: 3D->2D projections
Scatterplot Matrix • Represent each
possible pair of variables in their own 2-D scatterplot
• Generalized Sensitivity Scatterplot
Scatterplot
Parallel Coordinates • Encode variables along
a horizontal row
• Vertical line specifies different values that variable can take
• Data point represented as a polyline
Parallel Coords Example
Issue • Different variables can have values taking
on quite different ranges
• Must normalize all down (e.g., 0->1)
Challenges • Too much data
Dimensional Reordering • Which dimensions are most like each
other?
Same dimensions ordered according to similarity
Reducing Density
Star Plots • Space out the n variables
at equal angles around a circle
• Each “spoke” encodes a variable’s value
• Data point is now a “shape”
Star Plots
Star Coordinates
Dust & Magnet • Altogether different metaphor
• Data cases represented as small bits of iron dust
• Different attributes given physical manifestation as magnets
• Interact with objects to explore data
Interface
Interaction • Iron bits (data) are drawn toward magnets
(attributes) proportional to that data element’s value in that attribute – Higher values attracted more strongly
• All magnets present on display affect position of all dust
• Individual power of magnets can be changed
• Dust’s color and size can connected to attributes as well
Interaction • Moving a magnet makes all the dust move
– Also command for shaking dust
• Different strategies for how to position magnets in order to explore the data
Dust & Magnet
Dense Pixel Display • Represent data case or a variable as a pixel
• Million or more per display
• Rely on use of color
• Value ranges are mapped to a fixed color sequence of full color (hue)scale but monotonically decreasing brightness
• Data values belonging to one attribute are displayed in a separate view ‐ only one pixel per data value without need for a border
One Representation • Grouping arrangement
• One pixel per variable
• Each data case has its own small rectangular icon
• Plot out variables for data point in that icon using a grid or spiral layout
Illustration
Pixel Bar Chart • Make each pixel within a bar correspond
to a data point in that group represented by the bar
• Can do millions that way
• Color the pixel to represent the value of one of the data point’s variables
Pixel Bar Chart
Product type is x-axis divider
Customers ordered by
y-axis: dollar amount
x-axis: number of visits
Color is (a) dollar amount spent, (b) number of visits, (c) sales quantity
Stacked Graph
Small Multiples
Summary • We’ve seen many general techniques for
multivariate – Scatterplot matrix
– Parallel coordinates
– Dense Pixel Display
– Stacked Graph
– …
• Know strengths and limitations of each
• Know which ones are good for which circumstances
outline
• Visual Perception • High-dimensional Data Visualization • Hierarchical(tree) Data Visualization • Graphs and Networks Visualization • Time Series Data Visualization • Text and Document Visualization • Geographical Data Visualization
Hierarchies • Definition
– Data repository in which cases are related to subcases
– Can be thought of as imposing an ordering in which cases are parents or ancestors of other cases
• Pervasive in the world – Family histories, ancestries
– File/directory systems on computers
– Organization charts
– Object-oriented software classes
Trees • Hierarchies often represented as trees
– Directed, acyclic graph
• Two main representation schemes – Node-link
– Space-filling
Spatial Layout • Primary concern of graph drawing is the
spatial layout of nodes and edges
• Often (but not always) the goal is to effectively depict the graph structure – connectivity, path-following
– network distance
– clustering
– ordering (e.g., hierarchy level)
Indentation • place all items along
vertically spaced rows
• indentation used to show parent/child relationships
• commonly used as a component in an interface
• breadth and depth contend for space
• often requires a great deal of scrolling
Node-Link Diagrams • nodes are distributed in space, connected
by straight or curved lines
• typical approach is to use 2D space to break apart breadth and depth
• often space is used to communicate hierarchical orientation
Node-Link Diagrams • Root at top, leaves at bottom is very common
Node-Link Diagrams • Root can be at center with levels growing
outward too
Basic Algorithm – Recursive algorithm
– Height on separate levels
– Width in unique columns
– Make room for subtrees upwards
Reingold-Tilford Algorithm • goal
– make smarter use of space
– maximize density and symmetry
• design concerns – clearly encode depth level
– no edge crossings
– isomorphic subtrees drawn identically
– compact
• approach – bottom up recursive approach
– for each parent make sure every subtree is drawn
– pack subtrees as closely as possible
– center parent over subtrees
Radial Layout • node-link diagram in
polar coordinates
• radius encodes depth with root in center
• angular sectors assigned to subtrees
• Reingold-Tilford can be applied
Potential Problems • For top-down, width of fan-out uses up
horizontal real estate very quickly – At level n, there are 2n nodes
• Tree might grow a lot along one particular branch – Hard to draw it well in view without knowing
how it will branch
InfoVis Solutions • Techniques developed in Information
Visualization largely try to assist the problems
• Alternatively, Information Visualization techniques attempt to show more attributes of data cases in hierarchy or focus on particular applications of trees
SpaceTree • Uses conventional 2D layout techniques with
some clever additions
Grosjean, Plaisant, Bederson InfoVis ‘02
Characteristics • Vertical or horizontal
• Subtrees are triangles – Size indicates depth
– Shading indicates number of nodes inside
• Navigate by clicking on nodes – Strongly restrict zooming
SpaceTree
3D Approaches • Add a third dimension into which layout can
go
• Compromise of top-down and centered techniques mentioned earlier
• Children of a node are laid out in a cylinder “below” the parent – Siblings live in one of the 2D planes
Cone Trees • Pros
– More effective area to lay out tree
– Use of smooth animation to help person track updates
– Aesthetically pleasing
• Cons – As in all 3D, occlusion
obscures some nodes
– Non-trivial to implement and requires some graphics horsepower
Alternative Solutions • Change the geometry
• Apply a hyperbolic transformation to the space
• Root is at center, subordinates around
• Apply idea recursively, distance decreases between parent and child as you move farther from center, children go in wedge rather than circle
Hyperbolic Browser • Focus + Context Technique
– Detailed view blended with a global view
• First lay out the hierarchy on the hyperbolic plane
• Then map this plane to a disk
• Start with the tree’s root at the center
• Use animation to navigate along this representation of the plane
2D Hyperbolic Browser • Approach: Lay out the hierarchy on the
hyperbolic plane and map this plane onto a display region.
• Comparison – A standard 2D browser: 100 nodes (w/3 character
text strings)
– Hyperbolic browser: 1000 nodes, about 50 nearest the focus can show from 3 to dozens of characters
Hyperbolic Browser
Key Attributes • Natural magnification (fisheye) in center
• Layout depends only on 2-3 generations from current node
• Smooth animation for change in focus
• Don’t draw objects when far enough from root (simplify rendering)
Problems • Orientation
– Watching the view can be disorienting
– When a node is moved, its children don’t keep their relative orientation to it as in Euclidean plane, they rotate
– Not as symmetric and regular as Euclidean techniques, two important attributes in aesthetics
Space-Filling • Each item occupies an area
• Children are “contained” under parent
Treemap • Space-filling representation developed by
Shneiderman and Johnson, Vis ‘91
• Children are drawn inside their parent
• Alternate horizontal and vertical slicing at each successive level
• Use area to encode other variable of data items
Treemap • Example
Applications • Can use Treemap idea for a variety of domains
– File/directory structures
– Basketball statistics
– Software diagrams
– …
Treemap Affordances • Good representation of two attributes beyond
node-link: color and area
• Not as good at representing structure – What happens if it’s a perfectly balanced tree of
items all the same size?
– Also can get long-thin aspect ratios
– Borders help on smaller trees, but take up too much area on large, deep ones
Treemap Variations • Cluster Treemap
– Compromises treemap algorithm to avoid bad aspect ratios
– Basic algorithm (divide and conquer) with some hand tweaking
– Takes advantage of shallow hierarchy
• Squarified treemap – Bruls, Huizing, van Wijk, EuroGraphics ‘00
– Alternate approach, similar results
– Small changes in data values can cause dramatic changes in layout
Treemap Variations • Strip treemap
– Use strips to place items
– Put new rectangle into strip if it makes average aspect ratio of all rectangles in strip go down, keep it there. Or if it makes aspect ratio go up, put it back and move to next strip
Compare results
Showing Structure • Regular borderless treemap makes it
challenging to discern structure of hierarchy, particularly large ones – Supplement Treemap view
– Change rectangles to other forms
Enclosure diagrams
Cushion Treemap • Add shading and texture to help convey
structure of hierarchy
Voronoi Treemaps • Use polygons instead of rectangles
Voronoi Treemaps
Radial Space-Filling • What if we used a radial rather than a
rectangular space-filling technique? – We saw node-link trees with root in center and
growing outward already...
• Make pie-tree with root in center and children growing outward – Radial angle now corresponds to a variables
rather than area
SunBurst • Root directory at center, each successive level
drawn farther out from center
• Sweep angle of item corresponds to size
• Color maps to file type or age
• Interactive controls for moving deeper in hierarchy, changing the root, etc.
• Double-click on directory makes it new root
SunBurst
• Node-link diagrams or space-filling techniques?
• It depends on the properties of the data – Node-link typically better at exposing
structure of information structure
– Space-filling good for focusing on one or two additional variables of cases
More Alternatives
Circle Packing
Icicle Tree • Similar to the node-link diagram
• The nodes are space-filling
Summary • Node-link diagrams or space-filling
techniques?
• It depends on the properties of the data – Node-link typically better at exposing
structure of information structure
– Space-filling good for focusing on one or two additional variables of cases
outline
• Visual Perception • High-dimensional Data Visualization • Hierarchical(tree) Data Visualization • Graphs and Networks Visualization • Time Series Data Visualization • Text and Document Visualization • Geographical Data Visualization
What is a Graph • Vertices (nodes) connected by Edges (links)
• Graph edges can be directed or undirected
• Graph edges can have values (weights)
Graph Uses • In information visualization, any number of
data sets can be modeled as a graph – Telephone system
– World Wide Web
– Distribution network for on-line retailer
– Call graph of a large software system
– Semantic map in an AI algorithm
– Set of connected friends
• Graph/network visualization is one of the oldest and most studied areas of InfoVis
Graph Visualization
www.nytimes.com/interactive/2008/05/05/science/20080506_DISEASE.html
Graph Visualization Challenges • Graph layout and positioning
– Make a concrete rendering of abstract graph
• Navigation/Interaction – How to support user changing focus and
moving around the graph
• Scale – Above two issues not too bad for small graphs,
but large ones are much tougher
Aesthetic Considerations • Crossings -- minimize towards planar
• Total Edge Length -- minimize towards proper scale
• Area -- minimize towards efficiency
• Maximum Edge Length -- minimize longest edge
• Uniform Edge Lengths -- minimize variances
• Total Bends -- minimize orthogonal towards straight-line
Layout Algorithms • Entire research community’s focus
• Common Layout Techniques – Hierarchical
– Force-directed
– Circular
– Geographic-based
– Clustered
– Attribute-based
– Matrix
Circular Layout • Ultra-simple
• May not look so great
• Space vertices out around circle
• Draw lines to connect vertices
Circular Layout
Arc Diagram Layout
Force-directed Layout • Example of constraint-based layout
technique
• Impose constraints (objectives) on layout – Shorten edges
– Minimize crossings
– …
• Define through equations
• Create optimization algorithm that attempts to best satisfy those equations
Force-directed Layout • Spring model (common)
– Edges : Springs (gravity attraction)
– Vertices : Charged particles (repulsion)
• Equations for forces
• Iteratively recalculate to update positions of vertices
• Seeking local minimum of energy – Sum of forces on each node is zero
Force-directed Layout
Variants • Spring layout
– Simple force-directed spring embedder
• Fruchterman-Reingold Algorithm – Add global temperature
– If hot, nodes move farther each step
– If cool, smaller movements
– Generally cools over time
• Kamada-Kawai algorithm – Examines derivatives of force
equations
– Brought to zero for minimum energy
Force-directed Layout • very flexible, aesthetic layouts on many
types of graphs
• can add custom forces
• relatively easy to implement
• repulsion loop is O(n2) per iteration – can speed up to O(NlogN) using quadtree or
k-d tree
• prone to local minima – can use simulated annealing
Node-link Layout • understandable visual mapping
• can show overall structure, clusters, paths
• flexible, many variations
• all but the most trivial
algorithms are > O(N2)
• not good for dense graphs – hairball problem!
Matrix Representations • There has been renewed interest in matrix
representations of graphs recently
• The regularity, symmetry, and structure of a matrix are a win – people understand them well
• But they don’t scale up really well
Adjacency Matrix • Great for
dense graphs
• Can spot clusters
Hybrid Layout • NodeTrix
– Hybrid of matrix and node-link
Really Big Graphs • May be difficult to keep all in memory
• Often visualized as “hairballs”
• Smart visualizations do structural clustering, so you see a high-level overview of topology
Hierarchical Edge Bundles • Bundle edges that go from/to similar
nodes together – Like wires in a house
• Uses B-spline curves for edges
• Reduces the clutter from many edges
Hierarchical Edge Bundles
Summary
• Graph Visualization need to consider
– layout
– simplification
– interaction
– Scale
• Facilitate understanding of complex socioeconomic patterns
outline
• Visual Perception • High-dimensional Data Visualization • Hierarchical(tree) Data Visualization • Graphs and Networks Visualization • Time Series Data Visualization • Text and Document Visualization • Geographical Data Visualization
Time Series Data • Fundamental chronological component to
the data set
• Each data case is likely an event of some kind
• One of the variables can be the date and time of the event
• Examples: – sunspot activity
– medicines taken
– cities visited
– stock prices
Time Series User Tasks • Examples
– When was something greatest/least?
– Is there a pattern?
– Are two series similar?
– Do any of the series match a pattern?
– Provide simpler, faster access to the series
Classification • Discrete points vs. interval points
• Linear time vs. cyclic time
• Ordinal time vs. continuous time
• Ordered time vs. branching time vs. time with multiple perspectives
Fundamental Tradeoff • Is the visualization time-dependent, ie,
changing over time (beyond just being interactive)? – Static
• Shows history, multiple perspectives, allows comparison
– Dynamic (animation) • Gives feel for process & changes over time, has
more space to work with
Standard Presentation • Present time data as a 2D line graph with
time on x-axis and some other variable on y-axis
Periodic Visualization • Visualizations can be very good at helping
us spot patterns in data
• Often, these patterns are periodic, most often repeating in time
Calendars
Dow Jones Industrial Average 2006 - 2009
Heat Maps
• each dot shows the time of each of the third of a million emails Stephen Wolfram have sent since 1989
Spirals • Standard x-y timeline or tabular display is
problematic for periodic data – It has endpoints
• Use spiral to help display data – One loop corresponds to one period
• Scale to large data sets
• Support identification of periodic structures in the data
• Compare multiple datasets
Spirals
• One year per loop
• Same month on radial bars
• Quantity represented by size of blob
ThemeRiver • Background: a user is less interested in
document themselves than in theme changes within the whole collection over time
• River height (thickness) encodes relative frequency of themes
• ThemeRiver provides users a macro-view of thematic changes
• Helps users identify time-related patterns, trends, and relationships across a large collection of documents
ThemeRiver
ThemeRiver
Summary • Think about the data
– What characteristics?
– Can InfoVis help?
• Think about the visualization techniques
outline
• Visual Perception • High-dimensional Data Visualization • Hierarchical(tree) Data Visualization • Graphs and Networks Visualization • Time Series Data Visualization • Text and Document Visualization • Geographical Data Visualization
Text is Everywhere • We use documents as primary information
artifact in our lives
• Our access to documents has grown tremendously in recent years due to networking infrastructure – WWW
– Digital libraries
Big Question • What can information visualization provide
to help users in understanding and gathering information from text and document collections?
• Related Topic - Information Retrieval
• InfoVis, seems to be most useful when – Perhaps not sure precisely what you’re
looking for
– More of a browsing task than a search one
Challenge • Text is nominal data
– Does not seem to map to geometric/graphical presentation as easily as ordinal and quantitative data
• The “Raw data --> Data Table” mapping now becomes more important
One Text Visualization
Uses: Layout Font Style Color
Word Counts
Tag/Word Clouds • Currently very “hot” in research
community
• Have proven to be very popular on web
• Idea is to show word/concept importance through visual means – Tags: User-specified metadata (descriptors)
about something
– Sometimes generalized to just reflect word frequencies
Tag/Word Clouds
Tag/Word Clouds
Problems • Actually not a great visualization
– Hard to find a particular word
– Long words get increased visual emphasis
– Font sizes are hard to compare
– Alphabetical ordering not ideal for many tasks
• Why So Popular? – Serve as social signifiers that provide a
friendly atmosphere that provide a point of entry into a complex site
– Act as individual and group mirrors
– Fun, not business-like
Wordle
www.wordle.net
Wordle • Tightly packed words, sometimes vertical
or diagonal
• Word size is linearly correlated with frequency (typically square root in cloud)
• Multiple color palettes
• User gets some control
Layout Algorithm • Idea:
– sort words by weight, decreasing order for each word w
w.position := makeInitialPosition(w);
while w intersects other words:
updatePosition(w);
– Init position randomly chosen according to distribution for target shape
– Update position moves out radically
Beyond Individual Words • Can we show combinations of words,
phrases, and sentences?
Word Tree
Word Tree • Shows context of a word or words
– Follow word with all the phrases that follow it
• Font size shows frequency of appearance
• Continue branch until hitting unique phrase
• Clicking on phrase makes it the focus
• Ordered alphabetically, by frequency, or by first appearance
Interaction
Phrase Nets • Examine unstructured text documents
• Presents pairs of terms from phrases such as – X and Y
– X’s Y
– X at Y
– X (is | are | was | were) Y
• Uses special graph layout algorithm with compression and simplification
Examples
Overviews of Documents • Can we provide a quick browsing,
overview UI, maybe especially useful for small screens?
Document Cards • Compact visual representation of a
document
• Show key terms and important images
Document Cards • Layout algorithm searches for empty
spacerectangles to put things
Interaction • Hover over non-image space shows abstract in
tooltip
• Hover over image and see caption as tooltip
• Click on page number to get full page
• Click on image goes to page containing it
• Clicking on a term highlights it in overview and all tooltips
Text Themes • Look for sets of regions in a document (or
sets of documents) that all have common theme – Closely related to each other, but different
from rest
• Need to run clustering process
Themescapes • Self-organizing maps didn’t reflect density
of regions all that well -- Can we improve?
• Use 3D representation, and have height represent density or number of documents in region
Themescapes
WebTheme
Topic Modeling • Hot topic in text analysis and visualization
• Latent Dirichlet Allocation
• Unsupervised learning
• Produces “topics” evident throughout doc collection, each modeled by sets of words/terms
• Describes how each document contributes to each topic
TIARA • Keeps basic ThemeRiver metaphor
• Embed word clouds into bands to tell more about what is in each
• Magnifier lens for getting more details
• Uses Latent Dirichlet Allocation to do text analysis and summarization
Representation
Features
TextFlow • Showing how topics merge and split
ParallelTopics
outline
• Visual Perception • High-dimensional Data Visualization • Hierarchical(tree) Data Visualization • Graphs and Networks Visualization • Time Series Data Visualization • Text and Document Visualization • Geographical Data Visualization
Geographical Data • Fundamentally different from other kinds
of data since they are inherently spatially structured in two or three dimensions
Dot Map • Visualizing specific points
• Nominal data
• Arranged by Latitude/Longitude
Dot Map
sanfrancisco.crimespotting.org
Dot Map • Clustering, e.g. k-means algorithm
Choropleth Map • Areas are shaded or patterned in
proportion to the measurement of the statistical variable being displayed on the map
Choropleth
Problems • Easy to slant data to suit the
cartographer’s purpose (by adjusting the slicing values)
• Create the illusion of rapid breaks whereas data varies continuously and gradually in the real world
• Allow small areas (like major cities) to overwhelm the data of large regions (like states)
Cartogram • A cartogram is a diagram which uses the
form of a map to present numeric information while maintaining some degree of geographic accuracy.
Cartogram
Flow map • a mix of maps and flow charts, that show
the movement of objects from one location to another
Flow Bundle