Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
© Munzner/Möller
High-Dimensional Data
VisualizationTorsten Möller
© Munzner/Möller 2
Readings
• Munzner, “Visualization Design and Analysis”:– Chapter 8 (Reducing Items and Attributes)
© Munzner/Möller
Data Reduction
• Cannot make sense of everything at once
• reduce amount shown– items (rows of a table / elements)– attributes (columns of a table / dimensions)
3
© Munzner/Möller
Last time - Item reduction
• simple filtering• simple aggregation• navigation (camera-oriented: zooming)
– geometric vs. semantic zooming– pan/translate– constraints + combinations
• overviews: focus + context– selective filtering– geometric distortions
4
© Munzner/Möller
Today - Attribute reduction
• simple attribute filtering• camera-oriented:
– slice– cut– project
• aggregation: dimensionality reduction– linear– non-linear
5
© Munzner/Möller
Attribute filtering: SPloM• filtering, but for attributes rather than items
– unfiltered vs filtered SPLOM•
[Yang et al. InfoVis 2003]
© Munzner/Möller
Attribute filtering: Parallel Coordinates
• unfiltered vs. filtered PC
[Yang et al. InfoVis 2003]
© Munzner/Möller
Attribute filtering: Star Glyphs
• unfiltered vs. filtered
[Yang et al. InfoVis 2003]
© Munzner/Möller
Today - Attribute reduction
• simple attribute filtering• camera-oriented:
– slice– cut– project
• aggregation: dimensionality reduction– linear– non-linear
9
© Munzner/Möller
Slicing / Cutting: Spatial Data
• easy to understand metaphor• reduces data
from 3D to 2D
10
© Munzner/Möller
Axis-aligned slices
11http://rsbweb.nih.gov/ij/plugins/volume-viewer.html
© Munzner/Möller
HyperSlice: slicing in multi-d
• HyperSlice: matrix of orthogonal 2D slices– each panel is display and control: drag to
change slice– simple 3D example
12
© Munzner/Möller
Example
• 4D function Sum(i=0...3) wi/(1 + |x − pi|2)• diagonals = standard graph
13
© Munzner/Möller
(orthographic) Projections
• (axis-aligned) orthographic: remove all information about filtered dims– hypercube: 3D to 2D, 4D to 3D (video)
14
© Munzner/Möller
SPloM: orthographic projections in multi-d
15
© Munzner/Möller 16
Oblique (parallel) projections
• Projectors are not normal to projection plane
• Most drawings in textbooks use oblique projection
© Munzner/Möller 17
Common oblique projections
• Cavalier projections– Angle α = 45 degrees– Preserves the length of a line segment
perpendicular to the projection plane– Angle φ is typically 30 or 45 degrees
• Cabinet projections– Angle α = 63.7 degrees or arctan(2)– Halves the length of a line segment
perpendicular to the projection plane – more realistic than cavalier
© Munzner/Möller
(perspective) Projections
• some info about filtered dims remains• mimics human visual system in 3D• not as effective in multi-D
18
© Munzner/Möller
Today - Attribute reduction
• simple attribute filtering• camera-oriented:
– slice– cut– project
• aggregation: dimensionality reduction– linear– non-linear
19
© Munzner/Möller
Dimensionality vs. attribute reduction
• vocab use in field not consistent– dimension/attribute
• attribute reduction: reduce set with filtering– includes orthographic projection
• dimensionality reduction: create smaller set of new dims– set size is smaller than original, new dims
completely synthetic– clarification: includes dimensional aggregation– includes some projections (but not all)
• vocab: projection/mapping20
© Munzner/Möller
Uncovering Hidden Structure
• measurements indirect not direct– real-world sensor limitations
• measurements made in sprawling space– documents, images
• DR only suitable if (almost) all information could be conveyed with fewer dimensions– how do you know? need to estimate true
dimensionality to check if different than original! 21
© Munzner/Möller
Dimensionality Reduction• mapping multidimensional space into
space of fewer dimensions– typically 2D for clarity– 1D/3D possible?– keep/explain as much variance as possible– show underlying dataset structure
• motivation: true/intrinsic dimension of dataset is much lower than measured dim
• linear vs. non-linear approaches22
© Munzner/Möller
Estimating True Dimensionality
• error for low-dim projection vs high-dim original
• no single correct answer; many metrics proposed– cumulative variance that is not accounted for– strain: match variations in distance (vs actual
distance values)– stress: difference between interpoint
distances in high and low dimensions
23
© Munzner/Möller
Showing Dimensionality Estimates
• scree plots as simple way: error against # dims– original dataset: 294 dims– estimate: almost all variance preserved with
< 20 dims
24
© Munzner/Möller
Linear DimRed• based on linear projections!• given dimensions have a strong meaning• want to preserve linearity in layout• examples: Principal component analysis (PCA),
Independent component analysis (ICA), Linear discriminant analysis (LDA), etc.
25
PCA LDA
© Munzner/Möller
Problem for linear dimred
26
© Munzner/Möller
Nonlinear DimRed
• no inherent meaning to given dimensions• minimize differences between interpoint
distances in high and low dimensions• examples: Multidimensional scaling
(MDS), isomap, local linear embedding (LLE)
27
© Munzner/Möller
Dimred: Isomap
• 4096 D to 2D [Tenenbaum 00]• 2D: wrist rotation, fingers extension
[A G
loba
l Geo
met
ric F
ram
ewor
k fo
r Non
linea
r Dim
ensi
onal
ity R
educ
tion.
Tene
nbau
m, d
e S
ilva
and
Lang
ford
. S
cien
ce 2
90 (5
500)
: 231
9-23
23, 2
2 D
ecem
ber 2
000,
is
omap
.sta
nfor
d.ed
u]
28
© Munzner/Möller
Goals / Tasks
• keep/explain as much variance as possible
• find clusters– or compare / evaluate with previous
clustering• understand structure
– absolute position not reliable– fine grained structure not reliable
29
© Munzner/Möller
Naive Spring Model• repeat for all points
– compute spring force to all other points• difference between high dim vs. low dim distance (stress)
– move to better location using computed forces• compute distances between all points
– O(n2) iteration, O(n3) algorithm
30
© Munzner/Möller
Faster Spring Model [Chalmers 96]
• compare distances only with a few points– maintain small local neighborhood set
31
© Munzner/Möller
• compare distances only with a few points– maintain small local neighborhood set– each time pick some randoms, swap in if
closer
Faster Spring Model [Chalmers 96]
32
© Munzner/Möller
• compare distances only with a few points– maintain small local neighborhood set– each time pick some randoms, swap in if
closer
Faster Spring Model [Chalmers 96]
33
© Munzner/Möller
• compare distances only with a few points– maintain small local neighborhood set– each time pick some randoms, swap in if closer
• small constant: 6 locals, 3 randoms typical– O(n) iteration, O(n2) algorithm
Faster Spring Model [Chalmers 96]
34
© Munzner/Möller
True/Intrinsic Dimensionality
• how many dimensions is enough?– could be more than 2 or 3– knee in scree plot
• example– measured materials
from graphics– linear PCA: 25– get physically
impossible intermediate points
35
A Data-Driven Reflectance Model, SIGGRAPH 2003,W. Matusik, H. Pfister, M. Brand and L. McMillan, people.csail.mit.edu/wojciech/DDRM/ddrm.pdf
© Munzner/Möller
True/Intrinsic Dimensionality
• nonlinear MDS: 10-15– all intermediate points possible
• they have intuitive meaning– red, green, blue,
specular, diffuse, glossy, metallic, plastic-y, roughness, rubbery, greasiness, dustiness...
36
A Data-Driven Reflectance Model, SIGGRAPH 2003,W. Matusik, H. Pfister, M. Brand and L. McMillan, people.csail.mit.edu/wojciech/DDRM/ddrm.pdf
© Munzner/Möller
Showing DR Data• scatterplot showing points
– only works if true dimensionality is 2 (... or 3)– need to drill down to see what points represent
• SPLOM– safe choice
• landscapes– avoid! studies show worse than just using points
37