37
© Munzner/Möller High-Dimensional Data Visualization Torsten Möller

High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

High-Dimensional Data

VisualizationTorsten Möller

Page 2: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller 2

Readings

• Munzner, “Visualization Design and Analysis”:– Chapter 8 (Reducing Items and Attributes)

Page 3: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Data Reduction

• Cannot make sense of everything at once

• reduce amount shown– items (rows of a table / elements)– attributes (columns of a table / dimensions)

3

Page 4: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Last time - Item reduction

• simple filtering• simple aggregation• navigation (camera-oriented: zooming)

– geometric vs. semantic zooming– pan/translate– constraints + combinations

• overviews: focus + context– selective filtering– geometric distortions

4

Page 5: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Today - Attribute reduction

• simple attribute filtering• camera-oriented:

– slice– cut– project

• aggregation: dimensionality reduction– linear– non-linear

5

Page 6: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Attribute filtering: SPloM• filtering, but for attributes rather than items

– unfiltered vs filtered SPLOM•

[Yang et al. InfoVis 2003]

Page 7: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Attribute filtering: Parallel Coordinates

• unfiltered vs. filtered PC

[Yang et al. InfoVis 2003]

Page 8: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Attribute filtering: Star Glyphs

• unfiltered vs. filtered

[Yang et al. InfoVis 2003]

Page 9: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Today - Attribute reduction

• simple attribute filtering• camera-oriented:

– slice– cut– project

• aggregation: dimensionality reduction– linear– non-linear

9

Page 10: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Slicing / Cutting: Spatial Data

• easy to understand metaphor• reduces data

from 3D to 2D

10

Page 11: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Axis-aligned slices

11http://rsbweb.nih.gov/ij/plugins/volume-viewer.html

Page 12: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

HyperSlice: slicing in multi-d

• HyperSlice: matrix of orthogonal 2D slices– each panel is display and control: drag to

change slice– simple 3D example

12

Page 13: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Example

• 4D function Sum(i=0...3) wi/(1 + |x − pi|2)• diagonals = standard graph

13

Page 14: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

(orthographic) Projections

• (axis-aligned) orthographic: remove all information about filtered dims– hypercube: 3D to 2D, 4D to 3D (video)

14

Page 15: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

SPloM: orthographic projections in multi-d

15

Page 16: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller 16

Oblique (parallel) projections

• Projectors are not normal to projection plane

• Most drawings in textbooks use oblique projection

Page 17: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller 17

Common oblique projections

• Cavalier projections– Angle α = 45 degrees– Preserves the length of a line segment

perpendicular to the projection plane– Angle φ is typically 30 or 45 degrees

• Cabinet projections– Angle α = 63.7 degrees or arctan(2)– Halves the length of a line segment

perpendicular to the projection plane – more realistic than cavalier

Page 18: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

(perspective) Projections

• some info about filtered dims remains• mimics human visual system in 3D• not as effective in multi-D

18

Page 19: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Today - Attribute reduction

• simple attribute filtering• camera-oriented:

– slice– cut– project

• aggregation: dimensionality reduction– linear– non-linear

19

Page 20: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Dimensionality vs. attribute reduction

• vocab use in field not consistent– dimension/attribute

• attribute reduction: reduce set with filtering– includes orthographic projection

• dimensionality reduction: create smaller set of new dims– set size is smaller than original, new dims

completely synthetic– clarification: includes dimensional aggregation– includes some projections (but not all)

• vocab: projection/mapping20

Page 21: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Uncovering Hidden Structure

• measurements indirect not direct– real-world sensor limitations

• measurements made in sprawling space– documents, images

• DR only suitable if (almost) all information could be conveyed with fewer dimensions– how do you know? need to estimate true

dimensionality to check if different than original! 21

Page 22: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Dimensionality Reduction• mapping multidimensional space into

space of fewer dimensions– typically 2D for clarity– 1D/3D possible?– keep/explain as much variance as possible– show underlying dataset structure

• motivation: true/intrinsic dimension of dataset is much lower than measured dim

• linear vs. non-linear approaches22

Page 23: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Estimating True Dimensionality

• error for low-dim projection vs high-dim original

• no single correct answer; many metrics proposed– cumulative variance that is not accounted for– strain: match variations in distance (vs actual

distance values)– stress: difference between interpoint

distances in high and low dimensions

23

Page 24: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Showing Dimensionality Estimates

• scree plots as simple way: error against # dims– original dataset: 294 dims– estimate: almost all variance preserved with

< 20 dims

24

Page 25: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Linear DimRed• based on linear projections!• given dimensions have a strong meaning• want to preserve linearity in layout• examples: Principal component analysis (PCA),

Independent component analysis (ICA), Linear discriminant analysis (LDA), etc.

25

PCA LDA

Page 26: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Problem for linear dimred

26

Page 27: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Nonlinear DimRed

• no inherent meaning to given dimensions• minimize differences between interpoint

distances in high and low dimensions• examples: Multidimensional scaling

(MDS), isomap, local linear embedding (LLE)

27

Page 28: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Dimred: Isomap

• 4096 D to 2D [Tenenbaum 00]• 2D: wrist rotation, fingers extension

[A G

loba

l Geo

met

ric F

ram

ewor

k fo

r Non

linea

r Dim

ensi

onal

ity R

educ

tion.

Tene

nbau

m, d

e S

ilva

and

Lang

ford

. S

cien

ce 2

90 (5

500)

: 231

9-23

23, 2

2 D

ecem

ber 2

000,

is

omap

.sta

nfor

d.ed

u]

28

Page 29: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Goals / Tasks

• keep/explain as much variance as possible

• find clusters– or compare / evaluate with previous

clustering• understand structure

– absolute position not reliable– fine grained structure not reliable

29

Page 30: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Naive Spring Model• repeat for all points

– compute spring force to all other points• difference between high dim vs. low dim distance (stress)

– move to better location using computed forces• compute distances between all points

– O(n2) iteration, O(n3) algorithm

30

Page 31: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Faster Spring Model [Chalmers 96]

• compare distances only with a few points– maintain small local neighborhood set

31

Page 32: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

• compare distances only with a few points– maintain small local neighborhood set– each time pick some randoms, swap in if

closer

Faster Spring Model [Chalmers 96]

32

Page 33: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

• compare distances only with a few points– maintain small local neighborhood set– each time pick some randoms, swap in if

closer

Faster Spring Model [Chalmers 96]

33

Page 34: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

• compare distances only with a few points– maintain small local neighborhood set– each time pick some randoms, swap in if closer

• small constant: 6 locals, 3 randoms typical– O(n) iteration, O(n2) algorithm

Faster Spring Model [Chalmers 96]

34

Page 35: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

True/Intrinsic Dimensionality

• how many dimensions is enough?– could be more than 2 or 3– knee in scree plot

• example– measured materials

from graphics– linear PCA: 25– get physically

impossible intermediate points

35

A Data-Driven Reflectance Model, SIGGRAPH 2003,W. Matusik, H. Pfister, M. Brand and L. McMillan, people.csail.mit.edu/wojciech/DDRM/ddrm.pdf

Page 36: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

True/Intrinsic Dimensionality

• nonlinear MDS: 10-15– all intermediate points possible

• they have intuitive meaning– red, green, blue,

specular, diffuse, glossy, metallic, plastic-y, roughness, rubbery, greasiness, dustiness...

36

A Data-Driven Reflectance Model, SIGGRAPH 2003,W. Matusik, H. Pfister, M. Brand and L. McMillan, people.csail.mit.edu/wojciech/DDRM/ddrm.pdf

Page 37: High-Dimensional Datavda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf · • no single correct answer; many metrics proposed – cumulative variance that is not accounted

© Munzner/Möller

Showing DR Data• scatterplot showing points

– only works if true dimensionality is 2 (... or 3)– need to drill down to see what points represent

• SPLOM– safe choice

• landscapes– avoid! studies show worse than just using points

37