33
Visualizatio n and Data Mining

Visualization and Data Mining

  • Upload
    ira

  • View
    27

  • Download
    3

Embed Size (px)

DESCRIPTION

Visualization and Data Mining. Napoleon Invasion of Russia, 1812. Napoleon. Marley, 1885. Snow’s Cholera Map, 1855. Asia at night. South and North Korea at night. North Korea Notice how dark it is. Seoul, South Korea. Visualization Role. Support interactive exploration - PowerPoint PPT Presentation

Citation preview

Page 1: Visualization  and Data Mining

Visualization and

Data Mining

Page 2: Visualization  and Data Mining

22

Napoleon Invasion of Russia, 1812

Napoleon

Page 3: Visualization  and Data Mining

33

Marley, 1885

Page 4: Visualization  and Data Mining

44

Page 5: Visualization  and Data Mining

55

Snow’s Cholera Map, 1855

Page 6: Visualization  and Data Mining

66

Asia at night

Page 7: Visualization  and Data Mining

77

South and North Korea at night

Seoul,South Korea

North KoreaNotice how darkit is

Page 8: Visualization  and Data Mining

88

Visualization Role Support interactive exploration Help in result presentation Disadvantage: requires human eyes Can be misleading

Page 9: Visualization  and Data Mining

99

Bad Visualization: Spreadsheet with misleading Y -axisYear Sales

1999 21102000 21052001 21202002 21212003 2124

Sales

20952100210521102115212021252130

1999 2000 2001 2002 2003

Sales

Y-Axis scale gives WRONGimpression of big change

Page 10: Visualization  and Data Mining

1010

Better VisualizationYear Sales

1999 21102000 21052001 21202002 21212003 2124

Sales

0

500

1000

1500

2000

2500

3000

1999 2000 2001 2002 2003

Sales

Axis from 0 to 2000 scale gives correct impression of small change

Page 11: Visualization  and Data Mining

1111

Lie Factor=14.8

(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

Page 12: Visualization  and Data Mining

1212

Lie Factor

dataineffectofsizegraphicinshowneffectofsizeFactorLie

8.14528.0833.7

18)0.185.27(

6.0)6.03.5(

Tufte requirement: 0.95<Lie Factor<1.05

(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

Page 13: Visualization  and Data Mining

1313

Tufte’s Principles of Graphical Excellence Give the viewer

the greatest number of ideas in the shortest time with the least ink in the smallest space.

Tell the truth about the data!

(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

Page 14: Visualization  and Data Mining

1414

Visualization Methods Visualizing in 1-D, 2-D and 3-D

well-known visualization methods Visualizing more dimensions

Parallel Coordinates Other ideas

Page 15: Visualization  and Data Mining

1515

1-D (Univariate) Data Representations

7

5

3

1

0 20

Mean

low highMiddle 50%

Tukey box plot

Histogram

Page 16: Visualization  and Data Mining

1616

2-D (Bivariate) Data Scatter plot, …

price

mileage

Page 17: Visualization  and Data Mining

1717

3-D Data (projection)

price

Page 18: Visualization  and Data Mining

1818

3-D image (requires 3-D blue and red glasses)

Taken by Mars Rover Spirit, Jan 2004

Page 19: Visualization  and Data Mining

1919

Visualizing in 4+ Dimensions Scatterplots Parallel Coordinates Chernoff faces …

Page 20: Visualization  and Data Mining

2020

Multiple Views

Give each variable its own display

A B C D E1 4 1 8 3 52 6 3 4 2 13 5 7 2 4 34 2 6 3 1 5

A B C D E

1

2

3

4

Problem: does not show correlations

Page 21: Visualization  and Data Mining

2121

Scatterplot Matrix

Represent each possiblepair of variables in theirown 2-D scatterplot (car data)

Q: Useful for what? A: linear correlations (e.g. horsepower & weight)

Q: Misses what? A: multivariate effects

Page 22: Visualization  and Data Mining

2222

Parallel Coordinates • Encode variables along a horizontal row• Vertical line specifies values

Dataset in a Cartesian coordinates

Same dataset in parallel coordinates

Invented by Alfred Inselberg while at IBM, 1985

Page 23: Visualization  and Data Mining

2323

Example: Visualizing Iris Data

sepal length

sepal width

petal length

petal width

5.1 3.5 1.4 0.2

4.9 3 1.4 0.2

... ... ... ...

5.9 3 5.1 1.8

Iris setosa

Iris versicolor

Iris virginica

Page 24: Visualization  and Data Mining

2424

Flower PartsPetal, a non-reproductive part of the flower

Sepal, a non-reproductive part of the flower

Page 25: Visualization  and Data Mining

2525

Parallel Coordinates Sepal Length

5.1

sepal length

sepal width

petal length

petal width

5.1 3.5 1.4 0.2

Page 26: Visualization  and Data Mining

2626

Parallel Coordinates: 2 DSepal Length

5.1

Sepal Width

3.5

sepal length

sepal width

petal length

petal width

5.1 3.5 1.4 0.2

Page 27: Visualization  and Data Mining

2727

Parallel Coordinates: 4 DSepal Length

5.1

Sepal Width

Petal length

Petal Width

3.5

sepal length

sepal width

petal length

petal width

5.1 3.5 1.4 0.2

1.4 0.2

Page 28: Visualization  and Data Mining

2828

5.1

3.5

1.4 0.2

Parallel Visualization of Iris data

Page 29: Visualization  and Data Mining

2929

Parallel Visualization Summary Each data point is a line Similar points correspond to similar lines Lines crossing over correspond to negatively correlated attributes

Interactive exploration and clustering

Problems: order of axes, limit to ~20 dimensions

Page 30: Visualization  and Data Mining

3030

Chernoff FacesEncode different variables’ values in characteristicsof human face

http://www.cs.uchicago.edu/~wiseman/chernoff/http://hesketh.com/schampeo/projects/Faces/chernoff.html

Cute applets:

Page 31: Visualization  and Data Mining

3131

Interactive Face

Page 32: Visualization  and Data Mining

3232

Chernoff faces, example

Page 33: Visualization  and Data Mining

3333

Visualization Summary Many methods Visualization is possible in more than 3-D Aim for graphical excellence