37
IRDS: Visualization Charles Sutton University of Edinburgh

visualization - The University of Edinburgh · Histograms 4 6 8 10 12 14 16 0 20 60 100 8 10 12 14 16 18 20 0 40 80 120 skew multimodality 6 8 10 12 14 0 20 40 60 80 these three have

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

  • IRDS: VisualizationCharles Sutton

    University of Edinburgh

  • Why visualisation?

    • Exploratory • What’s in the data? What’s wrong with it? • Today’s lecture.

    • Presentation • Display results of algorithms for publication

    • Engagement • Infographics from web sites, word clouds, etc

  • Summary Statistics

    • Univariate • Mean • Median • Variance • Quantiles

    • Multivariate • Correlation • Covariance

  • Histograms

    4 6 8 10 12 14 16

    020

    60100

    8 10 12 14 16 18 20

    040

    80120

    skew multimodality

    6 8 10 12 14

    020

    4060

    80these three have same summary statistics!

  • Outliers in histograms

    Data Mining Lectures Lecture on EDA and Visualization Padhraic Smyth, UC Irvine

    Histogram Detecting Outlier (Missing Data)

    blood pressure = 0 ?

    Blood pressure data set

    [Source: Padhraic Smyth]UCI ML repository says no missing data

    (well, for 20 years it did)

  • Class-Conditional Histograms

    Alternative: Box plot

    Positive (diabetes)

    Negative

    Blood Pressure

    Freq

    uenc

    y

    0 20 40 60 80 100

    020

    4060

    80

    Blood Pressure

    Freq

    uenc

    y

    0 20 40 60 80 100

    050

    100

    150

    ●●●

    ●●●●

    ●●●●●●

    ●●

    ●●

    ●●

    ●●●●●●●●●●●●●

    neg pos

    020

    4060

    80100120

    Diabetes?Pressure Median

    Quartile

    Quartile

    Extreme data

    Maybe for only 2 groups, graphs not necessary. For more visual comparisons, can be helpful.

  • Slight rant about bar charts

    Here’s my boxplot

    ●●●

    ●●●●

    ●●●●●●

    ●●

    ●●

    ●●

    ●●●●●●●●●●●●●

    neg pos

    020

    4060

    80100120

    Diabetes?

    Pressure

    Weka’s automatic visualisation

    Bar charts often seem like a better idea than they are

  • Effect of bin size

    0 10 20 30 40 500

    1020

    3040

    5060

  • Effect of bin size

    0 10 20 30 40 500

    1020

    3040

    5060

    0 10 20 30 40 50

    05

    1015

    2025

    3035

  • Effect of bin size

    0 10 20 30 40 50

    05

    1015

    2025

    3035

    0 10 20 30 40 500

    510

    1520

    2530

  • More misleading histograms

    Data Mining Lectures Lecture on EDA and Visualization Padhraic Smyth, UC Irvine

    ZipCode Data: Population

    0 2 4 6 8 10 12

    x 104

    0

    1000

    2000

    3000

    4000

    5000

    6000

    7000

    8000

    0 2 4 6 8 10 12

    x 104

    0

    100

    200

    300

    400

    500

    600

    700

    800

    900

    0 500 1000 1500 2000 2500 3000 3500 4000 4500 50000

    50

    100

    150

    200

    250

    300

    350

    400

    [Source: Padhraic Smyth]Data: US Post Codes

  • Bivariate data

    • Numerical summaries about linear dependence • Histograms sort of scale to 2-D but not really higher • More common to use scatter plots

  • Numerical bivariate summaries

    Sample covariance:

    Sample correlation:

    ⇢xy

    =sxy

    sx

    sy

    where as beforex̄ =

    1

    N

    X

    i

    xi

    ȳ =1

    N

    X

    i

    yi

    s

    x

    =

    s1

    N � 1X

    i

    (xi

    � x̄)

    sy =

    s1

    N � 1X

    i

    (yi � ȳ)

    Data are (x1, y1), (x2, y2), . . . (xN , yN )

    s

    xy

    =1

    N � 1

    NX

    i=1

    (yi

    � ȳ)(xi

    � x̄)

  • Dangers of correlation

    4 6 8 10 12 14

    46

    810

    14

    4 6 8 10 12 14

    46

    810

    14

    4 6 8 10 12 14

    46

    810

    14

    8 10 12 14 16 18

    46

    810

    14

    [Anscombe, 1973]

  • Scatterplots

    ●●

    ● ●

    ● ●

    ●●

    ● ●

    ●●

    ●●

    −2 −1 0 1 2 3

    −2−1

    01

    2

    x1

    x2

  • Overplotting●

    ●●

    ● ●

    ● ●

    ●●

    ● ●

    ●●

    ●●

    −2 −1 0 1 2 3

    −2−1

    01

    2

    x1

    x2

    100 data points

    ●●

    ●●

    ●●●

    ● ●

    ●● ●

    ●●

    ● ●

    ●●

    ●●●

    ●●

    ●●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●●

    ●● ●

    ●●

    ●●

    ● ●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ● ●

    ●●

    ●●

    ●●

    ● ●●

    ●●

    ●●

    ●●

    ● ●

    ● ●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●● ●

    ●●

    ●●

    ●●● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    −3 −2 −1 0 1 2 3

    −3−2

    −10

    12

    3

    x1

    x2

    1000 data points

    ● ●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ● ●

    ●●

    ●●

    ● ●

    ●●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●●

    ● ●

    ●●

    ● ●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ● ●

    ●●

    ●●

    ● ●

    ●●

    ● ●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●● ●●●

    ●●

    ●●

    ●●

    ● ●

    ● ●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ● ●

    ●●

    ● ●

    ● ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●●

    ●●

    ● ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ● ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ● ●

    ● ●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●●●

    ●●●

    ●●●

    ● ●

    ●●

    ● ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ● ●

    ●●●

    ●●

    ●●

    ● ●●

    ● ●

    ●●

    ●●

    ● ●

    ●●

    ●●●

    ●● ●

    ●●

    ● ●

    ●●

    ● ●

    ●●

    ● ●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ● ●

    ●●

    ●●

    ● ●

    ● ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ● ●●

    ●●

    ●●

    ●●●●

    ● ●●

    ● ●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●● ●

    ● ●

    ●●●

    ●●

    ●●

    ●●

    ●● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●● ●

    ●●

    ●●

    ●●

    ● ●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ● ●

    ● ●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●● ●

    ●●●

    ●●

    ● ●

    ●●

    ●●

    ● ●

    ●●●

    ●●

    ●●

    ●● ●

    ●●●

    ●●

    ●●

    ●●

    ●● ●

    ●●

    ●● ●

    ●●

    ●●

    ● ●

    ● ●

    ●●

    ●●● ●

    ●●

    ● ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●● ●

    ●●

    ●●

    ● ●●

    ●●

    ●● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ● ●

    ●●

    ● ●

    ● ●● ●

    ● ●●●

    ●●

    ●●

    ●●

    ● ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ● ●

    ● ● ●●

    ●● ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●●

    ●●

    ●● ●

    ●●

    ●●

    ●●

    ● ●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ● ●

    ●●

    ●●

    ●●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ●● ●

    ● ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●● ●●

    ●●

    ● ●

    ●● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ● ●

    ● ●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ● ●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ● ●

    ●●

    ●● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ● ●

    ● ●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●●

    ●● ●

    ●●●

    ●●

    ●● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ● ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ● ●●● ●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●● ●

    ● ●

    ●●

    ● ●

    ● ●

    ●●

    ●●

    ●●

    ● ●● ●

    ●●

    ● ●

    ●●

    ● ●

    ●●

    ● ●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●● ●

    ●●

    ● ●

    ●●

    ● ●

    ●●

    ●●

    ● ●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ● ●

    ●●

    ●●

    ●●●

    ●●

    ● ●

    ● ●

    ●●

    ●●

    ● ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ● ●

    ●●

    ● ●

    ●●

    ●● ●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ● ●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ● ●

    ●● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ● ●

    ● ●●

    ●●

    ●●

    ●●●

    ● ●●

    ●●

    ●●

    ● ●

    ●●●

    ●●

    ●●●

    ●●

    ● ●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ● ●● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ● ●

    ● ●

    ●●

    ● ●

    ● ●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●● ●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●● ●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●●

    ●●

    ● ●●

    ●●

    ●●

    ● ●

    ● ●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ● ●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ● ●●

    ●●

    ●●

    ●●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●● ●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●● ●

    ●●

    ●●

    ●●

    ●●

    ● ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ● ●

    ●●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ● ●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ● ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●● ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ● ●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ● ●

    ●●

    ●●

    ● ●

    ●●

    ● ●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●● ●

    ● ●

    ●●

    ●●

    ● ●●

    ●●

    ●●

    ● ●

    ● ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●