Hadley Wickham
Stat405: Polishing graphics for presentation
Thursday, 21 October 2010



Mark Schoenhals

• Rice alum & former stat major

• Visiting on Monday. Two talks: 11am (Four designs for ecommerce experiments), 4pm (Using data smarter faster: How one small store sold $500 million of music gear to over 1 million customers)

• Undergrads are invited to lunch with him. Email me if you’re interested


# Who is the most accurate shooter in the NBA?

library(plyr)

nba <- read.csv("nba-0809.csv.bz2")
shots <- subset(nba, etype == "shot")

success <- ddply(shots, c("team", "player"), summarise,
  total = length(player),
  made = sum(result == "made"))
success$prop <- success$made / success$total
success <- arrange(success, desc(prop))
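The same split-apply-combine step can be sketched in base R without plyr. The `toy` data frame below is hypothetical, made up only to mirror the `team`, `player` and `result` columns used above (including the assumed "missed" value); it is not taken from the NBA file.

```r
# Hypothetical shot log with the same columns the slide uses
toy <- data.frame(
  team   = c("PHX", "PHX", "PHX", "NYK", "DAL", "DAL"),
  player = c("Shaquille O'Neal", "Shaquille O'Neal", "Shaquille O'Neal",
             "Eddy Curry", "Erick Dampier", "Erick Dampier"),
  result = c("made", "missed", "made", "made", "missed", "made"),
  stringsAsFactors = FALSE
)

# Split by team and player, summarise each piece, then combine the rows
pieces <- split(toy, list(toy$team, toy$player), drop = TRUE)
success <- do.call(rbind, lapply(pieces, function(d) {
  data.frame(team = d$team[1], player = d$player[1],
             total = nrow(d), made = sum(d$result == "made"),
             stringsAsFactors = FALSE)
}))

# Proportion made, most accurate first
success$prop <- success$made / success$total
success <- success[order(success$prop, decreasing = TRUE), ]
rownames(success) <- NULL
success
```

Even in this toy version, a player with a single made shot tops the list with a perfect proportion.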


   team            player total made      prop
1   NYK        Eddy Curry     1    1 1.0000000
2   OKC       Steven Hill     1    1 1.0000000
3   SAS   Marcus Williams     2    2 1.0000000
4   CHA      Dwayne Jones     4    3 0.7500000
5   OKC     Mouhamed Sene     7    5 0.7142857
6   SAS Pops Mensah-Bonsu     7    5 0.7142857
7   BOS      J.R. Giddens     3    2 0.6666667
8   LAL           Yue Sun     3    2 0.6666667
9   MIL        Eddie Gill     9    6 0.6666667
10  DAL     Erick Dampier   269  175 0.6505576
11  LAC    DeAndre Jordan   132   85 0.6439394
12  ORL      Adonal Foyle    11    7 0.6363636
13  BOS       Bill Walker    58   36 0.6206897
14  POR    Joel Przybilla   261  161 0.6168582
15  POR  Shavlik Randolph    13    8 0.6153846
16  DET      Amir Johnson   152   92 0.6052632
17  PHX  Shaquille O'Neal   813  491 0.6039360
18  DEN      Nene Hilario   708  427 0.6031073
19  ATL     Solomon Jones    93   56 0.6021505
20  BOS       Mikki Moore    85   51 0.6000000


[Figure: scatterplot of prop (proportion of shots made, 0.0 to 1.0) against total (number of shots; ticks at 500, 1000, 1500), one point per player]


1. ggplot() practice

2. Communication graphics

3. Polishing a plot: scales and themes


[Figure: flight delays map with points sized by % cancelled (legend 0.0 to 1.0); longitude −120 to −70, latitude 25 to 50]


Your turn

Identify the data and layers in the flight delays plot, then write the ggplot2 code to create it.

library(ggplot2)
library(maps)

usa <- map_data("state")
feb13 <- read.csv("delays-feb-13-2007.csv")


ggplot(feb13, aes(long, lat)) +
  geom_point(aes(size = 1), colour = "white") +
  geom_polygon(aes(group = group), data = usa, colour = "grey70", fill = NA) +
  geom_point(aes(size = ncancelw / ntot), colour = alpha("black", 1/2))

# Polishing: up next
last_plot() +
  scale_area("% cancelled", to = c(1, 8), breaks = seq(0, 1, by = 0.2),
    limits = c(0, 1)) +
  scale_x_continuous("", limits = c(-125, -67)) +
  scale_y_continuous("", limits = c(24, 50))


Communication graphics

When you need to communicate your findings, you need to spend a lot of time polishing your graphics to eliminate distractions and focus on the story.

Now it’s time to pay attention to the small stuff: labels, colour choices, tick marks...


Context

Consumption

[Figure: choropleth map filled by bin (< 1000, < 1e4, < 1e5, < 1e6, < 1e7); axes labelled long (−106 to −94) and lat (26 to 36)]

What’s wrong with this plot?


Some problems

Incorrect coordinate system

Bad colour scheme

Unnecessary axis labels

Legend needs improvement: better title and better key labels

No title


1. Scales: used to override default perceptual mappings, and tune parameters of axes and legends.

2. Themes: control presentation of non-data elements.

3. Saving your work: to include in reports, presentations, etc.
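As a sketch of the saving step, assuming a current ggplot2 release and the built-in `mtcars` data (the file name is hypothetical), `ggsave()` writes a plot to disk and picks the device from the file extension:

```r
library(ggplot2)

# A small plot using the mtcars data that ships with R
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Save to a PDF in the session's temporary directory; width and height
# are in inches unless `units` says otherwise
out <- file.path(tempdir(), "mtcars-scatter.pdf")
ggsave(out, plot = p, width = 6, height = 4)
```

Fixing the width and height explicitly keeps text sizes consistent between drafts, rather than depending on the size of the current plotting window.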


Scales


Scales

Control how data is mapped to perceptual properties, and produce guides (axes and legends) which allow us to read the plot.

Important parameters: name, breaks & labels, limits.

Naming scheme: scale_aesthetic_name. All default scales have name continuous or discrete.
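Because of the naming scheme, the available scales for one aesthetic can be listed by pattern. A minimal sketch, assuming a current ggplot2 release is installed and attached:

```r
library(ggplot2)

# Every colour scale constructor starts with scale_colour_
colour_scales <- ls("package:ggplot2", pattern = "^scale_colour_")

# Strip the prefix to see just the <name> part of scale_<aesthetic>_<name>
sub("^scale_colour_", "", colour_scales)
```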


# Default scales
scale_x_continuous()
scale_y_discrete()
scale_colour_discrete()

# Custom scales
scale_colour_hue()
scale_x_log10()
scale_fill_brewer()

# Scales with parameters
scale_x_continuous("X Label", limits = c(1, 10))
scale_colour_gradient(low = "blue", high = "red")


# First argument (name) controls axis label
scale_y_continuous("Latitude")
scale_x_continuous("")

# Breaks and labels control tick marks
scale_x_continuous(breaks = -c(106, 100, 94))
scale_fill_discrete(labels = c("< 1000" = "< 1000", "< 1e4" = "< 10,000",
  "< 1e5" = "< 100,000", "< 1e6" = "< 1,000,000", "< 1e7" = "1,000,000+"))
scale_y_continuous(breaks = NA)

# Limits control range of data
scale_y_continuous(limits = c(26, 32))
# same as:
p + ylim(26, 32)
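"Limits control range of data" is meant literally: values outside scale limits are dropped, not just hidden. The sketch below, assuming a current ggplot2 release and the built-in `mtcars` data, contrasts this with `coord_cartesian()`, which zooms the view and keeps every row:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Scale limits: mpg values outside 15-25 become missing and are removed
# (ggplot2 warns about the removed rows when the plot is drawn)
p_scale <- p + scale_y_continuous(limits = c(15, 25))

# Coordinate limits: all data is kept; only the visible window changes
p_coord <- p + coord_cartesian(ylim = c(15, 25))
```

The difference matters for statistical layers such as smoothers, which are computed only from the data that survives the scale limits.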


library(plyr)    # join()
library(ggplot2) # qplot()

options(stringsAsFactors = FALSE)
pop <- read.csv("tx-pop.csv")
pop$bin <- cut(log10(pop$pop), breaks = 2:7,
  labels = c("< 1000", "< 1e4", "< 1e5", "< 1e6", "< 1e7"))
borders <- read.csv("tx-borders.csv")
choro <- join(borders, pop)

qplot(long, lat, data = choro, geom = "polygon", group = group, fill = bin)
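To see what the `cut(log10(...))` binning does, here it is on a few hypothetical population values, one per bin, using only base R:

```r
# Hypothetical populations, chosen to land in each bin in turn
pops <- c(500, 2500, 45000, 300000, 4e6)

bins <- cut(log10(pops), breaks = 2:7,
            labels = c("< 1000", "< 1e4", "< 1e5", "< 1e6", "< 1e7"))
data.frame(pop = pops, bin = bins)

# Values outside the breaks become NA: log10(50) is below 2
cut(log10(50), breaks = 2:7)
```

By default `cut()` uses right-closed intervals such as (2, 3], so a population of exactly 1000 would land in the "< 1000" bin; passing `right = FALSE` would make each label literally true.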


Your turn

Fix the axis- and legend-related problems that we have identified.


qplot(long, lat, data = choro, geom = "polygon", group = group, fill = bin) +
  scale_fill_discrete("Population", labels = c("< 1000" = "< 1000",
    "< 1e4" = "< 10,000", "< 1e5" = "< 100,000", "< 1e6" = "< 1,000,000",
    "< 1e7" = "1,000,000+")) +
  scale_x_continuous("") +
  scale_y_continuous("") +
  coord_map()


Alternate scales

You can also override the default choice of scales. You are most likely to want to do this with colour, as it is the most important aesthetic after position.

To use colour effectively, you need a little background: colour spaces and colour blindness.


Colour spaces

The most familiar is RGB, which defines colour as a mixture of red, green and blue. It matches the physics of the eye, but the brain does a lot of post-processing, so it’s hard to perceive these components directly.

A more useful colour space is HCL: hue, chroma and luminance.
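Base R's `hcl()` builds colours directly from hue, chroma and luminance. This sketch (hue 260 and chroma 50 are arbitrary choices) holds hue and chroma fixed and varies only luminance, then shows the same colours as RGB components:

```r
# Four colours sharing hue (260, a blue) and chroma (50),
# differing only in luminance
blues <- hcl(h = 260, c = 50, l = c(30, 50, 70, 90))
blues

# The same colours in RGB: the red, green and blue values all shift
# at once, which is why RGB components are hard to read off directly
col2rgb(blues)
```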


[Figure: the HCL colour space, with axes for hue, chroma and luminance]


Default colour scales

Discrete: evenly spaced hues of equal chroma and luminance. No colour appears more important than any other. Does not imply order.

Continuous: evenly spaced hues between two colours.
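The discrete default can be sketched with `hcl()`: spread n hues evenly around the colour wheel while holding chroma and luminance fixed. The starting hue (15), chroma (100) and luminance (65) are assumptions modelled on ggplot2's usual defaults, and `even_hues()` is a hypothetical helper, not a ggplot2 function:

```r
# Hypothetical helper: n evenly spaced hues at fixed chroma and luminance
even_hues <- function(n, start = 15, c = 100, l = 65) {
  # n + 1 points from start to start + 360, dropping the last one,
  # which would repeat the first hue
  h <- seq(start, start + 360, length.out = n + 1)[seq_len(n)]
  hcl(h = h, c = c, l = l)
}

even_hues(4)
```

Because only hue changes, no colour stands out as brighter or more saturated than the others, which is why this palette does not imply an order.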


Colour blindness

7-10% of men are red-green colour “blind” (there are many other, rarer types of colour blindness).

Solutions: avoid red-green contrasts; use redundant mappings; test. I like Color Oracle: http://colororacle.cartography.ch
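A redundant mapping encodes the same variable with two aesthetics, so the groups remain distinguishable without colour vision. A minimal sketch, assuming a current ggplot2 release and its bundled `mpg` data:

```r
library(ggplot2)

# Map drive train (drv) to both colour and shape
p <- ggplot(mpg, aes(displ, hwy, colour = drv, shape = drv)) +
  geom_point()
```

When printed, ggplot2 merges the colour and shape legends into one, because both use the same variable and title.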


Alternatives

Discrete: brewer, grey

Continuous: gradient2, gradientn
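`gradient2` suits diverging data, where values run either side of a meaningful midpoint. A sketch assuming a current ggplot2 release; the `mpg_diff` variable (each car's mpg minus the average) is a hypothetical example built from `mtcars`:

```r
library(ggplot2)

# Hypothetical diverging variable: distance from average fuel economy
cars <- transform(mtcars, mpg_diff = mpg - mean(mpg))

# Colours blend from low through mid to high, with mid placed at midpoint
p <- ggplot(cars, aes(wt, hp, colour = mpg_diff)) +
  geom_point(size = 3) +
  scale_colour_gradient2(low = "blue", mid = "grey90", high = "red",
                         midpoint = 0)
```

`scale_colour_gradientn()` generalises this, taking a whole vector of `colours` instead of three anchor colours.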


Your turn

Modify the fill scale to use a Brewer colour palette of your choice. (Hint: you will need to change the name of the scale)

Use RColorBrewer::display.brewer.all to list all palettes.


Themes


Visual appearance

So far we have only discussed how to get the data displayed the way you want, focussing on the essence of the plot.

Themes give you a huge amount of control over the appearance of the plot, the choice of background colours, fonts and so on.


# Two built-in themes. The default:
qplot(carat, price, data = diamonds)

# And a theme with a white background:
qplot(carat, price, data = diamonds) + theme_bw()

# Use theme_set if you want it to apply to every
# future plot.
theme_set(theme_bw())

# This is the best way of seeing all the default
# options
theme_bw()
theme_grey()


Plot title

The plot theme also controls the plot title. You can change this for an individual plot by adding

opts(title = "My title")
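Later ggplot2 releases replaced `opts()`, so the call above fails in current versions. A sketch of the modern equivalent, assuming a current ggplot2 release and the built-in `mtcars` data:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Either form sets the plot title in current releases
p_title <- p + ggtitle("My title")
p_labs  <- p + labs(title = "My title")
```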


Your turn

Add an informative title and see what the plot looks like with a white background.


Elements

You can also make your own theme, or modify an existing one.

Themes are made up of elements, which can be one of: theme_line, theme_segment, theme_text, theme_rect, theme_blank.

This gives you a lot of control over plot appearance.


Elements

Axis: axis.line, axis.text.x, axis.text.y, axis.ticks, axis.title.x, axis.title.y

Legend: legend.background, legend.key, legend.text, legend.title

Panel: panel.background, panel.border, panel.grid.major, panel.grid.minor

Strip: strip.background, strip.text.x, strip.text.y


# To modify a plot
p + opts(plot.title = theme_text(size = 12, face = "bold"))
p + opts(plot.title = theme_text(colour = "red"))
p + opts(plot.title = theme_text(angle = 45))
p + opts(plot.title = theme_text(hjust = 1))
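In current ggplot2 releases, `theme()` takes the place of `opts()` and `element_text()` takes the place of `theme_text()`, with the same element names and text parameters. A sketch assuming such a release and the built-in `mtcars` data:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point() + ggtitle("Fuel economy")

# Same four modifications as above, written with theme() and element_text()
p_bold  <- p + theme(plot.title = element_text(size = 12, face = "bold"))
p_red   <- p + theme(plot.title = element_text(colour = "red"))
p_angle <- p + theme(plot.title = element_text(angle = 45))
p_right <- p + theme(plot.title = element_text(hjust = 1))
```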


# If we want, we could also remove the axes:
library(grid) # unit()
last_plot() + opts(
  axis.text.x = theme_blank(),
  axis.text.y = theme_blank(),
  axis.title.x = theme_blank(),
  axis.title.y = theme_blank(),
  axis.ticks.length = unit(0, "cm"),
  axis.ticks.margin = unit(0, "cm"))
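A sketch of the same axis removal in a current ggplot2 release, where `element_blank()` replaces `theme_blank()`. It assumes the parent elements `axis.text`, `axis.title` and `axis.ticks` apply to both x and y, so three settings cover all four axis pieces:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Blank the axis text, titles and ticks, and give the ticks zero length
p_noaxes <- p + theme(
  axis.text = element_blank(),
  axis.title = element_blank(),
  axis.ticks = element_blank(),
  axis.ticks.length = grid::unit(0, "pt")
)
```

Current releases also ship `theme_void()`, a complete theme with no axes, grid lines or background.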
