Recent developments in direct labeled graphics
http://directlabels.r-forge.r-project.org
Toby Dylan [email protected]
29 March 2012
Motivation: confusing legends
How to add direct labels to some common plots
Recent developments in direct labeling
Conclusions
Problem 1: legend inconsistent with data
library(lattice)
dens <- densityplot(~score,loci,groups=type,
auto.key=list(space="top",columns=3),n=500,
main="Distribution of scores by selection type")
print(dens)
[Figure: density plot "Distribution of scores by selection type"; x axis: score, y axis: Density; legend at top: Balancing, Neutral, Positive]
Problem 2: too many classes render legend unreadable
data(BodyWeight,package="nlme")
library(ggplot2)
ratplot <- ggplot(BodyWeight,aes(Time,weight,colour=Rat))+
facet_grid(.~Diet)+
geom_line()
print(ratplot)
[Figure: line plot of weight against Time in three Diet panels (1, 2, 3), with a 16-entry color legend for Rat]
How to add direct labels to some common plots
The protocol I use for everyday plots in practice:
Do as many steps as needed until the plot is readable:
1. Make a lattice or ggplot2 plot p using colors and default legends.
2. Try the default direct labels: direct.label(p).
3. Check whether another Positioning Method exists on http://directlabels.r-forge.r-project.org/docs/index.html
then use direct.label(p,"method").
4. If no existing Positioning Method works, you can always write your own.
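As a rough illustration of these steps, the sketch below walks through them on R's built-in Orange data set. That data set is an assumption made for this example and is not used elsewhere in the talk. The sketch assumes the lattice and directlabels packages are installed; "last.qp" is the Positioning Method name used later in these slides.

```r
library(lattice)
library(directlabels)

# Step 1: an ordinary lattice plot with colors and a default legend.
# (Orange is a built-in data set, chosen here only for illustration.)
p <- xyplot(circumference ~ age, Orange, groups = Tree,
            type = "l", auto.key = list(space = "right"))

# Step 2: try the default direct labels.
p.default <- direct.label(p)

# Step 3: if the defaults are not readable, name another Positioning Method.
p.last <- direct.label(p, "last.qp")

# Step 4: or build a custom method, here a list that enlarges the text first.
big.last <- list(cex = 1.5, "last.qp")
p.big <- direct.label(p, big.last)

print(p.big)
```

Each call returns a new plot object, so the candidates can be printed side by side and compared before settling on one.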
Add default direct labels at the mode of each density
library(directlabels)
direct.label(dens)
[Figure: the density plot "Distribution of scores by selection type" (x axis: score, y axis: Density), with the labels Balancing, Neutral, and Positive drawn directly on the curves]
With 2 groups, we label the min and max points
p <- xyplot(deaths~Time,uk.lung,
groups=sex,type=c("l","g"))
direct.label(p)
[Figure: line plot of deaths against Time (1974 to 1980), with direct labels female and male]
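The uk.lung data frame is not defined on the slide. One way it could be built is sketched here, assuming it stacks R's built-in monthly mdeaths and fdeaths time series; the original construction is not shown, so this is a guess rather than the author's code.

```r
# Assumption: uk.lung combines the built-in male (mdeaths) and female
# (fdeaths) monthly lung-disease death series into one long data frame.
uk.lung <- rbind(
  data.frame(Time = as.numeric(time(mdeaths)),
             sex = "male",
             deaths = as.numeric(mdeaths)),
  data.frame(Time = as.numeric(time(fdeaths)),
             sex = "female",
             deaths = as.numeric(fdeaths)))
uk.lung$sex <- factor(uk.lung$sex)
str(uk.lung)
```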
Label a scatterplot of the iris data by species
set.seed(1)
irisp <- xyplot(jitter(Sepal.Length)~jitter(Petal.Length),
iris,groups=Species)
direct.label(irisp)
[Figure: scatterplot of jitter(Sepal.Length) against jitter(Petal.Length), with direct labels setosa, versicolor, and virginica]
Default direct labels for lineplots
direct.label(ratplot)
[Figure: weight against Time in three Diet panels (1, 2, 3), with a rat number label on each line]
Look up the Positioning Method on the directlabels website
direct.label(ratplot,"last.qp")
[Figure: weight against Time in three Diet panels (1, 2, 3), with rat number labels positioned by "last.qp"]
Construct your own custom Positioning Method
rp2 <- ratplot+
xlim(0,70)+ylim(150,650)
big.last <- list(cex=1.5,"last.qp")
direct.label(rp2,"big.last")
[Figure: weight against Time in three Diet panels (1, 2, 3), x axis extended to 70, with enlarged rat number labels]
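A Positioning Method does not have to be a list of existing method names. The sketch below assumes the convention, described in the directlabels documentation, that a method can also be a function receiving a data frame d with columns x, y, and groups and returning one row per group with the label position. The name mean.point and the toy data are hypothetical.

```r
# Hypothetical custom Positioning Method: one label per group,
# placed at the mean of that group's points.
mean.point <- function(d, ...) {
  pieces <- lapply(split(d, d$groups, drop = TRUE), function(g) {
    data.frame(x = mean(g$x), y = mean(g$y), groups = g$groups[1])
  })
  do.call(rbind, pieces)
}

# Toy input in the assumed format, to show what the function returns.
toy <- data.frame(x = c(1, 3, 10, 12),
                  y = c(2, 4, 20, 22),
                  groups = factor(c("a", "a", "b", "b")))
labels <- mean.point(toy)
print(labels)
```

If that convention holds, the function could be passed in place of a method name, for example direct.label(irisp, mean.point).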
Direct label the LASSO path to visualize variables
lasso.plot <-
ggplot(path,aes(arclength,standardized.coef,colour=variable))+
geom_line(aes(group=variable))+
ggtitle("LASSO path for prostate cancer data
calculated using the LARS")+
xlim(0,20)
direct.label(lasso.plot)
[Figure: "LASSO path for prostate cancer data calculated using the LARS"; standardized.coef against arclength (0 to 20), with direct labels lcp, age, gleason, lbph, pgg45, lweight, svi, lcavol]
Label the zero point to emphasize variable importance
direct.label(lasso.plot,"lasso.labels")
[Figure: "LASSO path for prostate cancer data calculated using the LARS"; standardized.coef against arclength (0 to 20), with direct labels lcavol, lweight, svi, lbph, pgg45, age, lcp, gleason]
Increase text size to make reading easier
direct.label(lasso.plot,list(cex=2,"lasso.labels"))
[Figure: "LASSO path for prostate cancer data calculated using the LARS"; standardized.coef against arclength (0 to 20), with enlarged direct labels lcavol, lweight, svi, lbph, pgg45, age, lcp, gleason]
Recent developments in direct labeling
Changes in recent versions of directlabels
directlabels version < 2.0 < 2.0 ≥ 2.0 ≥ 2.0
plotting package lattice ggplot2 lattice ggplot2
basic Positioning Methods X X X X
smart Positioning Methods that avoid label collisions X X X
redraw labels after window resize X X
fontface and fontfamily text parameters X X X
label black and white plots X X X
label aesthetics other than color X
Label a scatterplot of the iris data by species
direct.label(irisp)
[Figure: scatterplot of jitter(Sepal.Length) against jitter(Petal.Length), with direct labels setosa, versicolor, and virginica]
Show the grid for the search
Find a label position on a grid (grey rectangles) that is near the center of each point cloud (green dots), but does not overlap any points or other labels (black dots).
direct.label(irisp,debug=TRUE)
[Figure: scatterplot of jitter(Sepal.Length) against jitter(Petal.Length) with the search grid drawn, and direct labels setosa, versicolor, and virginica]
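The grid search described on this slide can be illustrated with a simplified base R sketch. It is not the directlabels implementation: the point cloud, the label width w, and the label height h are made-up values, there is a single group, and collisions with other labels are ignored.

```r
# Simplified sketch of a grid search for one label; not the directlabels code.
set.seed(1)
pts <- data.frame(x = rnorm(50), y = rnorm(50))  # hypothetical point cloud
center <- c(mean(pts$x), mean(pts$y))

# Assumed label width and height, in data units.
w <- 0.4
h <- 0.2

# Candidate label positions: a regular grid of box centers over the data range.
grid <- expand.grid(
  x = seq(min(pts$x), max(pts$x), by = w),
  y = seq(min(pts$y), max(pts$y), by = h))

# A candidate is free if no point falls inside the box centered on it.
is.free <- mapply(function(gx, gy) {
  !any(abs(pts$x - gx) <= w / 2 & abs(pts$y - gy) <= h / 2)
}, grid$x, grid$y)
free <- grid[is.free, ]

# Among the free candidates, keep the one closest to the cloud center.
d2 <- (free$x - center[1])^2 + (free$y - center[2])^2
best <- free[which.min(d2), ]
print(best)
```

Searching over a fixed grid keeps the number of candidate positions small, at the cost of only finding positions that lie on the grid.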
Increase text size before calculating label positions
rp2 <- ratplot+
xlim(0,70)+ylim(150,650)
big.last <- list(cex=1.5,"last.qp")
direct.label(rp2,"big.last")
[Figure: weight against Time in three Diet panels (1, 2, 3), x axis extended to 70, with enlarged rat number labels]
Show the label borders used in the position calculation
direct.label(rp2,
list("big.last",
"calc.boxes",
"draw.rects"))
[Figure: weight against Time in three Diet panels (1, 2, 3), x axis extended to 70, with a rectangle drawn around each rat number label]
Label positions for lineplots are the solutions of a QP
[Figure: k text labels with target positions t1, t2, ..., tk−1, tk and heights h1, h2, ..., hk−1, hk]
Assume that for each text label $i = 1, \dots, k$ we have its position $t_i$ and height $h_i$.
Then optimal direct labels do not overlap, and are as close as possible to the target locations:

$$\min_{b \in \mathbb{R}^k} \sum_{i=1}^{k} (b_i - t_i)^2 = \|b - t\|^2$$
$$\text{subject to } b_{i+1} \ge b_i + h_{i+1}/2 + h_i/2, \quad \forall\, i = 1, \dots, k-1$$

This is a quadratic program (QP). Its objective is strictly convex, so there is a unique global minimum, which corresponds to the best labels.
We can solve this using quadprog::solve.QP() and use the optimal $b$ for the direct label positions.
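The QP above can be written out for quadprog::solve.QP() as in the sketch below. The target positions t and heights h are made-up values, already sorted by position, and the matrix names simply follow the argument names of solve.QP(). This shows the formulation on toy data and is not the code used inside directlabels.

```r
library(quadprog)

# Hypothetical targets t (sorted) and label heights h for k = 4 labels.
t <- c(1.0, 1.2, 1.3, 3.0)
h <- c(0.5, 0.5, 0.5, 0.5)
k <- length(t)

# Minimum distance between consecutive label centers: h[i+1]/2 + h[i]/2.
gap <- (h[-1] + h[-k]) / 2

# solve.QP minimizes (1/2) b'Db - d'b subject to A'b >= b0.
# With D = I and d = t this has the same minimizer as ||b - t||^2.
Dmat <- diag(k)
dvec <- t

# Column i of Amat encodes the constraint b[i+1] - b[i] >= gap[i].
Amat <- matrix(0, k, k - 1)
for (i in seq_len(k - 1)) {
  Amat[i, i] <- -1
  Amat[i + 1, i] <- 1
}

sol <- solve.QP(Dmat, dvec, Amat, bvec = gap)
b <- sol$solution
print(round(b, 4))
```

The first three targets are closer together than one label height, so the solver spreads those labels apart, while the fourth label stays at its target because it is already far enough away.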
Start with boxes at the end of each line
direct.label(rp2,list("last.points",cex=1.5,
"calc.boxes",
"draw.rects"))
[Figure: weight against Time in three Diet panels (1, 2, 3), x axis extended to 70, with a boxed rat number label at the end of each line]
Adjust box height if desired
direct.label(rp2,list("last.points",cex=1.5,"calc.boxes",
dl.trans(h=h+h/3),"calc.borders",
"draw.rects"))
[Figure: weight against Time in three Diet panels (1, 2, 3), x axis extended to 70, with taller boxes around the rat number labels at the end of each line]
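The height adjustment is plain column arithmetic on the label data frame. A base R analogue is sketched below on toy values, assuming each box is centered vertically on y and that the columns are named y, h, top, and bottom; those names and the centering are assumptions made for illustration.

```r
# Toy label boxes: vertical center y and height h (assumed column names).
boxes <- data.frame(y = c(300, 310, 450), h = c(12, 12, 12))

# Same arithmetic as dl.trans(h = h + h/3): grow each height by one third.
boxes <- transform(boxes, h = h + h / 3)

# Recompute the borders for boxes centered on y, as a stand-in for "calc.borders".
boxes <- transform(boxes, top = y + h / 2, bottom = y - h / 2)
print(boxes)
```

Enlarging h before the borders are recomputed leaves extra vertical space around each label once non-overlapping positions are calculated.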
Apply QP solver to get optimal labels
direct.label(rp2,list("last.points",cex=1.5,"calc.boxes",
dl.trans(h=h+h/3),"calc.borders",
qp.labels("y","h")))
[Figure: weight against Time in three Diet panels (1, 2, 3), x axis extended to 70, with rat number labels at the positions returned by the QP solver]
Conclusions
What directlabels is NOT
- Automatic publication-quality direct labels (some manual tweaking will always be necessary).
- Optimal labels for individual points on scatterplots (it is a bit more complicated).
[Figure: scatterplot of y against x (both axes from -20 to 20) in which every point carries its own numeric label]
Use directlabels instead of confusing legends!
- Works with lattice and ggplot2.
- Sensible defaults.
- Useful in everyday plots in practice.
- Smart Positioning Methods that avoid label collisions.
- Customizable: you can write your own Positioning Methods.
Future work
- Automatically adjust xlim/ylim so labels stay on plot region?
- Contourplot labels as in contour()?
- Label using images instead of textual factor names?
Possible Google Summer of Code 2012 project: http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2012