29
Recent developments in direct labeled graphics http://directlabels.r-forge.r-project.org Toby Dylan Hocking [email protected] 29 March 2012

Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Recent developments in direct labeled graphicshttp://directlabels.r-forge.r-project.org

Toby Dylan [email protected]

29 March 2012

Page 2: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Motivation: confusing legends

How to add direct labels to some common plots

Recent developments in direct labeling

Conclusions

Page 3: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Problem 1: legend inconsistent with data

library(lattice)

dens <- densityplot(~score,loci,groups=type,

auto.key=list(space="top",columns=3),n=500,

main="Distribution of scores by selection type")

print(dens)

Distribution of scores by selection type

score

Density

0

2

4

6

8

10

0.0 0.5 1.0

Balancing Neutral Positive

Page 4: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Problem 2: too many classes render legend unreadable

data(BodyWeight,package="nlme")

library(ggplot2)

ratplot <- ggplot(BodyWeight,aes(Time,weight,colour=Rat))+

facet_grid(.~Diet)+

geom_line()

print(ratplot)

1 2 3

300

400

500

600

0 10 20 30 40 50 60 0 10 20 30 40 50 60 0 10 20 30 40 50 60Time

weigh

t

Rat

2

3

4

1

8

5

6

7

11

9

10

12

13

15

14

16

Page 5: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Motivation: confusing legends

How to add direct labels to some common plots

Recent developments in direct labeling

Conclusions

Page 6: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

The protocol I use for everyday plots in practice:

Do as many steps as needed until the plot is readable:

1. Make a lattice or ggplot2 plot p using colors and default legends.

2. Try the default direct labels: direct.label(p).

3. Check to see if another Positioning Method exists onhttp://directlabels.r-forge.r-project.org/docs/index.html

then use direct.label(p,"method").

4. If no Positioning Methods exist you can always write your own.

Page 7: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Add default direct labels at the mode of each density

library(directlabels)

direct.label(dens)

Distribution of scores by selection type

score

Density

0

2

4

6

8

10

0.0 0.5 1.0

Balancing

Neutral

Positive

Page 8: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

With 2 groups, we label the min and max points

p <- xyplot(deaths~Time,uk.lung,

groups=sex,type=c("l","g"))

direct.label(p)

Time

deaths

500

1000

1500

2000

2500

1974 1975 1976 1977 1978 1979 1980

female

male

Page 9: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Label a scatterplot of the iris data by species

set.seed(1)

irisp <- xyplot(jitter(Sepal.Length)~jitter(Petal.Length),

iris,groups=Species)

direct.label(irisp)

jitter(Petal.Length)

jitter(Sepal.Length)

5

6

7

8

1 2 3 4 5 6 7

setosa

virginica

versicolor

Page 10: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Default direct labels for lineplots

direct.label(ratplot)

1 2 3

300

400

500

600

21385467

11910

12

13

161514

0 10 20 30 40 50 60 0 10 20 30 40 50 60 0 10 20 30 40 50 60Time

weigh

t

Page 11: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Look up the Positioning Method on the directlabels website

direct.label(ratplot,"last.qp")

1 2 3

300

400

500

600

23418567

11910

12

13

151416

0 10 20 30 40 50 60 0 10 20 30 40 50 60 0 10 20 30 40 50 60Time

weigh

t

Page 12: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Construct your own custom Positioning Method

rp2 <- ratplot+

xlim(0,70)+ylim(150,650)

big.last <- list(cex=1.5,"last.qp")

direct.label(rp2,"big.last")

1 2 3

200

300

400

500

600

23418567

11910

12

13151416

0 10 20 30 40 50 60 70 0 10 20 30 40 50 60 70 0 10 20 30 40 50 60 70Time

weigh

t

Page 13: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Direct label the LASSO path to visualize variables

lasso.plot <-

ggplot(path,aes(arclength,standardized.coef,colour=variable))+

geom_line(aes(group=variable))+

opts(title="LASSO path for prostate cancer data

calculated using the LARS")+

xlim(0,20)

direct.label(lasso.plot)

-2

0

2

4

6

lcp

age

gleason

lbphpgg45lweightsvi

lcavol

0 5 10 15 20

LASSO path for prostate cancer data calculated using the LARS

arclength

standardized.coef

Page 14: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Label the zero point to emphasize variable importance

direct.label(lasso.plot,"lasso.labels")

-2

0

2

4

6

lcavol

lweight

svi

lbph

pgg45

age

lcp

gleason

0 5 10 15 20

LASSO path for prostate cancer data calculated using the LARS

arclength

standardized.coef

Page 15: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Increase text size to make reading easier

direct.label(lasso.plot,list(cex=2,"lasso.labels"))

-2

0

2

4

6

lcavol

lweight

svi

lbph

pgg45

age

lcp

gleason

0 5 10 15 20

LASSO path for prostate cancer data calculated using the LARS

arclength

standardized.coef

Page 16: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Motivation: confusing legends

How to add direct labels to some common plots

Recent developments in direct labeling

Conclusions

Page 17: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Changes in recent versions of directlabels

directlabels version < 2.0 < 2.0 ≥ 2.0 ≥ 2.0plotting package lattice ggplot2 lattice ggplot2

basic Positioning Methods X X X Xsmart Positioning Methodsthat avoid label collisions X X X

redraw labelsafter window resize X X

fontface and fontfamily

text parameters X X Xlabel black and white plots X X X

label aestheticsother than color X

Page 18: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Label a scatterplot of the iris data by species

direct.label(irisp)

jitter(Petal.Length)

jitter(Sepal.Length)

5

6

7

8

1 2 3 4 5 6 7

setosa

virginica

versicolor

Page 19: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Show the grid for the searchFind a label position on a grid (grey rectangles)that is near the center of each point cloud (green dots),but does not overlap any points or other labels (black dots).

direct.label(irisp,debug=TRUE)

jitter(Petal.Length)

jitter(Sepal.Length)

5

6

7

8

1 2 3 4 5 6 7

setosa

virginica

versicolor

Page 20: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Increase text size before calculating label positions

rp2 <- ratplot+

xlim(0,70)+ylim(150,650)

big.last <- list(cex=1.5,"last.qp")

direct.label(rp2,"big.last")

1 2 3

200

300

400

500

600

23418567

11910

12

13151416

0 10 20 30 40 50 60 70 0 10 20 30 40 50 60 70 0 10 20 30 40 50 60 70Time

weigh

t

Page 21: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Show the label borders used in the position calculation

direct.label(rp2,

list("big.last",

"calc.boxes",

"draw.rects"))

1 2 3

200

300

400

500

600

23418567

11910

12

13151416

0 10 20 30 40 50 60 70 0 10 20 30 40 50 60 70 0 10 20 30 40 50 60 70Time

weigh

t

Page 22: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Label positions for lineplots are the solutions of a QP

t1

t2

tk−1

tk

...

h1

h2

hk−1

hk

...

Assume that for each text label i = 1, . . . , k we haveits position ti and height hi .Then optimal direct labels do not overlap, and areas close as possible to the target locations:

minb∈Rk

k∑i=1

(bi − ti )2 = ||b − t||2

subject to bi+1 ≥ bi + hi+1/2 + hi/2, ∀ i = 1, . . . , k − 1

This is a quadratic program (QP). QPs are convex sothere is a unique global minimum which correspondsto the best labels.We can solve this using quadprog::solve.QP()

and use the optimal b for the direct label positions.

Page 23: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Start with boxes at the end of each line

direct.label(rp2,list("last.points",cex=1.5,

"calc.boxes",

"draw.rects"))

1 2 3

200

300

400

500

600

12345678

1011

12

9

13141516

0 10 20 30 40 50 60 70 0 10 20 30 40 50 60 70 0 10 20 30 40 50 60 70Time

weigh

t

Page 24: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Adjust box height if desired

direct.label(rp2,list("last.points",cex=1.5,"calc.boxes",

dl.trans(h=h+h/3),"calc.borders",

"draw.rects"))

1 2 3

200

300

400

500

600

12345678

1011

12

9

13141516

0 10 20 30 40 50 60 70 0 10 20 30 40 50 60 70 0 10 20 30 40 50 60 70Time

weigh

t

Page 25: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Apply QP solver to get optimal labels

direct.label(rp2,list("last.points",cex=1.5,"calc.boxes",

dl.trans(h=h+h/3),"calc.borders",

qp.labels("y","h")))

1 2 3

200

300

400

500

600

23418567

11910

12

13151416

0 10 20 30 40 50 60 70 0 10 20 30 40 50 60 70 0 10 20 30 40 50 60 70Time

weigh

t

Page 26: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Motivation: confusing legends

How to add direct labels to some common plots

Recent developments in direct labeling

Conclusions

Page 27: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

What directlabels is NOT

I Automatic publication-quality direct labels.(some manual tweaking will always be necessary)

I Optimal labels for individual points on scatterplots.(it is a bit more complicated)

-20 -10 0 10 20

-20

-10

010

20

x

y

-5.02192

1.31531

-0.78917

8.86785

1.16971

3.1863

-5.81791

7.14533

-8.25259

-3.59862

0.898860.96274

-2.01634

7.3984

1.2338

-0.29317

-3.88854

5.10856

-9.13814 23.10297

-4.3809

7.640612.61961

7.73405

-8.14379

-4.38451

-7.20222

2.30945

-11.57729

2.47076

-0.91114

17.57376

-1.3793

-1.11193

-6.90014

-2.21794

1.82908

4.17323

10.65402

9.70202

-1.01629

14.03203

-17.76776

6.22867

-5.22283

13.22231

-3.6344

13.19066

0.43779-18.78656

Page 28: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Use directlabels instead of confusing legends!

I Works with lattice and ggplot2.

I Sensible defaults.

I Useful in everyday plots in practice.

I Smart Positioning Methods that avoid label collisions.

I Customizable: you can write your own Positioning Methods.

Page 29: Recent developments in direct labeled graphicsrug.mnhn.fr/semin-r/PDF/semin-R_directlabels_THocking...The protocol I use for everyday plots in practice: Do as many steps as needed

Future work

I Automatically adjust xlim/ylim so labels stay on plot region?

I Contourplot labels as in contour()?

I Label using images instead of textual factor names?Possible Google Summer of Code 2012 project:http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2012