Plotting 1.0 Selene Fernandez-Valverde Lab Meeting 26-08-09



Your scientific graphing options

Others...?


Why not just Excel?

Excel is relatively limited in its support of scientific graphing

Its options provide limited control over the output

Limited selection of graph types

Limited number of data points that can be plotted (or it dies)


What plots can you make with R?

Type in your R console:

> demo(graphics)
> demo(persp)
> library(lattice)   # demo() only finds demos of attached packages
> demo(lattice)

Now, that's something you can't make in Excel or Prism (you actually can in Matlab).


Cool, but...

Steep learning curve

Plotting is step by step

Prettifying a graph takes a lot of effort

I don't want to script in R; I just want to plot my results


How do we avoid that?

We use a package made by someone who encountered these problems before:

“ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and (almost) none of the bad parts. It takes care of many of the fiddly details that make plotting a hassle (like drawing legends) as well as providing a powerful model of graphics that makes it easy to produce complex multi-layered graphics.”

In summary: R graphs made easy


How do I start?

First format the data into a table that looks like this:

carat  cut        color  clarity  depth  table  price  x     y     z
0.23   Ideal      E      SI2      61.5   55     326    3.95  3.98  2.43
0.21   Premium    E      SI1      59.8   61     326    3.89  3.84  2.31
0.23   Good       E      VS1      56.9   65     327    4.05  4.07  2.31
0.29   Premium    I      VS2      62.4   58     334    4.2   4.23  2.63
0.31   Good       J      SI2      63.3   58     335    4.34  4.35  2.75
0.24   Very Good  J      VVS2     62.8   57     336    3.94  3.96  2.48

Wait! This looks like an Excel table! Well... it is (it can also be a tab-delimited file).

Make sure your variables (columns) are meaningful and allow you to retrieve the information that you want to plot.
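Here is a minimal sketch of what "meaningful columns" means in R. The six example rows above can be typed in directly as a data frame, with one vector per column. The object name `sample_diamonds` is made up for illustration. In practice the full table comes from a file or, in this tutorial, from ggplot2's own `diamonds` dataset.

```r
# Hypothetical example: the six sample rows above typed in as an R data frame.
# Each column is one variable, each row one observation (one diamond).
sample_diamonds <- data.frame(
  carat   = c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24),
  cut     = c("Ideal", "Premium", "Good", "Premium", "Good", "Very Good"),
  color   = c("E", "E", "E", "I", "J", "J"),
  clarity = c("SI2", "SI1", "VS1", "VS2", "SI2", "VVS2"),
  depth   = c(61.5, 59.8, 56.9, 62.4, 63.3, 62.8),
  table   = c(55, 61, 65, 58, 58, 57),
  price   = c(326, 326, 327, 334, 335, 336),
  x       = c(3.95, 3.89, 4.05, 4.20, 4.34, 3.94),
  y       = c(3.98, 3.84, 4.07, 4.23, 4.35, 3.96),
  z       = c(2.43, 2.31, 2.31, 2.63, 2.75, 2.48),
  stringsAsFactors = TRUE   # text columns become factors (categories)
)

# One line per column: its type and first values
str(sample_diamonds)
```

Because `cut`, `color` and `clarity` are factors, ggplot2 treats them as categories (for example, one bar or one colour per level).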


Read the table into R

Set your working directory:

> setwd("./Documents/MyUsername/FolderWhereMyExcelFileIs/")
> getwd()

Install (only once) and load ggplot2:

> install.packages("ggplot2", dependencies=TRUE)
> library(ggplot2)

If your file is an Excel file (read.xls() needs Perl installed):

> install.packages("gdata")
> library(gdata)
> table <- read.xls("MyExcelFile.xls")

If your file is a tab-delimited file:

> table <- read.delim("MyExcelFile.txt")
> summary(table)

We already have a loaded dataset named "diamonds" (it comes with ggplot2):

> summary(diamonds)
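You can see what `read.delim()` expects without a real file. This sketch writes a tiny tab-delimited table to a temporary file and reads it back. The column names and values are hypothetical placeholders, not lab data.

```r
# Write a small tab-delimited file (a stand-in for "MyExcelFile.txt")
# to a temporary location
tmp <- tempfile(fileext = ".txt")
example <- data.frame(sample = c("A", "B", "C"), value = c(1.5, 2.0, 3.25))
write.table(example, tmp, sep = "\t", quote = FALSE, row.names = FALSE)

# read.delim() assumes a header row and tab separators by default
mydata <- read.delim(tmp)
summary(mydata)
```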

Start plotting!


Your first plot(s)

> ggplot(diamonds, aes(color)) + geom_bar()
> ggplot(diamonds, aes(color, fill=cut)) + geom_bar()
> ggplot(diamonds, aes(color, fill=cut)) + geom_bar(position="dodge")
> ggplot(diamonds, aes(color, fill=cut)) + geom_bar(position="dodge") + scale_y_log10()
> ggplot(diamonds, aes(color, fill=cut)) + geom_bar(position="fill")
> ggplot(diamonds, aes(color, depth)) + geom_point()
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count")   # color is discrete, so count it instead of binning it
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + xlab("Diamond Color")
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + xlab("Diamond Color") + coord_flip()
> ggplot(diamonds, aes(clarity, fill=color)) + geom_bar() + facet_wrap(~ cut)
> ggplot(diamonds, aes(clarity, fill=color)) + geom_bar() + facet_grid(. ~ cut)
> ggplot(diamonds, aes(color, depth, color=cut)) + geom_point()
> ggplot(diamonds, aes(color, depth, color=cut)) + geom_jitter()
> ggplot(diamonds, aes(color, depth, color=cut)) + geom_jitter() + ylim(53,70)
> ggplot(diamonds, aes(color, depth, color=cut)) + geom_boxplot()
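A quick way to check what a plot actually shows is to store it in a variable and inspect the numbers ggplot2 computed. This sketch assumes a reasonably recent ggplot2 that provides `layer_data()`. It compares the bar heights of the first plot with the counts from base R's `table()`.

```r
library(ggplot2)

# Build the first bar chart, but keep it in a variable instead of printing it
p <- ggplot(diamonds, aes(color)) + geom_bar()

# layer_data() returns the data ggplot2 computed for layer 1 (here, by stat_count())
bars <- layer_data(p)
bars <- bars[order(as.numeric(bars$x)), ]   # one row per colour, in level order

# The bar heights should match the counts that table() produces
counts <- as.vector(table(diamonds$color))
all(bars$count == counts)
```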


Making the graph prettier

> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + xlab("Diamond Color")
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + theme_bw() + labs(x="Diamond color")
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + theme_bw() + labs(x="Diamond color") + scale_y_continuous("Counts")
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + theme_bw() + labs(x="Diamond color") + scale_y_continuous("Counts") + scale_color_hue("Cut")
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + theme_bw() + labs(x="Diamond color") + scale_y_continuous("Counts") + scale_color_brewer("Cut")
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + theme_bw() + labs(x="Diamond color") + scale_y_continuous("Counts") + scale_color_brewer("Cut", palette="Set1")
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + theme_bw() + labs(x="Diamond color") + scale_y_continuous("Counts", labels=scales::comma) + scale_color_brewer("Cut", palette="Set1")   # thousands separators on the y axis
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + theme_bw() + labs(x="Diamond color") + scale_y_continuous("Counts") + scale_color_brewer("Cut", palette="Set1") + facet_wrap(~ clarity)
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + theme_bw() + labs(x="Diamond color") + scale_y_continuous("Counts") + scale_color_brewer("Cut", palette="Set1") + facet_wrap(~ clarity, scales="free_y")   # each panel gets its own y range
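A ggplot is an ordinary R object. You can store the repeated part of the commands above once and add the cosmetic layers on top, so you don't have to retype the long one-liners. This sketch assumes a current ggplot2, where a frequency polygon of a discrete variable needs `stat = "count"`. It uses `ggsave()` to write the result to a PDF; the file name is arbitrary.

```r
library(ggplot2)

# Store the common part once; stat = "count" lets a frequency polygon use a discrete x
base <- ggplot(diamonds, aes(color, colour = cut, group = cut)) +
  geom_freqpoly(stat = "count")

# Add the cosmetic layers on top of the stored object
pretty <- base +
  theme_bw() +
  labs(x = "Diamond color", y = "Counts", colour = "Cut") +
  scale_colour_brewer(palette = "Set1") +
  facet_wrap(~ clarity, scales = "free_y")

# Write the plot to a file; width and height are in inches by default
out <- file.path(tempdir(), "diamond_color_freqpoly.pdf")
ggsave(out, pretty, width = 10, height = 7)
```

Saving straight to PDF also gives you a vector file that can still be touched up in Illustrator.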


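One-liner graph example

data(NHANES, package = "hexbin")   # geom_hex() needs the hexbin package, which also provides NHANES

ggplot(NHANES, aes(TIBC, Hemoglobin)) + geom_hex() + facet_grid(~Sex) + theme(aspect.ratio = 0.8)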


Our version

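ggplot(NHANES, aes(TIBC, Hemoglobin)) + geom_hex() + facet_grid(~Sex) + theme_bw() + theme(aspect.ratio = 0.8) + scale_fill_gradient("Number of\nPatients")

(theme_bw() goes before theme(aspect.ratio = 0.8) because adding a complete theme resets earlier theme settings.)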

[Figure: hexagon-binned plot of Hemoglobin against TIBC, one panel per Sex (F, M), fill legend "Number of Patients"]


Some things ggplot2 can’t do

You can't click on your graph to change the labels. You have to change them in the code and rerun it (e.g. with opts(), which later ggplot2 versions replaced with theme()), or edit them afterwards in Illustrator

When you stack data and set a new axis limit (e.g. with ylim()), you lose all the data in any group that extends beyond that range
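This sketch shows the difference between cutting the data and zooming the view, assuming a current ggplot2. `ylim()` sets the scale limits, so stacked segments that reach past the limit are treated as missing. `coord_cartesian()` only zooms the plotting window, so every bar is kept. The limit of 5000 is an arbitrary choice for illustration.

```r
library(ggplot2)

stacked <- ggplot(diamonds, aes(clarity, fill = cut)) + geom_bar()

# Scale limits: values outside 0-5000 are set to NA, so stacked segments
# reaching above 5000 lose their top coordinate (and are dropped when drawn)
cut_off <- stacked + ylim(0, 5000)

# Coordinate limits: all data are kept and the view is simply zoomed in
zoomed <- stacked + coord_cartesian(ylim = c(0, 5000))

anyNA(layer_data(cut_off)$ymax)   # expected to contain NAs
anyNA(layer_data(zoomed)$ymax)    # expected to contain none
```

In practice, coord_cartesian() is the workaround when you only want to zoom in on the lower part of a stacked chart.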


Last thoughts

Can handle millions of data points

It’s free

Good for taking a quick look at your data and easily changing how it is displayed

Works on all platforms (Windows, Mac and Linux [server] alike)

It's pretty (and did I mention it's fast?)

I think it saves you a bit of Illustrator time

If you already have a plotting scheme that works for you, it's not worth switching, but if you are looking for a new plotting strategy, I think it's a good place to start

Once you get into it, you can start doing statistical analyses of your data and plotting everything together
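Here is a small example of combining analysis and plotting. The sketch fits a linear model of log price on log carat for the `diamonds` data, then draws the same kind of fit with `geom_smooth(method = "lm")`. The choice of model is only illustrative. ggplot2 applies scale transformations before fitting the smoother, so the plotted line is the model fitted on the log10 scale.

```r
library(ggplot2)

# Fit a linear model of log10(price) on log10(carat)
fit <- lm(log10(price) ~ log10(carat), data = diamonds)
summary(fit)

# Plot the same relationship, letting ggplot2 fit and draw the line itself
p <- ggplot(diamonds, aes(carat, price)) +
  geom_point(alpha = 0.05) +
  geom_smooth(method = "lm") +
  scale_x_log10() +
  scale_y_log10()
```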


For more info

http://had.co.nz/ggplot2/

http://learnr.wordpress.com/


NHANES Data: National Health and Nutrition Examination Survey

Description

This is a somewhat large, interesting dataset: a data frame of 15 variables (columns) on 9575 persons (rows). It contains the following columns:

Cancer.Incidence: binary factor with levels No and Yes.
Cancer.Death: binary factor with levels No and Yes.
Age: numeric vector giving the age of the person in years.
Smoke: a factor with levels Current, Past, Nonsmoker, and Unknown.
Ed: numeric vector of {0,1} codes giving the education level.
Race: numeric vector of {0,1} codes giving the person's race.
Weight: numeric vector giving the weight in kilograms.
BMI: numeric vector giving Body Mass Index, i.e., Weight/Height^2 where Height is in meters; missing values (61%!) were originally coded as 0.
Diet.Iron: numeric, giving dietary iron.
Albumin: numeric, giving albumin level in g/l.
Serum.Iron: numeric, giving serum iron in ug/l.
TIBC: numeric, giving Total Iron Binding Capacity in ug/l.
Transferin: numeric, giving transferrin saturation, which is just 100*Serum.Iron/TIBC.
Hemoglobin: numeric, giving hemoglobin level.
Sex: a factor with levels F (female) and M (male).
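Two of the columns above are derived with simple formulas. The helper functions below are hypothetical; they are not part of the dataset or of any package. The BMI vector is made-up toy data. It only shows how to recode the 0-coded missing values to `NA` before summarising.

```r
# Hypothetical helpers for the two derived columns described above
bmi <- function(weight_kg, height_m) weight_kg / height_m^2
transferrin_saturation <- function(serum_iron, tibc) 100 * serum_iron / tibc

bmi(70, 1.75)                      # about 22.86
transferrin_saturation(100, 400)   # 25 (percent)

# BMI missings are coded as 0 in the original data; recode them to NA
# so they are excluded from summaries instead of dragging the mean down
bmi_values <- c(22.1, 0, 27.4, 0, 31.0)   # toy values
bmi_values[bmi_values == 0] <- NA
mean(bmi_values, na.rm = TRUE)
```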