Using R in Astronomy

Getting started quickly with R

Juan Pablo Torres-Papaqui

Departamento de Astronomía
Universidad de Guanajuato

DA-UG, México

papaqui@astro.ugto.mx

División de Ciencias Naturales y Exactas,

Campus Guanajuato, Sede Valenciana

Why R?

R is a free language and environment for statistical computing and graphics. It is superior to other graphics/analysis packages commonly used in Astronomy, e.g. supermongo, pgplot, gnuplot, and features a very wide range of statistical analysis and data plotting functions.

Furthermore, R is already the most popular of the leading statistical analysis packages, as measured by a variety of indicators, and its influence is growing rapidly.

“R is really important to the point that it's hard to overvalue it. It allows statisticians to do very intricate and complicated analyses without knowing the blood and guts of computing systems.” Google research scientist, quoted in the New York Times.

Key features

It's a mature, widely used (around 2-3 million users) and well-supported free and open source software project; it's committed to a twice-yearly update schedule (it runs on GNU/linux, unix, MacOS and Windows)
Excellent graphics capabilities
Vector arithmetic, plus many built-in basic & advanced statistical and numerical analysis tools
Intelligent handling of missing data values (denoted by “NA”)
Highly extensible, with around 3000 user-contributed packages available
It's easy to use and has excellent online help and associated documentation

Download

The R Project for Statistical Computing

http://www.r-project.org/

Getting started

The online help pages can be accessed as follows:

help.start()             # Load HTML help pages into browser
help(package)            # List help page for "package"
?package                 # Shortform for "help(package)"
help.search("keyword")   # Search help pages for "keyword"
?help                    # For more options
help(package=base)       # List tasks in package "base"
q()                      # Quit R

Getting started

Some extra packages are not loaded by default, but can be loaded as follows:

library("package")
library()                 # List available packages to load
library(help="package")   # List package contents

Getting started

Commands can be separated by a semicolon (";") or a newline.

All characters after "#" are treated as comments (even on the same line as commands).

Variables are assigned using "<-" (although "=" also works):

x <- 4.5
y <- 2*x^2   # Use "<<-" for global assignment

Vectors

x <- c(1, 4, 5, 8)   # Concatenate values & assign to x
y <- 2*x^2           # Evaluate expression for each element in x
> y                  # Typing the name of a variable evaluates it
[1]   2  32  50 128

If x is a vector, y will be a vector of the same length. Vector arithmetic in R means that an expression involving vectors of the same length (e.g. x+y in the above example) is evaluated element by element, without the need to loop explicitly over the values:

> x+y
[1]   3  36  55 136

> 1:10   # Same as seq(1, 10)
 [1]  1  2  3  4  5  6  7  8  9 10

> a <- 1:10
a[-1]          # Print whole vector *except* the first element
a[-c(1, 5)]    # Print whole vector except elements 1 & 5
a[1:5]         # Print elements 1 to 5
a[length(a)]   # Print last element, for any length of vector
head(a, n=3)   # Print the first n elements of a
tail(a, n=3)   # Print the last n elements of a
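As a small illustration of the vector arithmetic described above, the sketch below checks that an explicit loop gives the same result as the vectorised expression, and then adds a missing value to show how NA is handled. The names y_loop and x_na are made up for this example.

```r
x <- c(1, 4, 5, 8)
y <- 2 * x^2                      # vectorised: one expression, no loop

# Equivalent explicit loop, shown only for comparison
y_loop <- numeric(length(x))
for (i in seq_along(x)) {
  y_loop[i] <- 2 * x[i]^2
}
identical(y, y_loop)              # TRUE

# Missing values propagate unless you ask for them to be dropped
x_na <- c(x, NA)
mean(x_na)                        # NA
mean(x_na, na.rm = TRUE)          # 4.5
```

The vectorised form is shorter and is the idiomatic way to write this in R.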

Demonstrations of certain topics are available:

demo(graphics)   # Run graphics demonstration
demo()           # List topics for which demos are available
?demo            # List help for "demo" function

Typing the name of a function lists its contents; typing a variable evaluates it, or you can use print(x).

A great feature of R functions is that they offer a lot of control to the user, but provide sensible hidden defaults. A good example is the histogram function, which works very well without any explicit control:

First, generate a set of 10,000 Gaussian distributed random numbers:

data <- rnorm(1e4)   # Gaussian distributed numbers with mean=0 & sigma=1
hist(data)           # Plots a histogram, with sensible bin choices by default

See ?hist for the full range of control parameters available, e.g. to ask for 7 bins (a single number is treated as a suggestion only):

hist(data, breaks=7)
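Because hist() treats a single number passed to breaks as a suggestion only, the following sketch shows one way to force exactly 7 equal-width bins by supplying the bin edges explicitly. The names edges and h are illustrative, and plot=FALSE is used so that only the bin counts are computed.

```r
set.seed(1)                       # reproducible random numbers
data <- rnorm(1e4)

# 8 equally spaced edges that bracket the data define exactly 7 bins
edges <- seq(floor(min(data)), ceiling(max(data)), length.out = 8)
h <- hist(data, breaks = edges, plot = FALSE)

length(h$counts)                  # number of bins
sum(h$counts)                     # every data point falls in some bin
```

Dropping plot=FALSE draws the histogram with the same bins.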

Data Input/Output

Assuming you have a file (“file.dat”) containing the following data, with either spaces or tabs between fields:

r x y
1 4.2 14.2
2 2.4 64.8
3 8.76 63.4
4 5.9 32.2
5 3.4 89.6

setwd("C:/Users/juanchito/Documents/Astronomy School/")   # Set the working directory (Windows-style path shown)

Now read this file into R:

> inp <- read.table("file.dat", header=T)   # Read data into data frame
> inp                                       # Print contents of inp
> inp[0]                                    # Print the type of object that inp is:
NULL data frame with 5 rows
or
data frame with 0 columns and 5 rows

> colnames(inp)                             # Print the column names for inp:
[1] "r" "x" "y"

This is simply a vector that can easily be changed:

> colnames(inp) <- c("radius", "temperature", "time")
> colnames(inp)
[1] "radius"      "temperature" "time"
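For readers without file.dat to hand, here is a self-contained sketch that feeds the same five rows to read.table() through its text argument instead of a file, then renames the columns as above. The variable txt is made up for this example.

```r
txt <- "r x y
1 4.2 14.2
2 2.4 64.8
3 8.76 63.4
4 5.9 32.2
5 3.4 89.6"

inp <- read.table(text = txt, header = TRUE)   # same call, in-memory input
str(inp)                                       # compact summary of the columns
colnames(inp) <- c("radius", "temperature", "time")
dim(inp)                                       # rows, columns
```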

Note that you can refer to R objects, names etc. using the first few letters of the full name, provided that is unambiguous, e.g.:

> inp$r # Print the Radius column

but note what happens if the information is ambiguous or if the column doesn't exist:

> inp$t        # Could match "inp$temperature" or "inp$time"
NULL
> inp$wibble   # "wibble" column doesn't exist
NULL

Alternatively, you can refer to the columns by number:

> inp[[1]]   # Print data as a vector (use "[[" & "]]")
[1] 1 2 3 4 5

> inp[1] # Print data as a data.frame (only use "[" and "]")
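The behaviour described on this slide can be checked directly. The sketch below rebuilds the renamed data frame by hand (so it runs without file.dat) and tests the partial matching and the difference between "[[" and "[".

```r
inp <- data.frame(radius = 1:5,
                  temperature = c(4.2, 2.4, 8.76, 5.9, 3.4),
                  time = c(14.2, 64.8, 63.4, 32.2, 89.6))

identical(inp$r, inp$radius)   # "r" matches only "radius"
is.null(inp$t)                 # "t" is ambiguous, so NULL is returned
class(inp[[1]])                # "integer": a plain vector
class(inp[1])                  # "data.frame": a one-column data frame
inp[["temperature"]]           # full names also work inside [[ ]]
```

In scripts it is usually safer to spell column names out in full, since an abbreviation that is unambiguous today may stop being so when a column is added.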

Writing data out to a file (if no filename is specified, which is the default, the output is written to the console):

> write.table(inp, quote=F, row.names=F, col.names=T, sep="\t:\t")
> write.table(inp, quote=F, row.names=F, col.names=T)
radius temperature time
1 4.2 14.2
2 2.4 64.8
3 8.76 63.4
4 5.9 32.2
5 3.4 89.6
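To write to an actual file rather than the console, pass a file argument. The sketch below does a round trip through a temporary file; the data frame is rebuilt by hand and outfile is a hypothetical scratch path created by tempfile().

```r
inp <- data.frame(radius = 1:5,
                  temperature = c(4.2, 2.4, 8.76, 5.9, 3.4),
                  time = c(14.2, 64.8, 63.4, 32.2, 89.6))

outfile <- tempfile(fileext = ".dat")          # hypothetical scratch file
write.table(inp, file = outfile, quote = FALSE,
            row.names = FALSE, col.names = TRUE, sep = "\t")

back <- read.table(outfile, header = TRUE, sep = "\t")
all.equal(inp, back)                           # TRUE if the round trip is lossless
unlink(outfile)                                # tidy up
```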

Saving data

?save
save(inp, t, file="data.RData")
load(file="data.RData")   # read back in the saved data:
# - this will load the saved R objects into the current session, with
#   the same properties, i.e. all the variables will have the same names
#   and contents

Note that when you quit R (by typing q()), it asks whether you want to save the workspace image. If you answer yes (y), it writes two files to the current directory, called .RData and .Rhistory. The former contains the saved session (i.e. all the R objects in memory) and the latter is the command history (it's just a simple text file you can look at).
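The save()/load() round trip can be tried without touching your working directory. In this sketch the objects inp and note, and the scratch path rdata, are made up for illustration.

```r
inp <- data.frame(radius = 1:3, time = c(14.2, 64.8, 63.4))
note <- "saved from the example session"

rdata <- tempfile(fileext = ".RData")   # hypothetical scratch file
save(inp, note, file = rdata)           # store both objects by name

rm(inp, note)                           # remove them from the session
restored <- load(rdata)                 # load() returns the restored names
restored
exists("inp") && exists("note")         # both objects are back
unlink(rdata)
```

Because load() silently overwrites any existing objects with the same names, checking its return value is a quick way to see what was restored.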

At any time you can save the history of commands using:

savehistory(file="my.Rhistory")
loadhistory(file="my.Rhistory")   # load history file into current R session

Plotting data

Useful functions:

?plot        # Help page for plot command
?par         # Help page for graphics parameter control
?Devices     # or "?device"
             # (R can output in postscript, PDF, bitmap, PNG, JPEG and more formats)

dev.list()   # List graphics devices
colours()    # or "colors()" List all available colours
?plotmath    # Help page for writing maths expressions in R

Tip: to generate png, jpeg etc. files, I find it's best to create pdf versions in R, then convert them in gimp (having specified strong antialiasing, if asked, when loading the pdf file into gimp), otherwise the figures are "blocky" (i.e. suffer from aliasing).

To create a file copy of a plot for printing or including in a document:

dev.copy2pdf(file="myplot.pdf")   # } Copy contents of current graphics device
dev.copy2eps(file="myplot.eps")   # } to a PDF or Postscript file
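The commands above copy whatever is on the current graphics device. In a script where no plot window is open, a common alternative is to open a file device first, plot into it, and close it. This is a minimal sketch; outfile is a hypothetical path created with tempfile().

```r
outfile <- tempfile(fileext = ".pdf")   # hypothetical output path

pdf(outfile, width = 6, height = 4)     # open a PDF device (size in inches)
x <- 1:10
plot(x, x^2, xlab = "x", ylab = "x squared")
dev.off()                               # close the device to finish the file

file.exists(outfile)
```

The file is only complete once dev.off() has been called.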

Adding more datasets to a plot:

x <- 1:10; y <- x^2
z <- 0.9*x^2
plot(x, y)                                 # Plot original data
points(x, z, pch="+")                      # Add new data, with different symbols
lines(x,y)                                 # Add a solid line for original data
lines(x, z, col="red", lty=2)              # Add a red dashed line for new data
curve(1.1*x^2, add=T, lty=3, col="blue")   # Plot a function as a curve
text(2, 60, "An annotation")               # Write text on plot
abline(h=50)                               # Add a horizontal line
abline(v=3, col="orange")                  # Add a vertical line

Error bars

source("errorbar.R")   # source code
# Create some data to plot:

x <- seq(1,10); y <- x^2
xlo <- 0.9*x; xup <- 1.08*x; ylo <- 0.85*y; yup <- 1.2*y

Plot graph, but don't show the points (type="n") & plot errors:

plot(x, y, log="xy", type="n")
errorbar(x, xlo, xup, y, ylo, yup)

?plot gives more info on plotting options (as does ?par). To use different line colours, styles, thicknesses & errorbar types for different points:

errorbar(x, xlo, xup, y, ylo, yup, col=rainbow(length(x)), lty=1:5, lwd=1:3)
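The errorbar() function comes from the separate file errorbar.R, which is not reproduced in these slides. If that file is not available, error bars can be drawn with the base graphics function arrows(). The helper draw.errors below is a hypothetical stand-in written for this example; it is not the errorbar() function used above.

```r
x <- seq(1, 10); y <- x^2
xlo <- 0.9*x; xup <- 1.08*x
ylo <- 0.85*y; yup <- 1.2*y

# Hypothetical helper: NOT the errorbar() function from errorbar.R
draw.errors <- function(x, xlo, xup, y, ylo, yup, ...) {
  arrows(xlo, y, xup, y, angle = 90, code = 3, length = 0.03, ...)  # x errors
  arrows(x, ylo, x, yup, angle = 90, code = 3, length = 0.03, ...)  # y errors
  points(x, y, pch = 16, ...)
}

# Empty log-log frame, wide enough to hold the full extent of the errors
plot(x, y, log = "xy", type = "n",
     xlim = range(xlo, xup), ylim = range(ylo, yup))
draw.errors(x, xlo, xup, y, ylo, yup, col = "blue")
```

Setting angle=90 and code=3 turns each arrow into a bar with flat caps at both ends.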

Plotting a function or equation

curve(x^2, from=1, to=100)             # Plot a function over a given range
curve(x^2, from=1, to=100, log="xy")   # With log-log axes

Plot chi-squared probability vs. reduced chi-squared for 10 and then 100 degrees of freedom

dof <- 10
curve(pchisq(x*dof, df=dof), from=0.1, to=3, xlab="Reduced chi-squared", ylab="Chi-sq distribution function")
abline(v=1, lty=2, col="blue")   # Plot dashed line for reduced chisq=1
dof <- 100
curve(pchisq(x*dof, df=dof), from=0.1, to=3, xlab="Reduced chi-squared", ylab="Chi-sq distribution function")
abline(v=1, lty=2, col="blue")   # Plot dashed line for reduced chisq=1
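To read a single number off these curves, pchisq() can be evaluated directly. The sketch below computes the probability of exceeding a reduced chi-squared of 1.5 by chance for 10 and 100 degrees of freedom; the value 1.5 and the names chi2.red, p10 and p100 are chosen purely for illustration.

```r
chi2.red <- 1.5                       # example reduced chi-squared value

# Upper-tail probability of exceeding this value, for each dof
p10  <- pchisq(chi2.red * 10,  df = 10,  lower.tail = FALSE)
p100 <- pchisq(chi2.red * 100, df = 100, lower.tail = FALSE)

c(dof10 = p10, dof100 = p100)         # the same ratio is far less likely at 100 dof
```

This is why the curve for 100 degrees of freedom rises much more steeply around a reduced chi-squared of 1.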

Define & plot custom function:

myfun <- function(x) x^3 + log(x)*sin(x)

#--Plot dotted (lty=3) red line, with 3x normal line width (lwd=3):
curve(myfun, from=1, to=10, lty=3, col="red", lwd=3)

#--Add a legend, inset slightly from top left of plot:
legend("topleft", inset=0.05, "My function", lty=3, lwd=3, col="red")

Plotting maths symbols:

demo(plotmath)   # Shows a demonstration of many examples
?plotmath        # Manual page with more information

Analysis

Basic stuff

x <- rnorm(100)    # Create 100 Gaussian-distributed random numbers
hist(x)            # Plot histogram of x
summary(x)         # Some basic statistical info
d <- density(x)    # Compute kernel density estimate
d                  # Print info on density analysis ("?density" for details)
plot(d)            # Plot density curve for x
plot(density(x))   # Plot density curve directly

Plot histogram in terms of probability & also overlay density plot:

hist(x, probability=T)
lines(density(x), col="red")   # overlay smoothed density curve

#--Standard stuff:
mean(x); median(x); max(x); min(x); sum(x); weighted.mean(x)
sd(x)    # Standard deviation
var(x)   # Variance
mad(x)   # median absolute deviation (robust sd)

Testing the robustness of statistical estimators

?replicate # Repeatedly evaluate an expression

Calculate standard deviation of 100 Gaussian-distributed random numbers (NB default sd is 1; can specify with rnorm(100, sd=3) or whatever)

sd(rnorm(100))

Now, let's run N=5 Monte Carlo simulations of the above test:

replicate(5, sd(rnorm(100)))

and then compute the mean of the N=5 values:

mean(replicate(5, sd(rnorm(100))))

Obviously 5 is rather small for a number of MC sims (repeat the above command & see how the answer fluctuates). Let's try 10,000 instead:

mean(replicate(1e4, sd(rnorm(100))))

With a sample size of 100, the measured sd closely matches the actual value used to create the random data (sd=1). However, see what happens when the sample size decreases:

mean(replicate(1e4, sd(rnorm(10))))   # almost a 3% bias low
mean(replicate(1e4, sd(rnorm(3))))    # >10% bias low
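These Monte Carlo values can be compared with the standard analytic result for Gaussian data: for n samples with true standard deviation sigma, the expected value of the sample standard deviation is c4(n)*sigma, where c4(n) = sqrt(2/(n-1)) * gamma(n/2) / gamma((n-1)/2). The function name c4 below is just a label chosen for this sketch.

```r
# Expected value of sd() for n Gaussian samples with true sigma = 1:
#   E[s] = sqrt(2/(n-1)) * gamma(n/2) / gamma((n-1)/2)
c4 <- function(n) {
  # lgamma() is used to avoid overflow in gamma() for large n
  sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))
}

n <- c(3, 10, 100)
round(c4(n), 4)                   # expected mean of sd() for each n
round(100 * (1 - c4(n)), 1)       # percentage bias low
```

Dividing a measured sd() by c4(n) removes the bias, provided the data really are Gaussian.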

You can see the bias more clearly with a histogram or density plot:

a <- replicate(1e4, sd(rnorm(3)))
hist(a, breaks=100, probability=T)   # Specify lots of bins
lines(density(a), col="red")         # Overlay kernel density estimate
abline(v=1, col="blue", lwd=3)       # Mark true value with thick blue line

This bias is important to consider when estimating the velocity dispersion of a stellar or galaxy system with a small number of tracers. The bias is even worse for the robust sd estimator mad():

mean(replicate(1e4, mad(rnorm(10))))   # ~9% bias low
mean(replicate(1e4, mad(rnorm(3))))    # >30% bias low

However, in the presence of outliers, mad() is robust compared to sd() as seen with the following demonstration. First, create a function to add a 5 sigma outlier to a Gaussian random distribution:

outlier <- function(N, mean=0, sd=1) {
  a <- rnorm(N, mean=mean, sd=sd)
  a[1] <- mean+5*sd   # Replace 1st value with 5 sigma outlier
  return(a)
}

Now compare the performance of mad() vs. sd():

mean(replicate(1e4, mad(outlier(10))))
mean(replicate(1e4, sd(outlier(10))))
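For a repeatable version of this comparison, the sketch below fixes the random seed and collects both estimates in one named vector. The seed value and the name est are arbitrary choices for this example; the outlier() function is repeated so the block runs on its own.

```r
set.seed(42)                                    # reproducible run

outlier <- function(N, mean = 0, sd = 1) {
  a <- rnorm(N, mean = mean, sd = sd)
  a[1] <- mean + 5 * sd                         # inject one 5 sigma outlier
  a
}

est <- c(mad = mean(replicate(1e4, mad(outlier(10)))),
         sd  = mean(replicate(1e4, sd(outlier(10)))))
round(est, 2)                                   # true sigma of the clean data is 1
```

The mad() estimate should stay much closer to 1 than the sd() estimate, which is pulled upwards by the single outlier.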

You can also compare the median vs. mean:

mean(replicate(1e4, median(outlier(10))))
mean(replicate(1e4, mean(outlier(10))))

...and without the outlier:

mean(replicate(1e4, median(rnorm(10))))
mean(replicate(1e4, mean(rnorm(10))))

Kolmogorov-Smirnov test, taken from manual page (?ks.test)

x <- rnorm(50)   # 50 Gaussian random numbers
y <- runif(30)   # 30 uniform random numbers

# Do x and y come from the same distribution?
ks.test(x, y)
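ks.test() returns an object whose components can be used in later calculations rather than just read off the screen. A minimal sketch, with an arbitrary seed and the made-up name res:

```r
set.seed(1)
x <- rnorm(50)
y <- runif(30)

res <- ks.test(x, y)
res$statistic        # D: largest gap between the two empirical CDFs
res$p.value          # small values argue against a common parent distribution
```

names(res) lists everything else that is stored in the result.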

Correlation tests

Load in some data:

dfile <- "table1.txt"
A <- read.table(dfile, header=T, sep="|")
cor.test(A$z, A$Tx)                      # Specify X & Y data separately
cor.test(~ z + Tx, data=A)               # Alternative format when using a data frame ("A")
cor.test(A$z, A$Tx, method="spearman")   # } Alternative methods
cor.test(A$z, A$Tx, method="kendall")    # }

plot(A$z,A$Tx)
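If table1.txt is not to hand, the same calls can be tried on synthetic data. In the sketch below the data frame A is an invented stand-in with columns named z and Tx; the numbers are NOT taken from the real table.

```r
set.seed(2)
# Synthetic stand-in for the z and Tx columns (NOT the real table1.txt data)
A <- data.frame(z = seq(0.02, 0.4, length.out = 20))
A$Tx <- 3 + 10 * A$z + rnorm(20, sd = 0.5)

res <- cor.test(~ z + Tx, data = A)
res$estimate         # Pearson correlation coefficient
res$p.value          # p-value for the null hypothesis of no correlation
res$conf.int         # 95% confidence interval on the coefficient
```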

Spline interpolation of data

x <- 1:20
y <- jitter(x^2, factor=20)      # Add some noise to vector
sf <- splinefun(x, y)            # Perform cubic spline interpolation

# - note that "sf" is a function:
sf(1.5)                          # Evaluate spline function at x=1.5
plot(x,y); curve(sf(x), add=T)   # Plot data points & spline curve
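Two properties of the function returned by splinefun() are worth checking: it passes exactly through the input points, and it accepts a deriv argument to return derivatives of the spline. A short sketch with an arbitrary seed:

```r
set.seed(3)
x <- 1:20
y <- jitter(x^2, factor = 20)
sf <- splinefun(x, y)

all.equal(sf(x), y)     # the spline reproduces the input points
sf(c(1.5, 2.5, 3.5))    # vectorised evaluation between the data points
sf(1.5, deriv = 1)      # first derivative of the spline at x=1.5
```

Because the curve is forced through every point, it follows the noise too; for noisy data a smoother such as lowess() may be more appropriate.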

Scatter plot smoothing

#--Using above data frame ("A"):
plot(Tx ~ z, data=A)

#--Return X & Y values of locally-weighted polynomial regression:
lowess(A$z, A$Tx)

#--Plot smoothed data as line on graph:
lines(lowess(A$z, A$Tx), col="red")

Regression

dfile <- "http://www.astro.ugto.mx/~papaqui/download/winter/table1.txt"
A <- read.table(dfile, header=T, sep="|")
names(A)                   # Print column names in data frame
m <- lm(Tx ~ z, data=A)    # Fit linear model
summary(m)                 # Show best-fit parameters & errors

Plot data & best-fit model:

plot(Tx ~ z, A)
abline(m, col="blue")   # add best-fit straight line model

Plot residuals:

plot(A$z, resid(m))
abline(h=0, col="red")
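The fitted model object can also be queried directly. The sketch below uses synthetic data so that it runs without downloading the table; the data frame A here is an invented stand-in and does NOT contain the real cluster measurements.

```r
set.seed(4)
# Synthetic stand-in for the cluster table (NOT the real table1.txt data)
A <- data.frame(z = seq(0.02, 0.4, length.out = 20))
A$Tx <- 3 + 10 * A$z + rnorm(20, sd = 0.5)

m <- lm(Tx ~ z, data = A)
coef(m)                                       # intercept and slope
confint(m)                                    # 95% confidence intervals
predict(m, newdata = data.frame(z = 0.25))    # model value at z = 0.25
```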

Functions

Basics

cat         # Type function name without brackets to list contents
args(cat)   # Return arguments of any function
body(cat)   # Return main body of function

Create your own function

> fun <- function(x, a, b, c) (a*x^2) + (b*x^2) + c
> fun(3, 1, 2, 3)
[1] 30

> fun(5, 1, 2, 3)
[1] 78

A more complicated example of a function. First, create some data:

set.seed(123)                     # allow reproducible random numbers
a <- rnorm(1000, mean=10, sd=1)   # 1000 Gaussian random numbers
b <- rnorm(100, mean=50, sd=15)   # smaller population of higher numbers
x <- c(a, b)                      # Combine datasets
hist(x)                           # Shows outlier population clearly
sd(x)                             # Strongly biased by outliers
mad(x)                            # Robustly estimates sd of main sample
mean(x)                           # biased
median(x)                         # robust

Now create a simple sigma clipping function (you can paste the following straight into R):

sigma.clip <- function(vec, nclip=3, N.max=5) {
  mean <- mean(vec); sigma <- sd(vec)
  clip.lo <- mean - (nclip*sigma)
  clip.up <- mean + (nclip*sigma)
  vec <- vec[vec < clip.up & vec > clip.lo]   # Remove outliers
  if ( N.max > 0 ) {
    N.max <- N.max - 1
    # Note the use of recursion here (i.e. the function calls itself):
    vec <- Recall(vec, nclip=nclip, N.max=N.max)
  }
  return(vec)
}

Now apply the function to the test dataset:

new <- sigma.clip(x)
hist(new)     # Only the main population remains!
length(new)   # } Compare numbers of
length(x)     # } values before & after clipping
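The recursive version always clips a fixed number of times. As a variation, the sketch below loops until a pass removes no further points (or a maximum number of passes is reached). The function sigma.clip.iter and its max.iter argument are hypothetical names introduced for this example; the test data are rebuilt so the block runs on its own.

```r
set.seed(123)
x <- c(rnorm(1000, mean = 10, sd = 1), rnorm(100, mean = 50, sd = 15))

# Hypothetical alternative to sigma.clip(): loop until no points are removed
sigma.clip.iter <- function(vec, nclip = 3, max.iter = 50) {
  for (i in seq_len(max.iter)) {
    centre <- mean(vec); sigma <- sd(vec)
    keep <- abs(vec - centre) < nclip * sigma
    if (all(keep)) break          # converged: nothing left to clip
    vec <- vec[keep]
  }
  vec
}

new <- sigma.clip.iter(x)
c(before = length(x), after = length(new))
```

Using the name centre instead of mean also avoids masking the built-in mean() function inside the function body.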

Factors

Factors are a vector type which treats the values as character strings which form part of a base set of values (called "levels"). The result of plotting a factor is to produce a frequency histogram of the number of entries in each level (see ?factor for more info).

Basics

a <- rpois(100, 3)   # Create 100 Poisson distributed random numbers (lambda=3)
hist(a)              # Plot frequency histogram of data
a[0]                 # List type of vector ("numeric")
b <- as.factor(a)    # Change type to factor

List type of vector b ("factor"): also lists the "levels", i.e. all the different types of value:

b[0]

Now, plot b, to get a barchart showing the frequency of entries in each level:

plot(b)
table(b)   # a tabular summary of the same information
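One point to watch when converting back from a factor: the values are stored as integer codes that index the levels, so a direct numeric conversion returns the codes rather than the original data. A short sketch with an arbitrary seed:

```r
set.seed(5)
a <- rpois(100, 3)
b <- as.factor(a)

levels(b)                              # the distinct values, stored as strings
as.integer(b)[1:5]                     # internal level codes, NOT the data
as.numeric(as.character(b))[1:5]       # recovers the original numbers
identical(as.numeric(as.character(b)), as.numeric(a))
```

Going through as.character() first is the usual way to get the original numbers back.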

A more practical example. Read in table of data (from Sanderson et al. 2006), which refers to a sample of 20 clusters of galaxies with Chandra X-ray data:

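file <- "http://www.astro.ugto.mx/~papaqui/download/winter/table1.txt"
A <- read.table(file, header=T, sep="|", stringsAsFactors=TRUE)

In R versions before 4.0.0, “read.table()” treats non-numeric values as factors by default. From R 4.0.0 onwards they are read as character strings unless you pass “stringsAsFactors=TRUE”, as done here so that the factor examples that follow work. When factors are not wanted, use “as.is=T”; you can also specify the type of all the input columns explicitly (see “?read.table”).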

A[1,]            # Print 1st row of data frame (will also show column names)
plot(A$det)      # Plot the numbers of each detector type
plot(A$cctype)   # Plot the numbers of cool-core and non-cool core clusters

The “det” column is the detector used for the observation of each cluster: S for ACIS-S and I for ACIS-I. The “cctype” denotes the cool core type: “CC” for cool core, “NCC” for non-cool core.

Now, plot one factor against another, to get a plot showing the number of overlaps between the two categories (the cool-core clusters are almost all observed with ACIS-S):

plot(cctype ~ det, data=A, xlab="ACIS Detector", ylab="Cool-core type")

You can easily explore the properties of the different categories by plotting a factor against a numeric variable. The result is a boxplot (see “?boxplot”) of the numeric variable for each category. For example, compare the mean temperature of cool-core & non-cool core clusters:

plot(Tx ~ cctype, data=A)

and compare the redshifts of clusters observed with ACIS-S & ACIS-I:

plot(z ~ det, data=A)

You can define your own factor categories by partitioning a numeric vector of data into discrete bins using “cut()”, as shown here:

Converting a numeric vector into factors

Split cluster redshift range into 2 bins of equal width (nearer & further clusters), and assign each z point accordingly; create new column of data frame with the bin assignments (A$z.bin):

A$z.bin <- cut(A$z, 2)
A$z.bin         # Show levels & bin assignments
plot(A$z.bin)   # Show bar chart
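With a single number, cut() picks equal-width bins and generates the level names itself. To control both, pass explicit bin edges and labels. The redshift values and the 0.15 boundary below are invented for illustration and are NOT taken from table1.txt.

```r
# Synthetic redshifts for illustration (NOT the values in table1.txt)
z <- c(0.03, 0.05, 0.08, 0.12, 0.18, 0.22, 0.29, 0.35)

z.bin <- cut(z, breaks = c(0, 0.15, 0.4), labels = c("near", "far"))
table(z.bin)                           # number of clusters in each bin
tapply(z, z.bin, mean)                 # mean redshift within each bin
```

Any value falling outside the outermost edges is assigned NA, so the edges should bracket the full range of the data.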

Plot mean temperatures of nearer & further clusters. Since this is essentially a flux-limited sample of galaxy clusters, the more distant clusters (higher redshift) are hotter (and hence more X-ray luminous - an example of the common selection effect known as Malmquist bias):

A$Tx.bin <- cut(A$Tx, 2)
plot(z ~ Tx.bin, data=A)
plot(Tx ~ z, data=A)   # More clearly seen!

Check if redshifts of (non) cool-core clusters differ:

plot(cctype ~ z.bin, data=A) # Equally mixed

Now, let's say you wanted to colour-code the different CC types. First, create a vector of colours by copying the cctype column (you can add this as a column to the data frame, or keep it as a separate vector):

colour <- A$cctype

Now all you need to do is change the names of the levels:

levels(colour)                       # Show existing levels
levels(colour) <- c("blue", "red")   # specify blue=CC, red=NCC
colour[0]                            # Show data type (& levels, since it's a factor)

Now change the type of the vector to character (i.e. not a factor):

col <- as.character(colour)
col[0]   # Confirm change of data type
col      # } Print values and
colour   # } note the difference

Now plot mean temperature vs. redshift, colour coded by CC type:

plot(Tx ~ z, A, col=col)

Why not add a plot legend:

legend(x="topleft", legend=c(levels(A$cctype)), text.col=levels(colour))

Now for something a little bit fancier:

plot(Tx ~ z, A, col=col, xlab="Redshift", ylab="Temperature (keV)")
legend(x="topleft", c(levels(A$cctype)), col=levels(colour), text.col=levels(colour), inset=0.02, pch=1)
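An alternative to relabelling the factor levels is to keep a small named vector of colours and look up each row's colour by its category. The sketch below uses an invented six-row data frame (NOT the real cluster table), and the name pal is made up for this example.

```r
# Synthetic stand-in for the cluster table (NOT the real table1.txt data)
A <- data.frame(z      = c(0.05, 0.10, 0.15, 0.20, 0.25, 0.30),
                Tx     = c(3.1, 4.0, 5.2, 4.8, 6.5, 7.1),
                cctype = factor(c("CC", "NCC", "CC", "CC", "NCC", "NCC")))

pal <- c(CC = "blue", NCC = "red")           # one colour per level, by name
col <- unname(pal[as.character(A$cctype)])   # look up a colour for each row

plot(Tx ~ z, A, col = col, xlab = "Redshift", ylab = "Temperature (keV)")
legend("topleft", legend = names(pal), col = pal, pch = 1, inset = 0.02)
```

Looking colours up by name keeps the mapping explicit, so it does not depend on the order of the factor levels.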