
R for Wildlife Ecologists (Quick Reference Guide)

Bret Collier∗

Institute of Renewable Natural Resources, Texas A&M University, College Station, Texas 77845; [email protected]; 979/595/50706

Contents

1 Course Introduction

2 Starting in R
  2.1 R Basics
  2.2 Simple Programming
  2.3 R Objects
  2.4 Classes and Modes

3 R Project and Data Management
  3.1 Working directories
  3.2 Importing and exporting data
  3.3 Creation of, types, and working with data: a super short primer
  3.4 Basic Mathematical/Operators

4 R Creating Graphics
  4.1 Scatterplots
  4.2 Other Simple plots

5 Statistical Models with R
  5.1 Contingency Tables
  5.2 Linear Regression
  5.3 Generalized Linear Models

6 Writing Functions in R
  6.1 Functions

7 Wildlife-Specific Methods
  7.1 Capture-Recapture Analysis
  7.2 Distance Sampling
  7.3 Spatial Models

8 Literature To Look At!
  8.1 Here is a pretty short list of good books to get on your shelves.
  8.2 R packages that I use regularly and a few websites that will make your life easier.

∗Contact after 1 March: School of Renewable Natural Resources, Louisiana State University, [email protected] or 979/595/5076


1 Course Introduction

First, since you are reading this you have taken the first steps towards freeing yourselves from the forced servitude of point-and-click statistical programs that control your data's structure and drive your data analyses. No longer will you be told what numbers will be available for you to interpret or what statistical tests you should use. Rather, we are going to move, together today, into computing on the data and questions of interest, where you decide how to develop, manipulate, examine, and interpret statistical results.

Our philosophy for today is simple: use R to become better analysts, as described by Dr. Harrell,

library(fortunes)

fortune("good data analyst")

##

## Can one be a good data analyst without being a half-good programmer? The

## short answer to that is, 'No.' The long answer to that is, 'No.'

## -- Frank Harrell

## 1999 S-PLUS User Conference, New Orleans (October 1999)

Additionally, for today's work, we are going with the motto that "GUIs normally make it simple to accomplish simple actions and impossible to accomplish complex actions." (I read this somewhere but cannot remember who said it; if you do, let me know.) Thus, today is going to be all about programming, get ready!

Since everyone is a scientist here, you have probably realized that you are going to need at least a basic understanding of R. But that's ok, because understanding R will benefit you long-term. The good things about R, to list a few, include:

1. There is a wealth of online documentation related to the use of R. Just look at the R homepage (http://www.r-project.org) for a host of useful links.

2. There are huge numbers of freely available R packages that can be used to perform specific analyses, and you can develop packages that archive your data and code so that other folks can see/use it just as easily (http://cran.us.r-project.org/web/packages/).

3. Because R is a flexible environment, there are entire fields of study (e.g., Analysis of Spatial Data) for which a wide range of approaches have been developed to conduct various analyses (some of which can be seen at http://cran.r-project.org/web/views/). Additionally, Springer has an entire series called Use-R (http://www.springer.com/statistics) consisting of books published on various R-related statistical topics.

4. R is not just for analysis, but merges seamlessly into the writing of theses & dissertations, books, articles, presentations, course notes, etc. Our course notes for today were written entirely in LYX (pronounced 'Licks') using the R package knitr (http://yihui.name/knitr/) to 'knit' the R code and the text (ported via TeX/LaTeX, which are pronounced 'Tech' and 'La Tech'; note that all pronunciation is open to interpretation depending on whether you are an American English or English English speaker, it seems) and are entirely reproducible on your computers. This integration of R into dynamic document presentations is the foundation of literate programming and is well grounded in the process of reproducible research (http://www.bepress.com/bioconductor/paper2/).


5. R is libre (open source) and gratis (freeware) software (http://www.gnu.org/licenses/gpl.html); think freedom of speech (libre) and free as in beer (gratis).

Now, the R downsides that I want to put right out front for you:

1. R is a programming environment. If you are not used to developing programming code, or just don't have any experience programming, then the learning curve will be steep initially (but we will solve some of that today).

2. Because R is a programming environment, it will not 'do' things for you that you are used to having done for you by various programs. If you want R to do something with your data, you have to tell it to, and you have to know what the outcome should look like so you can ensure what you told R to give you and what R gave you are the same thing.

Throughout this document, I am using notes that I have pulled together, presented elsewhere to other audiences, borrowed from friends, etc. So, I am probably not giving enough credit where it is due (I am admitting to plagiarism right here), but since these are notes and I want them to be as comprehensive as possible while focused on the issues I think you all need to know, too bad.

The following is repeated from the quick start guide I sent out earlier, but I wanted it in here as well just for consistency: your first stop(s) (preferably before we meet) are listed below. These are the main ones, but there are tons of other sites you can frequent if you're interested and do a little searching:

R Project website: http://www.r-project.org

R FAQ: http://cran.r-project.org/faqs.html (general/OS-specific FAQs on here)

R Manuals: http://cran.r-project.org/manuals.html

CRAN (Comprehensive R Archive Network): http://cran.r-project.org/

R Search: http://www.r-project.org/search.html

Texas A&M University has worked up some R videos that are interesting: http://dist.stat.tamu.edu/pub/rvideos/.


2 Starting in R

2.1 R Basics

First and foremost, these are not notes on how to do your particular kind of statistics. I won't be teaching statistics; I will be talking more about programming in the R language. Thus, you will see more text when I am talking about how R works, and less when I am applying R to a specific instance (e.g., to linear regression). With that in mind, we will not cover every bit of code/text in this document during the workshop; rather, I wanted this reference to be useful to you in the future, but I will highlight the immediately relevant parts as we work through the code examples for today.

Working with a language like R requires 2 basic things: time and interaction; thus, you have to practice. So, your first goal will be to stop using whatever program you have been using for data management, manipulation, and analysis. For some of you, this is Excel, which is bad, because Excel sucks, makes bad graphs, and gives wrong answers, and well, it sucks and you should not use it for anything other than simplifying data to .csv, cause it sucks: EXCEL SUCKS, DON'T USE IT!

fortune("microsoft excel")

##

## Friends don't let friends use Excel for statistics!

## -- Jonathan D. Cryer (about problems with using Microsoft Excel for

## statistics)

## JSM 2001, Atlanta (August 2001)

I should note that, as I started to work up these course notes, I am probably starting at a level that is very basic to many of you. The reason for this is multi-fold: 1) I don't know what level of experience you all have, so I figure it is best to begin at the beginning, 2) a thorough understanding of the basics is important, as you will waste considerably more time with data formatting and getting it into R early in your career than you will actually running any analyses, and 3) I want the basics to be included so someone could run through everything without help. However, these notes are by no stretch of the imagination anything near comprehensive.

2.2 Simple Programming

Let's start with the most basic: R as a really nice calculator. For instance, if I need to add, I can add.

2 + 2

## [1] 4

Amazing, right? I can, if I want, create a sequence of numbers going one direction or theother.

1:10

## [1] 1 2 3 4 5 6 7 8 9 10

10:1

## [1] 10 9 8 7 6 5 4 3 2 1


Yep, that's cool. I can create a plot (more on this later).

hist(rnorm(100), xlab = "", las = 1, main = "Yay, a plot!")

[Figure: histogram of rnorm(100) titled "Yay, a plot!"; y-axis shows Frequency, x-axis unlabeled.]

Wow, addition, number sequences, plots, all in little code snippets that are completely reproducible. What is this magic elixir you are showing us... well, I can write a function that tells you...

my.function = function(x) {

ifelse(x > 1, "Bud Light is not good beer", "R is like good beer")

}

my.function(1)

## [1] "R is like good beer"

my.function(2)

## [1] "Bud Light is not good beer"

2.3 R Objects

Everything in R is an object, and each object has a set of attributes associated with it that describe the object's contents and how it can and should be used. Frequently, when working in R, several calculations may be dependent upon each other. Thus, you will want to save those results for future use by assigning them to an object. In R, the usual assignment operator is <- (e.g., x<-1, so 1 is assigned to x, or 'x gets 1'). You are probably wondering why you cannot use an = sign (e.g., x=1, so 1 is equal to x). Based on my work, they are fairly interchangeable, although I have run into some situations where <- was required when I was doing some simulations that required a for() loop. Most folks use <-, and unless you are writing functions (where you use argument=value) or the boolean equality operator (==), you can use either, but I prefer to just use an = sign.
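As a quick hedged illustration of where the two do differ (the object name x here is just a throwaway example): inside a function call, = matches an argument, while <- creates an object in your workspace.

mean(x = 1:5)

## [1] 3

mean(x <- 1:5)

## [1] 3

x

## [1] 1 2 3 4 5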

So, using the assignment operator, you can assign objects any name you choose. Object names can be upper- or lower-case letters, numbers, underscores (_), or periods (.). Good programming practice is for object names to begin with a letter, not a number or period. Also, realize that R is case sensitive (I have made this mistake many times); for instance, as I show below, I assigned a lower-case x to be 2+2 and an upper-case X to be 3+3, and then I printed both and show that R will tell you whether or not x and X are equal (note this is one of those places where the = sign would cause problems: if I had not used the negation character (!), x would get X, or be equal to 3+3).

x = 2 + 2

X = 3 + 3

x

## [1] 4

X

## [1] 6

x != X

## [1] TRUE

Important to note here, there are certain letters, words, etc. that are used in R that you should not use. For instance, never, ever, ever, EVER call your datasets data; quoting B. Ripley from an R fortune, "You would not call your dog, dog would you." data() is actually an R function for loading datasets, so changing what it means is kind of a problem. Another example is c(), which is a function that concatenates data together into a vector and allows you to give the vector a name and operate on it as shown below, so changing what it does is probably not a good idea.

x <- c(2, 4, 6, 8)

x

## [1] 2 4 6 8

x * 2

## [1] 4 8 12 16

Finding out what you can or cannot label things is sometimes a trial and error process, but, when in doubt, use an underscore or a period in your object names, as that reduces the chance of mis-naming something. And, when in doubt, you can always type the name into R and see if it is already used, such as,

c

## function (..., recursive = FALSE) .Primitive("c")
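Another simple check, shown here as a hedged aside (the made-up name below is hypothetical), is the exists() function, which tells you whether a name is already in use somewhere on the search path.

exists("c")

## [1] TRUE

exists("my.unused.name")

## [1] FALSE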

2.4 Classes and Modes

This part will be kind of painful, but it's important, so read it once and then move on. Since we now know how R uses the assignment operator to specify an object, we need to consider that each R object has attributes associated with it. Attributes describe the contents of the R object and how the object can and should be used. Probably the most important attributes of an R object are the class and mode of the object. There are several functions to evaluate the structure of your data, mainly 'mode', 'class', and 'str' (which, as you see below, will tell you the mode and the value(s) of the object, which is very useful when dealing with data frames or lists).

x <- 2 + 2

x

## [1] 4

mode(x)

## [1] "numeric"

class(x)

## [1] "numeric"

str(x)

## num 4

Thus, you can see that a number gets a mode 'numeric' and a class 'numeric'. Additionally, there are several other modes, mainly atomic, complex, and raw, none of which you should expect to see with any frequency in your work unless you really get into the programming end of things. There is also a logical mode, which has values TRUE and FALSE (never T or F), and character mode, which is a character (specified with quotation marks).

mode(TRUE)

## [1] "logical"

class(TRUE)

## [1] "logical"

mode("Lyla")

## [1] "character"

class("Lyla")

## [1] "character"

Finally, you can verify whether an object has a particular mode or is a member of a particular class using one of several R functions that test whether an object is of a specific type; below you can see that x is in fact numeric (TRUE) and not a factor (FALSE). Some of these predicate functions are: is.numeric, is.factor, is.list, and so on.

x <- 2

mode(x)

## [1] "numeric"


is.numeric(x)

## [1] TRUE

is.factor(x)

## [1] FALSE

mode(as.integer(2))

## [1] "numeric"

class(as.integer(2))

## [1] "integer"

But, you can also have mixed vectors of numeric and character values, which R will convert entirely to character values, but you can change character values back to numeric using as.numeric (see the coercion aside just after this example).

test2 <- c("A", 2, "C")

test2

## [1] "A" "2" "C"

class(test2)

## [1] "character"

as.numeric(c("1", "2", "3"))

## [1] 1 2 3
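One caution worth a quick hedged example: if you run as.numeric() on the mixed vector test2 itself, the truly non-numeric entries cannot be converted and become NA (with a warning).

as.numeric(test2)

## Warning: NAs introduced by coercion

## [1] NA 2 NA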

Finally, in addition to having classes and modes, vectors have length attributes, which you can get using the length function.

length(c("A", 2, "C"))

## [1] 3

Now, before we finish, we need to quickly touch on factors and how they are stored in R, as factor objects are of numeric mode but with a class attribute such that character attributes are displayed even though the storage mode is numeric. For example, stealing some notes from my friend Jeff, see below. What is actually being stored in my.factor is a numeric vector c(2, 1, 3), because the levels are alphabetical: "B" is second so by default it gets a 2, "A" is first so it gets a 1, and so on.

my.factor <- c("B", "A", "C")

my.factor <- factor(my.factor)

mode(my.factor)

## [1] "numeric"

class(my.factor)

## [1] "factor"


my.factor

## [1] B A C

## Levels: A B C
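If you want to see the underlying storage for yourself, a short hedged check with base functions shows the integer codes and the levels they map to.

as.integer(my.factor)

## [1] 2 1 3

levels(my.factor)

## [1] "A" "B" "C"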


3 R Project and Data Management

Here I want to talk a bit about managing R projects, importing and managing data, and manipulation of datasets in R. So, first of all, this will be a ridiculously short primer on the topic, as there are entire books written on data manipulation in R (e.g., Spector's "Data Manipulation with R" book) and there are quite a few really great packages for working with various types of data (if you have not ever heard of Hadley, then see his R packages for data manipulation at http://had.co.nz/). I will highlight a few of the base R methods, point out a few functions I find extremely useful, and then you're off on your own to go forth and prosper.

3.1 Working directories

So, now you have R installed and started on your computer. One thing that some folks find handy is to set a working directory, or a place where a particular project will be housed. You don't have to use a working directory, but it can be helpful to set one for projects in R that involve more than 30 seconds of thought. In simple terms, a working directory is exactly that, a directory where all the work on a particular project will be conducted, where your R session information will be saved, where R will look for any files or source data functions you want to use when you are working, and where any output you create and write from R will go. There are several ways to set a working directory; for example, in Windows you could open R and go to File -> Change dir and set the working directory to any location (e.g., you can create a folder called RCourse and put it in the Documents section of your computer). However, when I use working directories I tend to set the working directory specific to each analysis project that I conduct using setwd(), using the PATH format for each of the 3 standard operating systems (these are based off of my various work machines I use for R package builds and computer programming stuff; your paths will be different). Note that the slashes are forward (/) not back (\) slashes in the PATH name:

Linux: setwd("/home/bret/BretResearch/Workshops/TxTWS_RWorkshop/")

Windows: setwd("C:/Users/bret.collier/Documents/Workshops/TxTWS_RWorkshop/")

Mac: setwd("/Users/bretcollier/BretResearch/Workshops/TxTWS_RWorkshop/")
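A quick, hedged way to confirm where R is currently pointed (output not shown here because it depends entirely on your machine):

getwd()  # prints the current working directory

list.files()  # lists the files R can see in that directory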

There are several nice things about working directories, but the main one is that after you set a working directory, when you need to load data into your workspace, or save data or graphs, having it set saves lots of time. For example, you could write a code snippet where you define where you want R to go look for the data you are interested in analyzing:

example.data <- read.csv("F:/Rio209.csv", header = TRUE)

head(example.data)

## ID Lat Lon Date

## 1 1 32.76 -98.72 2011-03-29 08:57:32

## 2 2 32.76 -98.72 2011-03-29 12:57:32

## 3 3 32.76 -98.72 2011-03-29 16:57:21

## 4 4 32.76 -98.72 2011-03-29 20:57:34


## 5 5 32.76 -98.72 2011-03-30 00:57:32

## 6 6 32.76 -98.72 2011-03-30 04:57:32

Which shows R where the data file you want to load is, tells R to go out and load it, and then gives that data file the R object name 'example.data'. If you type something wrong (which you will), you will get this:

bad.data <- read.csv("F:/RRio209.csv", header = TRUE)

## Warning: cannot open file 'F:/RRio209.csv': No such file or directory

## Error: cannot open the connection

head(bad.data)

## Error: object 'bad.data' not found

However, if you are importing multiple datasets, or planning on exporting multiple datasets or graphics, then perhaps a better option is below, where you set a working directory first; then R knows where to go to look for your data, and where to put anything you output.

setwd("F:/")

same.data <- read.csv("Rio209.csv", header = TRUE)

head(same.data)

## ID Lat Lon Date

## 1 1 32.76 -98.72 2011-03-29 08:57:32

## 2 2 32.76 -98.72 2011-03-29 12:57:32

## 3 3 32.76 -98.72 2011-03-29 16:57:21

## 4 4 32.76 -98.72 2011-03-29 20:57:34

## 5 5 32.76 -98.72 2011-03-30 00:57:32

## 6 6 32.76 -98.72 2011-03-30 04:57:32

Working directories have some downfalls, in that if you are sourcing in from various workspaces, or if all your R work is housed in a single workspace to simplify project management and package development (like mine is, ask if you want to see my setup), then using a setwd() can be a pain. And yes, I know I just showed you how to load data and that was supposed to come later; don't freak out, it was just an example.

3.2 Importing and exporting data

Probably the simplest method for loading a small (or large) dataset when all the data is of the same mode is to use the brilliantly named set of read.foo functions, where 'foo' is a name such as .txt, .csv, etc. So, just as an example, using the Rio209.csv file above, you can read it into your R session in a variety of ways. First, you can just read it straight in:

example.data <- read.csv("F:/Rio209.csv", header = TRUE)

head(example.data)

## ID Lat Lon Date

## 1 1 32.76 -98.72 2011-03-29 08:57:32


## 2 2 32.76 -98.72 2011-03-29 12:57:32

## 3 3 32.76 -98.72 2011-03-29 16:57:21

## 4 4 32.76 -98.72 2011-03-29 20:57:34

## 5 5 32.76 -98.72 2011-03-30 00:57:32

## 6 6 32.76 -98.72 2011-03-30 04:57:32

You can identify a working directory and read it in from there:

setwd("F:/")

same.data <- read.csv("Rio209.csv", header = TRUE)

head(same.data)

## ID Lat Lon Date

## 1 1 32.76 -98.72 2011-03-29 08:57:32

## 2 2 32.76 -98.72 2011-03-29 12:57:32

## 3 3 32.76 -98.72 2011-03-29 16:57:21

## 4 4 32.76 -98.72 2011-03-29 20:57:34

## 5 5 32.76 -98.72 2011-03-30 00:57:32

## 6 6 32.76 -98.72 2011-03-30 04:57:32

Ok, you're probably thinking: how in the heck do I know what the function is to read data in? Well, R has a nice little helper in the question mark, which, if typed into the console in front of a function name, will open the help files for the R function of interest. For instance, using ?read.table will open the help files for the read.table() function and all the other read.foo() functions available in base R. For those of you who work with databases on a regular basis, there is an R package RODBC that is extremely useful for opening connections with various ODBC database structures and importing tables of data, either as-is or using SQL language queries to specify exactly what is needed (a couple of hedged sketches of exporting and a database pull follow).
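Since this subsection is about exporting as well as importing, here is a minimal hedged sketch: write.csv() is the base R mirror image of read.csv(), and the RODBC calls below assume you have an ODBC data source named "my_dsn" containing a table called "locations" (both names are hypothetical, and the export file name is just an example).

# export an R dataframe back out to a .csv file in the working directory
write.csv(example.data, "Rio209_export.csv", row.names = FALSE)

# hedged sketch of a database pull with RODBC
library(RODBC)
con <- odbcConnect("my_dsn")                      # open a connection to the ODBC source
locs <- sqlQuery(con, "SELECT * FROM locations")  # an SQL query returns a dataframe
odbcClose(con)                                    # close the connection when you are done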

3.3 Creation of, types, and working with data: a super short primer

Vectors

We all know you cannot really do too much fancy mathematics on a scalar (a vector with 1 value), so we need to look into the wide variety of other methods for working with data in R. Now, we are going to start stepping into the creation and manipulation of several different types of data within R. You will see a wide variety of things coming up here: creation of data using random number generators, sequences of data, combinations of numeric and factor data, creation and manipulation of vectors and matrices, and operations on those vectors and matrices. This is probably what you were all more interested in, as I will start outlining some specific R functions for doing specific tasks.

First, remember that you can create a simple vector as:

c.data <- c(10, 21, 13, 34, 25)

c.data

## [1] 10 21 13 34 25

Ok, so now we have a vector called c.data in our workspace. R excels at vectorized operations, so we can do vectorized arithmetic on it, or perhaps write some code to estimate summary statistics for the data in the c.data vector:


c.data/2

## [1] 5.0 10.5 6.5 17.0 12.5

xbar <- sum(c.data)/length(c.data)

xbar

## [1] 20.6

std.dev <- sqrt(sum((c.data - xbar)^2)/(length(c.data) - 1))

std.dev

## [1] 9.607

Or, since R is a statistical program, we could just use the R internal functions for mean and standard deviation to get the same answers:

mean(c.data)

## [1] 20.6

sd(c.data)

## [1] 9.607

Back to vectors for the time being. There are lots of other ways to create vector data using functions that create sequences of data. For example, we can use a colon (:), a sequence operator, or a repeat function (seq, rep). For numeric arguments, a:b will generate a sequence of ordered data from a to b. If a and b are integers, then so is the sequence; if not, they are of type double.

1:10

## [1] 1 2 3 4 5 6 7 8 9 10

2:11

## [1] 2 3 4 5 6 7 8 9 10 11

-5:5

## [1] -5 -4 -3 -2 -1 0 1 2 3 4 5

5:-5

## [1] 5 4 3 2 1 0 -1 -2 -3 -4 -5

The colon (:) operator cannot be used with letters (e.g., A:F will not get you the vector "a", "b", "c", "d", "e", "f") as R will expect the values to be named objects. So, you would typically work with sequences of letters and numbers by combining them into factors (either by brute force or via the interaction function; see also the letters/LETTERS aside after this example):

num.factor <- factor(1:4)

alpha.factor <- factor(c("a", "b", "c", "d"))

num.factor:alpha.factor


## [1] 1:a 2:b 3:c 4:d

## 16 Levels: 1:a 1:b 1:c 1:d 2:a 2:b 2:c 2:d 3:a 3:b 3:c 3:d 4:a 4:b ... 4:d

interaction(alpha.factor, num.factor)

## [1] a.1 b.2 c.3 d.4

## 16 Levels: a.1 b.1 c.1 d.1 a.2 b.2 c.2 d.2 a.3 b.3 c.3 d.3 a.4 b.4 ... d.4
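As a small aside (base R facts, nothing package-specific): if you just need a run of letters rather than a factor, the built-in constants letters and LETTERS can be subscripted like any vector.

letters[1:6]

## [1] "a" "b" "c" "d" "e" "f"

LETTERS[1:6]

## [1] "A" "B" "C" "D" "E" "F"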

The colon operator is a simple method for vector creation, but we could also use the seq (sequence) function, which can be used with numbers, dates, and times, and we can make the sequences change by values other than +1 or -1:

seq(from = 2, to = 6, by = 1)

## [1] 2 3 4 5 6

seq(2, 6, 1)

## [1] 2 3 4 5 6

seq(2, 6, 0.5)

## [1] 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0

seq(-2, 2, 0.5)

## [1] -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0

seq(from = as.Date("2010-04-02"), to = as.Date("2010-04-30"), by = 5)

## [1] "2010-04-02" "2010-04-07" "2010-04-12" "2010-04-17" "2010-04-22"

## [6] "2010-04-27"

In addition, there is a general rep (repeat) function that can be used to generate repeated sequences of vectors combining data of any mode. Underlying rep are arguments called 'each' and 'times' (R's partial argument matching lets you abbreviate the latter to time=, as below), both of which will affect how your data is put into the sequence:

rep(1:3, each = 3)

## [1] 1 1 1 2 2 2 3 3 3

rep(1:3, time = 3)

## [1] 1 2 3 1 2 3 1 2 3

rep(alpha.factor, each = 2)

## [1] a a b b c c d d

## Levels: a b c d

rep(alpha.factor, time = 2)

## [1] a b c d a b c d

## Levels: a b c d

Because R is so handy, we can actually nest various functions to create data sequences:


rep(seq(1, 4, 1), each = 3)

## [1] 1 1 1 2 2 2 3 3 3 4 4 4

rep(rep(c(1, 2), 2), each = 3)

## [1] 1 1 1 2 2 2 1 1 1 2 2 2

# or alternatively

rep(c(rep(1, 3), rep(2, 3)), 2)

## [1] 1 1 1 2 2 2 1 1 1 2 2 2

Now, one of the things we often want to do is look at a specific value in a vector. Luckily, values in your vector (or whatever data object you are using) are subscripted by R, so we can extract a subset of a vector simply and efficiently, which not surprisingly is called subscripting. Subscripting can be inclusive (what to include) or exclusive (what to exclude), and the syntax can use names, numeric, or logical subscripts. So, as an example, consider the sequence of data from 5 to 50 by 5's, where our interest is in extracting the 7th element.

sub.seq <- seq(5, 50, 5)

sub.seq

## [1] 5 10 15 20 25 30 35 40 45 50

sub.seq[7]

## [1] 35

We could be interested in every value except for the 4th value, which we want to exclude:

sub.seq[-4]

## [1] 5 10 15 25 30 35 40 45 50

You can extract or exclude >1 element:

sub.seq[c(2, 3, 6)]

## [1] 10 15 30

sub.seq[-(3:5)]

## [1] 5 10 30 35 40 45 50

In addition to regular subscripting, logical subscripting is a powerful method of subsetting data. Remember, there are quite a few logical operators (<, >, <=, >=; less than, greater than, less than or equal to, greater than or equal to) you have seen before. Equality uses a double = (==) and exclusion of equality uses the not (!) operator. A logical operation compares two objects using an operator and returns a logical vector (e.g., the vector will consist of TRUE or FALSE values):


sub.seq

## [1] 5 10 15 20 25 30 35 40 45 50

sub.seq > 30

## [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE

sub.seq < 30

## [1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE

sub.seq == 25

## [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE

sub.seq != 25

## [1] TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE

sub.seq == 26

## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Logical operators can be combined with other operators like & (and), | (or), and ! (not):

lt.50 <- sub.seq < 50

gt.20 <- sub.seq > 20

lt.50 & gt.20

## [1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE

!lt.50 | !gt.20

## [1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE

and you can subscript with logicals as well:

sub.seq[lt.50 & gt.20]

## [1] 25 30 35 40 45

sub.seq[!lt.50 | !gt.20]

## [1] 5 10 15 20 50

Matrices

A matrix is a two-dimensional array of vectors, viewed as row vectors or column vectors, where each vector is of the same length and mode. Matrices are used for many research purposes in statistics, and typically consist of numeric variables. Matrices have 2 dimensions, the number of rows and the number of columns. One convenient way to create matrices is through the matrix function, or by using the diag() function to create a diagonal matrix. For example, consider a simple vector x that we want to put into a matrix 'x' with 4 rows and 2 columns:


x <- 1:8

dim(x) <- c(4, 2)

x

## [,1] [,2]

## [1,] 1 5

## [2,] 2 6

## [3,] 3 7

## [4,] 4 8

Of course, we filled that matrix with data (vector x), but we could have just as easily created a matrix with all the same values, or created the matrix above using the matrix function:

my.matrix <- matrix(1, nrow = 4, ncol = 2)

my.matrix

## [,1] [,2]

## [1,] 1 1

## [2,] 1 1

## [3,] 1 1

## [4,] 1 1

my.matrix2 <- matrix(1:8, nrow = 4, byrow = TRUE)

my.matrix2

## [,1] [,2]

## [1,] 1 2

## [2,] 3 4

## [3,] 5 6

## [4,] 7 8

You can look at the dimensions of your matrix using some of the internal functions in R, and you can alter the dimensions of a matrix as long as the overall size is the same:

ncol(my.matrix)

## [1] 2

nrow(my.matrix)

## [1] 4

dim(my.matrix)

## [1] 4 2

dim(my.matrix) = c(2, 4)

my.matrix

## [,1] [,2] [,3] [,4]

## [1,] 1 1 1 1

## [2,] 1 1 1 1

and we can create diagonal matrices:


my.matrix <- diag(1, nrow = 4, ncol = 4)

my.matrix

## [,1] [,2] [,3] [,4]

## [1,] 1 0 0 0

## [2,] 0 1 0 0

## [3,] 0 0 1 0

## [4,] 0 0 0 1

diag(my.matrix) <- 1:4

my.matrix

## [,1] [,2] [,3] [,4]

## [1,] 1 0 0 0

## [2,] 0 2 0 0

## [3,] 0 0 3 0

## [4,] 0 0 0 4

Obviously you can specify in what order you want the cells of a matrix filled:

matrix(1:16, nrow = 4)

## [,1] [,2] [,3] [,4]

## [1,] 1 5 9 13

## [2,] 2 6 10 14

## [3,] 3 7 11 15

## [4,] 4 8 12 16

matrix(1:16, nrow = 4, byrow = TRUE)

## [,1] [,2] [,3] [,4]

## [1,] 1 2 3 4

## [2,] 5 6 7 8

## [3,] 9 10 11 12

## [4,] 13 14 15 16

Now, we are not going to get into matrix algebra here, although the commands are readily available (matrix multiplication with %*%, transpose with t(), and inverse with solve(); a short hedged sketch appears at the end of this subsection). Instead, we are going to talk about subscripting and some basic mathematics you can do on matrices. Multi-dimensional objects like matrices require a different approach to subsetting than vectors, as there is the option of an empty subscript or a null dimension. Consider the below matrix:

set.seed(10)

mat <- matrix(rpois(16, 4), nrow = 4)

mat

## [,1] [,2] [,3] [,4]

## [1,] 4 1 4 2

## [2,] 3 2 3 4

## [3,] 3 3 5 3

## [4,] 5 3 4 3

which has 4 rows and 4 columns. If we were interested in the element of the matrix that was in the 3rd row and the 3rd column, then we would extract that element (5); or, if we wanted to extract an entire column, we could identify rows 1 through 4 of the first column as below:

mat[3, 3]

## [1] 5

mat[1:4, 1]

## [1] 4 3 3 5

However, matrix subscripting has been made much simpler: because of the null dimension, we can extract entire rows and columns simply and efficiently. The trick is to use a comma (,). Thus, for accessing entire rows and/or columns, you can just leave out the subscript for the dimension you are not interested in. Remember, R will return these values as a vector, not a matrix, so if you want the information you extract to remain a matrix, you need to add drop=FALSE to your code (which means you could actually subscript from that matrix as well):

mat[, 1]

## [1] 4 3 3 5

mat[2, ]

## [1] 3 2 3 4

smaller.mat <- mat[1, , drop = FALSE]

smaller.mat

## [,1] [,2] [,3] [,4]

## [1,] 4 1 4 2

smaller.mat[1]

## [1] 4

Remember our earlier discussion on logical subscripting? It works here too:

mat > 3

## [,1] [,2] [,3] [,4]

## [1,] TRUE FALSE TRUE FALSE

## [2,] FALSE FALSE FALSE TRUE

## [3,] FALSE FALSE TRUE FALSE

## [4,] TRUE FALSE TRUE FALSE

mat[mat > 3]

## [1] 4 5 4 5 4 4

mat[mat > 3] = -22

mat

## [,1] [,2] [,3] [,4]


## [1,] -22 1 -22 2

## [2,] 3 2 3 -22

## [3,] 3 3 -22 3

## [4,] -22 3 -22 3
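And here is the matrix-algebra sketch promised above: a minimal hedged illustration using a small made-up 2 x 2 matrix A (outputs omitted for brevity; the operators themselves are base R).

A <- matrix(c(2, 1, 0, 3), nrow = 2)  # a small example matrix

t(A)             # transpose of A

A %*% diag(2)    # matrix multiplication (here, A times the 2x2 identity)

solve(A)         # the inverse of A

solve(A) %*% A   # multiplying the inverse by A recovers the identity (up to rounding)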

Dataframes

Dataframes are the typical structures folks use to store data for analysis (most of you would call a dataframe a spreadsheet). They are similar to matrices in that dataframes have column vectors of the same length (same number of rows), but different in that dataframes can have column vectors of different modes. Most data in ecology is of mixed modes, consisting of some combination of numeric, character, or factor information. So, it benefits us to learn how R treats data and what options there are for managing data. First, just so it is easiest, I am going to use a data file that currently resides in R called iris, which is the famous Anderson/Fisher iris measurement data (cm) for 50 flowers from each of 3 species of iris. There are a number of datasets provided with the base distribution of R, which you can access by typing data() into the R console. iris is actually an internal dataset that is distributed with R, so we will just load it from within R below. For this part, sometimes we just want to look over the data that was imported into R. Luckily, there are a couple of simple ways to look at the data file, or parts of it, within R. Using the iris data, we can extract relevant information on the dataframe using functions such as str and names, for instance:

data(iris)

head(iris)

## Sepal.Length Sepal.Width Petal.Length Petal.Width Species

## 1 5.1 3.5 1.4 0.2 setosa

## 2 4.9 3.0 1.4 0.2 setosa

## 3 4.7 3.2 1.3 0.2 setosa

## 4 4.6 3.1 1.5 0.2 setosa

## 5 5.0 3.6 1.4 0.2 setosa

## 6 5.4 3.9 1.7 0.4 setosa

str(iris)

## 'data.frame': 150 obs. of 5 variables:

## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...

## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...

## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...

## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...

## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

names(iris)

## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"

## [5] "Species"

summary(iris)

## Sepal.Length Sepal.Width Petal.Length Petal.Width

## Min. :4.30 Min. :2.00 Min. :1.00 Min. :0.1

## 1st Qu.:5.10 1st Qu.:2.80 1st Qu.:1.60 1st Qu.:0.3


## Median :5.80 Median :3.00 Median :4.35 Median :1.3

## Mean :5.84 Mean :3.06 Mean :3.76 Mean :1.2

## 3rd Qu.:6.40 3rd Qu.:3.30 3rd Qu.:5.10 3rd Qu.:1.8

## Max. :7.90 Max. :4.40 Max. :6.90 Max. :2.5

## Species

## setosa :50

## versicolor:50

## virginica :50

##

##

##

But, we could also be interested in working with specific columns within a data frame. There are a couple of ways to access and manipulate/summarize specific columns in a dataframe in R. First, you can extract a specific column by using the $, such as (just showing the first 10 records for simplicity):

iris$Sepal.Length[1:10]

## [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9

You can summarize those columns individually if you so choose:

mean(iris$Sepal.Length)

## [1] 5.843

var(iris$Sepal.Length)

## [1] 0.6857

sd(iris$Sepal.Length)

## [1] 0.8281

summary(iris$Sepal.Length)

## Min. 1st Qu. Median Mean 3rd Qu. Max.

## 4.30 5.10 5.80 5.84 6.40 7.90

Or, another option that some people like when working in R is to attach their data using the attach function (see ?attach). Then you can directly access your data based on the column names without identifying the dataframe. I tend not to do this, as I don't like having dataframes attached, especially if I am working with multiple frames with the same column names (e.g., GPS data from multiple critters that have the same data columns), but I will quickly do it once for this example (a with() alternative is sketched just after it):

attach(iris)

Sepal.Length[1:10]

## [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9

mean(Sepal.Length)

## [1] 5.843

detach(iris)
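A hedged middle ground that avoids leaving anything attached is with(), which evaluates an expression "inside" the dataframe for just that one call.

with(iris, mean(Sepal.Length))

## [1] 5.843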


Lists

Lists are the most general structure in R and provide a way for the user to store a collection of data objects in one location, primarily because there are no limitations on the mode of the objects that a list may hold. Lists can have elements that contain any other object, such as a dataframe, a matrix, a vector, a scalar, etc. A list is a vector with mode list. But, lists are often weird for folks to understand, so as an example, first I am going to create a fairly simple list and do some subscripting and manipulation of that list; then, I will compile a more complicated list and show how to manipulate that list. First, consider a simple set of vectors:

x <- c(11, 34, 56, 17)

y <- c("Bret", "Reagan", "Kennedy", "Lyla")

z <- c(10)

x

## [1] 11 34 56 17

y

## [1] "Bret" "Reagan" "Kennedy" "Lyla"

z

## [1] 10

Here, I am combining the vectors above into a list, which has a mode of 'list' and 3 uniquely named elements (list.a, list.b, list.c):

simple.list <- list(list.a = x, list.b = y, list.c = z)

mode(simple.list)

## [1] "list"

simple.list

## $list.a

## [1] 11 34 56 17

##

## $list.b

## [1] "Bret" "Reagan" "Kennedy" "Lyla"

##

## $list.c

## [1] 10

Now, we can extract (via subscripting) elements from the list:

simple.list[1]

## $list.a

## [1] 11 34 56 17

simple.list[2]

## $list.b

## [1] "Bret" "Reagan" "Kennedy" "Lyla"


simple.list[3]

## $list.c

## [1] 10

Okay, so, based on what we know about R, we should be able to use an internal function like mean() on a list element and get the mean of the list.a portion, for instance.

mean(simple.list[1])

## Warning: argument is not numeric or logical: returning NA

## [1] NA

What, it gave us an NA? This is because simple.list[1] (single brackets) is actually a list containing the vector x, not the vector itself. So, to apply operations to elements of a list, you have to specifically identify the elements you want to analyze. In our (and most) situations, the elements of the list have been named, so you can access said elements using the name of the element with a dollar sign ($), like you would to extract columns from a dataframe (which, you may or may not have noticed, are lists where the list elements are the dataframe columns). Additionally, sometimes you want to access list elements via their index or a name; then you can use double bracketing ([[]]) to subscript lists (this is especially important when writing functions that return lists as the function result).

mean(simple.list$list.a)

## [1] 29.5

mean(simple.list[[1]])

## [1] 29.5

mean(simple.list[["list.a"]])

## [1] 29.5

Well, that is simple enough. Let's try a more complicated list just so we have an example in our notes. So, my little complicated list will be the c.data from earlier, which is a numeric vector of length 5, the iris dataframe, and the made-up character vector from earlier with my family's names in it (Bret, Reagan, Kennedy, Lyla):

complicated <- list(c.data = c.data, iris.data = iris, family = y)

str(complicated)

## List of 3

## $ c.data : num [1:5] 10 21 13 34 25

## $ iris.data:'data.frame': 150 obs. of 5 variables:

## ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...

## ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...

## ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...

## ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...

## ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

## $ family : chr [1:4] "Bret" "Reagan" "Kennedy" "Lyla"


Now, let's assume I want to extract the first 10 rows of the list object iris.data and find the mean and variance for the iris dataframe element Sepal.Length. In addition, let's try to use the internal R function summary() to summarize the iris data for us as well, using a couple of different approaches.

complicated$iris.data[1:10, ]

## Sepal.Length Sepal.Width Petal.Length Petal.Width Species

## 1 5.1 3.5 1.4 0.2 setosa

## 2 4.9 3.0 1.4 0.2 setosa

## 3 4.7 3.2 1.3 0.2 setosa

## 4 4.6 3.1 1.5 0.2 setosa

## 5 5.0 3.6 1.4 0.2 setosa

## 6 5.4 3.9 1.7 0.4 setosa

## 7 4.6 3.4 1.4 0.3 setosa

## 8 5.0 3.4 1.5 0.2 setosa

## 9 4.4 2.9 1.4 0.2 setosa

## 10 4.9 3.1 1.5 0.1 setosa

mean(complicated$iris.data$Sepal.Length)

## [1] 5.843

mean(complicated[[2]]$Sepal.Length)

## [1] 5.843

var(complicated[[2]]$Sepal.Length)

## [1] 0.6857

summary(complicated$iris.data)

## Sepal.Length Sepal.Width Petal.Length Petal.Width

## Min. :4.30 Min. :2.00 Min. :1.00 Min. :0.1

## 1st Qu.:5.10 1st Qu.:2.80 1st Qu.:1.60 1st Qu.:0.3

## Median :5.80 Median :3.00 Median :4.35 Median :1.3

## Mean :5.84 Mean :3.06 Mean :3.76 Mean :1.2

## 3rd Qu.:6.40 3rd Qu.:3.30 3rd Qu.:5.10 3rd Qu.:1.8

## Max. :7.90 Max. :4.40 Max. :6.90 Max. :2.5

## Species

## setosa :50

## versicolor:50

## virginica :50

##

##

##

summary(complicated[[2]])

## Sepal.Length Sepal.Width Petal.Length Petal.Width

## Min. :4.30 Min. :2.00 Min. :1.00 Min. :0.1

## 1st Qu.:5.10 1st Qu.:2.80 1st Qu.:1.60 1st Qu.:0.3

## Median :5.80 Median :3.00 Median :4.35 Median :1.3

## Mean :5.84 Mean :3.06 Mean :3.76 Mean :1.2

## 3rd Qu.:6.40 3rd Qu.:3.30 3rd Qu.:5.10 3rd Qu.:1.8


## Max. :7.90 Max. :4.40 Max. :6.90 Max. :2.5

## Species

## setosa :50

## versicolor:50

## virginica :50

##

##

##

A few thoughts on data manipulation

We need to talk about summarizing and/or aggregating data, as this is probably something that, at one time or another, you will have to do. Now, the different ways you can summarize data are pretty much limited only by your imagination or programming skills, so it is a huge waste of effort to focus on all the different ways to aggregate data; I am just going to scratch the surface here to give you a general idea of what can be done. First, R has a variety of internal functions set up that allow for efficient summarization of the various data types we discussed earlier, things like mean, median, or range, so just to repeat those here using the iris data:

summary(iris)

## Sepal.Length Sepal.Width Petal.Length Petal.Width

## Min. :4.30 Min. :2.00 Min. :1.00 Min. :0.1

## 1st Qu.:5.10 1st Qu.:2.80 1st Qu.:1.60 1st Qu.:0.3

## Median :5.80 Median :3.00 Median :4.35 Median :1.3

## Mean :5.84 Mean :3.06 Mean :3.76 Mean :1.2

## 3rd Qu.:6.40 3rd Qu.:3.30 3rd Qu.:5.10 3rd Qu.:1.8

## Max. :7.90 Max. :4.40 Max. :6.90 Max. :2.5

## Species

## setosa :50

## versicolor:50

## virginica :50

##

##

##

mean(iris$Sepal.Length)

## [1] 5.843

median(iris$Sepal.Length)

## [1] 5.8

range(iris$Sepal.Length)

## [1] 4.3 7.9

Often our interest is in aggregating data, and there are a ton of ways to do that, including table or subset:


dogs <- c("Springer", "Bulldog", "Springer", "Mutt", "Chihuahua", "Bulldog")

dog.table <- table(dogs)

dog.table

## dogs

## Bulldog Chihuahua Mutt Springer

## 2 1 1 2

dog.table["Springer"]

## Springer

## 2

as.data.frame(dog.table)

## dogs Freq

## 1 Bulldog 2

## 2 Chihuahua 1

## 3 Mutt 1

## 4 Springer 2

subset(iris, iris$Sepal.Length > mean(iris$Sepal.Length))[1:10, ]

## Sepal.Length Sepal.Width Petal.Length Petal.Width Species

## 51 7.0 3.2 4.7 1.4 versicolor

## 52 6.4 3.2 4.5 1.5 versicolor

## 53 6.9 3.1 4.9 1.5 versicolor

## 55 6.5 2.8 4.6 1.5 versicolor

## 57 6.3 3.3 4.7 1.6 versicolor

## 59 6.6 2.9 4.6 1.3 versicolor

## 62 5.9 3.0 4.2 1.5 versicolor

## 63 6.0 2.2 4.0 1.0 versicolor

## 64 6.1 2.9 4.7 1.4 versicolor

## 66 6.7 3.1 4.4 1.4 versicolor

Ok, getting a bit more complicated, this section is going to be about applying functions, either pre-defined or user-defined, to repeatedly conduct a set of calculations specific to different values of the data. Makes no sense, does it? Well, it will. As a real quick example, consider a simple matrix with 2 rows and 2 columns. Now, based on our previous examples you would know how to create this matrix, perhaps using the matrix function. Now, matrices have dimensions, as you have often seen them described, such as 2x3, or 3x5, or 1x1 (which is a scalar, by the way). In R, the dimensions of the matrix are referred to as margins, which will be important later. So, consider the following loop set up for getting the row sums and column sums from that matrix:

loop.matrix <- matrix(1:4, nrow = 2, ncol = 2)

loop.matrix

## [,1] [,2]

## [1,] 1 3

## [2,] 2 4

row.sums <- vector("numeric", nrow(loop.matrix))

# Loop over the rows and sum the elements


for (i in 1:nrow(loop.matrix)) row.sums[i] = sum(loop.matrix[i, ])

row.sums

## [1] 4 6

col.sums <- vector("numeric", ncol(loop.matrix))

# Loop over the columns and sum the elements

for (i in 1:ncol(loop.matrix)) col.sums[i] = sum(loop.matrix[, i])

col.sums

## [1] 3 7

What do you know, we have written a short piece of code to estimate row and column sums from a matrix. But, come on, this is R, there has to be something better. Luckily, there is something better: it's the family of apply statements. Now, you can do ?apply to look at the specifics, but in a nutshell it is apply(your data, the margin you are interested in looking at, the function you want to apply to that margin). Remember that in a matrix there are 2 margins, rows (margin=1) and columns (margin=2).

loop.matrix

## [,1] [,2]

## [1,] 1 3

## [2,] 2 4

apply(loop.matrix, 1, sum)

## [1] 4 6

apply(loop.matrix, 2, sum)

## [1] 3 7

So, let's make up a little bit bigger matrix so we have some data to mess with. Again, here is a simple sum of the rows (margin=1) and the columns (margin=2):

big.matrix <- matrix(1:12, nrow = 3, ncol = 4)

big.matrix

## [,1] [,2] [,3] [,4]

## [1,] 1 4 7 10

## [2,] 2 5 8 11

## [3,] 3 6 9 12

apply(big.matrix, 1, sum) #rows

## [1] 22 26 30

apply(big.matrix, 2, sum) #columns

## [1] 6 15 24 33

apply(big.matrix, 1, mean) #mean rows

## [1] 5.5 6.5 7.5

apply(big.matrix, 2, mean) #mean columns

## [1] 2 5 8 11


Note, however, R also has pretty nice little functions for simple cases like sum, mean, etc. that will work in this case as well:

rowSums(big.matrix)

## [1] 22 26 30

colSums(big.matrix)

## [1] 6 15 24 33

rowMeans(big.matrix)

## [1] 5.5 6.5 7.5

colMeans(big.matrix)

## [1] 2 5 8 11

Also, note that if NA's exist, you can use na.rm in these apply functions:

big.matrix[2, 2] = NA

big.matrix

## [,1] [,2] [,3] [,4]

## [1,] 1 4 7 10

## [2,] 2 NA 8 11

## [3,] 3 6 9 12

apply(big.matrix, 1, sum)

## [1] 22 NA 30

apply(big.matrix, 1, sum, na.rm = TRUE)

## [1] 22 21 30

rowSums(big.matrix, na.rm = TRUE)

## [1] 22 21 30

Most of the time, you will probably want the structure of the data you are looping over to be returned to you in the same form as your original data. If you have a list, then lapply is your friend. Making up a quick list of data and evaluating the list using lapply will return a list. Notice that element c is a vector of character values, so when you try to take the mean, you should get an NA:

my.list <- list(a = 10:20, b = rnorm(10), c = c("A", "b", "A", "b", "A", "b",

"A", "b"))

lapply(my.list, mean)

## Warning: argument is not numeric or logical: returning NA

## $a

## [1] 15

##


## $b

## [1] 0.04043

##

## $c

## [1] NA

If we don't want a list returned, we could use sapply, which would return a vector or a matrix:

sapply(my.list, mean)

## Warning: argument is not numeric or logical: returning NA

## a b c

## 15.00000 0.04043 NA

Uses for the various apply statements are wide-ranging, so I am showing only quick examples here, as you will need to just go and play with them some to see what works best for you. Here is an example I use for estimating survival in a simulation model with demographic stochasticity (e.g., each individual survives based on a random draw from a binomial with probability equal to the user-defined survival estimate):

No.alive <- c(100)

low.survival <- 0.25

high.survival <- 0.75

low <- sapply(lapply(1, function(i) sample(x = c(1, 0), replace = T, size = No.alive,

prob = c(low.survival, 1 - low.survival))), sum)

high <- sapply(lapply(1, function(i) sample(x = c(1, 0), replace = T, size = No.alive,

prob = c(high.survival, 1 - high.survival))), sum)

low

## [1] 21

high

## [1] 73
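As a hedged aside, for this particular case (independent survival of No.alive individuals with a fixed probability), the base R function rbinom() produces the same kind of draw in a single call; the result is random, so no fixed output is shown here.

rbinom(1, size = No.alive, prob = low.survival)   # number of survivors, low survival

rbinom(1, size = No.alive, prob = high.survival)  # number of survivors, high survival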

There are some pretty useful internal R functions for aggregating data, such as, oh, I don't know, aggregate, which works pretty well, for instance, with the iris data:

aggregate(iris[, 1:4], list(Species = iris[, 5]), mean)

## Species Sepal.Length Sepal.Width Petal.Length Petal.Width

## 1 setosa 5.006 3.428 1.462 0.246

## 2 versicolor 5.936 2.770 4.260 1.326

## 3 virginica 6.588 2.974 5.552 2.026
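Another base helper in the same spirit, shown here as a hedged alternative, is tapply(), which applies a function to one column split by a grouping factor.

tapply(iris$Sepal.Length, iris$Species, mean)

##     setosa versicolor  virginica
##      5.006      5.936      6.588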

3.4 Basic Mathematical/Operators

First, how do we use R as a calculator (and why are we doing this now and not at the beginning)? Since R is interactive, you want to use R to do some basic calculations so you get the hang of it, as the basic calculations are what build up to be fairly complex calculations.


So, here are a few really quick examples showing how R can be used to get the result for any equation: type in the equation and the result will be returned to you, as shown below:

1 + 1

## [1] 2

sqrt(8)

## [1] 2.828

exp(1)

## [1] 2.718

For each example, the result is a vector containing a single number. The [1] that you see before each value represents the fact that after R computes a result, it calls a generic (default) print function to display the contents of the vector. For example, you could call the print function explicitly:

print(1 + 1)

## [1] 2

print(sqrt(8))

## [1] 2.828

print(sqrt(8), digits = 5)

## [1] 2.8284

print(sqrt(8), digits = 10)

## [1] 2.828427125

and, to be honest, you can probably get more precision than you would ever need:

print(sqrt(8), digits = 20)

## [1] 2.8284271247461902909

R can do pretty much any basic mathematical operation you need.

2 + 2

## [1] 4

4 - 2

## [1] 2

2 * 2 * 2

## [1] 8

2/2


## [1] 1

sqrt(16)

## [1] 4

In addition, R has a set of logical operators (?Logic) which can be used for a wide variety of manipulations. Consider the made-up data below for log.data and x.

log.data <- 1 + (x <- rpois(20, 1)/3)

x

## [1] 0.6667 0.3333 0.6667 0.3333 0.0000 0.3333 0.3333 0.0000 0.3333 0.3333

## [11] 0.0000 0.6667 0.3333 0.3333 0.0000 0.6667 0.3333 0.0000 0.0000 0.6667

log.data

## [1] 1.667 1.333 1.667 1.333 1.000 1.333 1.333 1.000 1.333 1.333 1.000

## [12] 1.667 1.333 1.333 1.000 1.667 1.333 1.000 1.000 1.667
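The logical operators themselves never actually appear in that chunk, so here is a quick sketch of what they look like applied to these vectors (output omitted because x was generated without a seed, so your values will differ):

x > 0.5                    # element-wise comparison, returns TRUE/FALSE
log.data[log.data > 1.5]   # logical subsetting
x == 0 | x > 0.6           # combine comparisons with | (OR) or & (AND)
sum(x == 0)                # TRUE counts as 1, so this counts the zeros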

You can do random number generation pretty simply and quickly (you will see more of this later on):

rnorm(10)

## [1] -0.3640 -1.2070 1.4292 0.6334 -1.9968 -0.6818 -0.4601 -0.9831

## [9] 0.4953 0.7258

rpois(10, 10)

## [1] 12 13 16 13 15 17 9 19 13 7

You can work with 'NA' values within your data in different ways. First, note that replacing the 3rd value in the log.data vector with an 'NA' means that using a simple function like 'mean' will return 'NA', because the log.data vector now contains missing values. When this occurs, R has some handy functions for handling data with missing values, usually via an na.rm= argument or something like that:

log.data[3] = NA

mean(log.data)

## [1] NA

mean(log.data, na.rm = TRUE)

## [1] 1.298

Or, alternatively, you could do this and get the same answer:

log.data[3] = NA

newlog.data = na.omit(log.data)

mean(newlog.data)

## [1] 1.298
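A couple of other helpers built around is.na() are worth knowing; since we set the 3rd element of log.data to NA above, these results are deterministic:

which(is.na(log.data))   # position(s) of the missing values
## [1] 3
sum(is.na(log.data))     # how many values are missing
## [1] 1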


And remember, R does arithmetic on vectors and matrices just fine:

c.data

## [1] 10 21 13 34 25

2 * c.data

## [1] 20 42 26 68 50

loop.matrix

## [,1] [,2]

## [1,] 1 3

## [2,] 2 4

loop.matrix - 5

## [,1] [,2]

## [1,] -4 -2

## [2,] -3 -1

Date and time

This will be a pretty quick section as there are quite a few different ways to deal with date-time classes, and for the most part when you deal with them it will be in a categorization or subsetting context. Dates in R are pretty simple to deal with, and there are a variety of options for working with dates. First, and probably the simplest introduction, is to just create a date in some format and play with it. So, for example:

Sys.time()

## [1] "2014-02-16 13:56:36 CST"

as.Date(Sys.time())

## [1] "2014-02-16"

as.Date("2010/04/02")

## [1] "2010-04-02"

Now, what's nice about dates is you can manipulate them pretty easily by changing the format string to get them into the format you need. Note, very important: when you are re-formatting dates, you have to use the exact same description in the format() command, e.g., if your date has a comma after the day, your format command has to have a comma after the %d as well or you will get an NA (see the last example using 1 September 1973).

as.Date("2010-4-2", format = "%Y-%m-%d")

## [1] "2010-04-02"

as.Date("April 2, 2010", format = "%B %d, %Y")


## [1] "2010-04-02"

as.Date("2April10", format = "%d%b%y")

## [1] "2010-04-02"

as.Date("September 1, 1973", format = "%B %d %Y")

## [1] NA

as.Date("September 1, 1973", format = "%b %d, %Y")

## [1] "1973-09-01"

Or, we can also tell R to get the current time to play with; you can do something like this to see the current time and assign it a name:

Sys.time()

## [1] "2014-02-16 13:56:36 CST"

system.time <- Sys.time()

str(system.time)

## POSIXct[1:1], format: "2014-02-16 13:56:36"

system.time

## [1] "2014-02-16 13:56:36 CST"

So now we have an object called system.time that holds a date-time combination. Also note that I used str to look at it and it was of class POSIX, which is a common format for date-time values. I tend to use POSIX classes more frequently than most other date functions (e.g., the chron package) because they store time to the nearest second. POSIX data's input format is year, then month, then day, a space, then time in hours:minutes:seconds. POSIX works similarly to other date functions for manipulating dates between formats as well:

time.posix <- c("2010-4-2 09:00:30")

as.POSIXct(time.posix)

## [1] "2010-04-02 09:00:30 CDT"

class.date <- strptime("2/April/2010:08:01:27", format = "%d/%b/%Y:%H:%M:%S")

str(class.date)

## POSIXlt[1:1], format: "2010-04-02 08:01:27"
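One nice thing about the POSIXlt class that strptime returns is that you can pull individual components straight out of it; a quick sketch using class.date from above:

class.date$hour   # hour of the day
## [1] 8
class.date$mday   # day of the month
## [1] 2
format(class.date, "%H:%M")
## [1] "08:01"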

You can work with dates pretty easily. For example, we can create a time sequence of dates and then run some general R functions on them looking at date ranges, mean date, time between dates, etc.:

seq(as.Date("2010-4-2"), by = "days", length = 5)

## [1] "2010-04-02" "2010-04-03" "2010-04-04" "2010-04-05" "2010-04-06"


dates <- seq(as.Date("2010-4-2"), by = "days", length = 30)

mean(dates)

## [1] "2010-04-16"

range(dates)

## [1] "2010-04-02" "2010-05-01"

summary(dates)

## Min. 1st Qu. Median Mean 3rd Qu.

## "2010-04-02" "2010-04-09" "2010-04-16" "2010-04-16" "2010-04-23"

## Max.

## "2010-05-01"

dates[8] - dates[1]

## Time difference of 7 days

difftime(dates[8], dates[1], units = "hours")

## Time difference of 168 hours
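And because dates behave like any other vector, they drop straight into the categorization and subsetting tools from earlier; for example (a sketch based on the 30-day sequence above):

table(format(dates, "%B"))             # how many dates fall in each month (29 in April, 1 in May)
table(cut(dates, breaks = "week"))     # or bin the dates by week
dates[dates > as.Date("2010-04-25")]   # logical subsetting works on dates too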


4 R Creating Graphics

Few tools in ecology (or in any field for that matter) are as powerful as a graphical representation of your data. We should use graphs as an analytical tool to assist with data visualization and analysis, but lots of times folks just use graphs to summarize statistical results from their analysis (e.g., showing means). I can talk on and on about graphs in R (see Collier 2008), but so as not to waste time: concisely, there are not too many graphs you cannot make in R, period. So, to show you some examples of graphs, I am going to create several datasets; some will be simple univariate simulations, some will be a bit more complicated dataframes. The reason I am using simulated data is two-fold: 1) using simulated data helps you to understand the structure of the data because you created it, so you know what it should look like and can use the tools we learned earlier to get other datasets into the correct format, and 2) it is fairly simple to simulate a wide range of data types quickly and efficiently, rather than try to load individual datasets, walk through all the manipulation in this document, and then do some examples (although that is on the horizon). One thing that is important to notice: lots of times you will see the same code put into a command. That is because R reuses the same arguments, like col for defining the color you want, across many plotting functions. Remember this, it's handy (see ?par for more details).

4.1 Scatterplots

So, let's start with the easy stuff, a scatterplot. In the simplest sense, we can create plots in R pretty quickly with short statements.

plot(rnorm(100, 10, 1), main = "A scatterplot")

[Figure: scatterplot titled "A scatterplot" of 100 draws from rnorm(100, 10, 1) plotted against their index; x-axis "Index", y-axis "rnorm(100, 10, 1)".]

First, a few things to notice. This is just a very simple scatter of made-up points; there is no rhyme or reason to them, and there is really no relationship between the x-axis values and the y-axis values, as basically I just simulated 100 points from a normal distribution with a mean of 10 and a standard deviation of 1 (?Normal), and the index (x-axis) is simply the order they were simulated in. Also, note that R provides some default axis labels: the y-axis is basically what was called with the plot command from above, and the x-axis value Index was just the order of simulation as defined before. Nothing is really formatted, the axis labels are lying the wrong way on the y-axis, the font is weird, and so on and so forth.

Ok, but we can obviously do more with R than some dumb scatterplot of made-up data. What if, for example, we have data where we actually have a good reason to label the axes correctly, such as data relating counts of the eyeworms in the eyes of Quaily Mc'OweMyEyesHurt1 to the mass of Quaily Mc'OweMyEyesHurt (this is Texas and lots of people seem to care about eyeworm numbers right now, so it's topical, but see the footnote...). Below I made up a completely ridiculous dataframe for example plotting purposes only:

set.seed(10002)

worms = round(rnorm(50, 66, 10), digits = 0)

presence = factor(round(rbinom(50, 1, 0.7), digits = 2))

mass = worms + 3 * (round(rnorm(50, 0.25, 0.2), digits = 2))

long = worms - 0.5 * (round(rnorm(50, 60, 15), digits = 2))

group <- factor(rep(1:5, 10))

quaily <- data.frame(worms, presence, mass, long, group)

attach(quaily)

## The following objects are masked _by_ .GlobalEnv:

##

## group, long, mass, presence, worms

str(quaily)

## 'data.frame': 50 obs. of 5 variables:

## $ worms : num 61 53 54 55 65 49 71 69 56 66 ...

## $ presence: Factor w/ 2 levels "0","1": 2 2 2 1 2 2 1 2 1 2 ...

## $ mass : num 61.8 53.9 55.2 55.5 65.5 ...

## $ long : num 27.92 18.5 10.83 9.02 36.86 ...

## $ group : Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5 ...

head(quaily)

## worms presence mass long group

## 1 61 1 61.81 27.92 1

## 2 53 1 53.90 18.50 2

## 3 54 1 55.23 10.83 3

## 4 55 0 55.48 9.02 4

## 5 65 1 65.48 36.86 5

## 6 49 1 50.41 15.97 1

So, we can see that our dataframe quaily has a couple of continuous variables, a couple of factor variables, and is all around ridiculous. But, let's go ahead and plot some data anyway. So, what do we see when we look at this figure?

• The axis values look approximately correct (although notice that there are a few values on the graph >80, yet the x-axis only goes to 80), so we will probably want to adjust those;

• The numbers at each tick mark on the y-axis are parallel to the axis, which makes them harder to read;

• The graph is contained in a box, neither good nor bad, it's more of a preference thing;

• The labels for each axis are correct, but they do not convey much information;

• There is no figure title (not that it's needed).

1 There is no such thing as a "Quaily", it obviously does not reference any species in Texas, and as far as I know, "Mc'OweMyEyesHurt" is not a real word.

plot(mass, worms)

[Figure: default scatterplot of worms against mass, axes labeled "mass" and "worms".]

So, there are quite a few things we might want to change with this graph, correct? Well, let's change them. When you want to change things in your graph, ?par is your friend. ?par provides a detailed list of the many options for manipulating graphs in R, so let's make it pretty:

plot(mass, worms, las = 1, main = "This is a Wormy Quaily Graph", ylab = "Quaily Worms",

xlab = "Quaily Fatness", pch = 19, col = "red", xlim = c(40, 90), ylim = c(40,

90))


[Figure: scatterplot titled "This is a Wormy Quaily Graph" of Quaily Worms against Quaily Fatness; red filled points, both axes running from 40 to 90.]

Wow, pretty. Amazingly, when you want to find a relationship, you can! For our example on quaily and eyeworms, it looks as if there is a positive relationship between worm numbers and quaily mass, so what if we wanted to add the linear regression line to this plot? Well, we could run the regression, fit the line, and just for kicks, let's fit the vertical error distances as well.

plot(mass, worms, las = 1, main = "This is a Wormy Quaily Graph", ylab = "Quaily Worms",

xlab = "Quaily Fatness", pch = 19, col = "red", xlim = c(40, 90), ylim = c(40,

90))

quaily.reg <- lm(worms ~ mass)

abline(quaily.reg, col = "blue", lwd = 2)

fit.quaily <- fitted(quaily.reg)

segments(mass, fit.quaily, mass, worms, col = "blue")


[Figure: the same Wormy Quaily scatterplot with the fitted regression line and vertical segments from each point to its fitted value.]

So, you get the idea that there are all kinds of cool ways to manipulate data and make graphs. Below I will show a few examples of different types of plots that are typically used. I tried to keep most of these examples fairly consistent with what can be easily found in either the R help files for each plot type, or what you would find when you google "R barplot", so that you will be able to find some additional examples later and match them to what we did in class.

4.2 Other Simple plots

So, barplots, the workhorse of wildlife ecologists (and often called histograms; don't do that). Using the mtcars dataset in base R, a quick barplot:

data(mtcars)

count = table(mtcars$gear)

barplot(count, main = "Example Barplot", xlab = "Gear number")


[Figure: barplot titled "Example Barplot" of the counts of cars with 3, 4, and 5 gears; x-axis "Gear number".]

Wow, that is simple. How about this one, just a slightly different example.

# Grouped Bar Plot

counts <- table(mtcars$cyl, mtcars$gear)

barplot(counts, main = "Car Distribution by Gears and Cylinders", xlab = "Number of Cylinders",

col = c("red", "yellow", "blue"), legend = rownames(counts), beside = TRUE,

las = 1)

[Figure: grouped barplot titled "Car Distribution by Gears and Cylinders"; red, yellow, and blue bars for 4, 6, and 8 cylinders within each gear number (3, 4, 5).]

What about confidence intervals? We need to do that, right? Here are a couple of different ways to add confidence intervals to a barplot, or just create confidence intervals (straight from the plotCI help file).


library(plotrix)

data(warpbreaks)

attach(warpbreaks)

err = y = runif(10)

wmeans <- by(warpbreaks$breaks, warpbreaks$tension, mean)

wsd <- by(warpbreaks$breaks, warpbreaks$tension, sd)

## note that barplot() returns the midpoints of the bars, which plotCI uses

## as x-coordinates

plotCI(barplot(wmeans, col = "gray", ylim = c(0, max(wmeans + wsd))), wmeans,

wsd, add = TRUE)

[Figure: barplot of mean warp breaks at tensions L, M, and H with standard-deviation error bars added by plotCI.]

## using labels instead of points

labs <- sample(LETTERS, replace = TRUE, size = 10)

plotCI(1:10, y, err, pch = NA, gap = 0.02, main = "plotCI with labels at points",

las = 1)

text(1:10, y, labs)


[Figure: plot titled "plotCI with labels at points" showing ten points with error bars, each point labeled with a random letter.]
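If you would rather not load plotrix at all, base graphics can draw the same sort of error bars with arrows(); here is a minimal sketch using made-up means and standard errors (not workshop data):

means <- c(4.2, 5.1, 6.3)
ses <- c(0.4, 0.5, 0.3)
mids <- barplot(means, names.arg = c("A", "B", "C"), ylim = c(0, 8), las = 1)
# code = 3 puts a cap at both ends; angle = 90 makes the cap perpendicular to the bar
arrows(mids, means - 1.96 * ses, mids, means + 1.96 * ses, angle = 90, code = 3, length = 0.05)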

Now, there are tons of ways to do this; lots of R packages can be used to add confidence intervals, some more elegantly than others. But, it's important to realize that you can do it for many different types of plots, for instance, a logistic regression:

set.seed(123)

mydata = data.frame(Response = rbinom(100, 1, 0.5), Predictor = rnorm(100, 100,

50))

attach(mydata)

test.glm = glm(Response ~ Predictor, family = "binomial")

predict.data = seq(4, 496, 4)

y = plogis(test.glm$coefficients[1] + test.glm$coefficients[2] * predict.data)

xy = data.frame(Predictor = predict.data)

yhat = predict(test.glm, xy, type = "link", se.fit = TRUE)

upperlogit = yhat$fit + 1.96 * yhat$se.fit

lowerlogit = yhat$fit - 1.96 * yhat$se.fit

ucl = plogis(upperlogit)

lcl = plogis(lowerlogit)

plot(predict.data, y, ylim = c(0, 1), type = "l", lwd = 2, ylab = "Prob(Success)",

xlab = "Predictor Variable", xaxt = "n", las = 1)

axis(1)

lines(predict.data, ucl, lty = 2, lwd = 2)

lines(predict.data, lcl, lty = 2, lwd = 2)


[Figure: fitted logistic curve of Prob(Success) against Predictor Variable with dashed 95% confidence limits.]

Another simple one is a dotchart,

y1 <- runif(2)

g <- c("0-50", "50-100")

dotchart(y1, g, pch = 20, xlim = c(0, 1))

[Figure: dotchart of two random values for the groups 0-50 and 50-100.]

Which can be used to create some pretty elegant graphs rather quickly that show lots of data, for instance using the mtcars dataframe.

x <- mtcars[order(mtcars$mpg), ]  # sort by mpg
x$cyl <- factor(x$cyl)  # it must be a factor


x$color[x$cyl == 4] <- "red"

x$color[x$cyl == 6] <- "blue"

x$color[x$cyl == 8] <- "darkgreen"

dotchart(x$mpg, labels = row.names(x), cex = 0.7, groups = x$cyl, main = "Gas Mileage",

xlab = "Miles Per Gallon", gcolor = "black", color = x$color)

[Figure: dotchart titled "Gas Mileage" of Miles Per Gallon for each car model, grouped and colored by number of cylinders.]

Again, there are many ways to create a graph; here are some examples using ggplot2.

# create factors with value labels

library(ggplot2)

## Loading required package: methods

mtcars$gear <- factor(mtcars$gear, levels = c(3, 4, 5), labels = c("3gears", "4gears", "5gears"))

mtcars$am <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

mtcars$cyl <- factor(mtcars$cyl, levels = c(4, 6, 8), labels = c("4cyl", "6cyl",

"8cyl"))

# Scatterplot of mpg vs. hp for each combination of gears and cylinders in

# each facet, transmission type is represented by shape and color

qplot(hp, mpg, data = mtcars, shape = am, color = am, facets = gear ~ cyl, size = I(3),

xlab = "Horsepower", ylab = "Miles per Gallon")

[Figure: scatterplots of Miles per Gallon against Horsepower faceted by gear (rows) and cylinders (columns), with transmission type (Automatic/Manual) shown by point shape and color.]
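For what it is worth, the same figure can also be written with the full ggplot() interface rather than the qplot() shortcut; a sketch assuming ggplot2 is already loaded and the factors were created as above:

ggplot(mtcars, aes(x = hp, y = mpg, shape = am, colour = am)) +
    geom_point(size = 3) +
    facet_grid(gear ~ cyl) +
    labs(x = "Horsepower", y = "Miles per Gallon")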

And another example on the same data

qplot(mtcars$gear, mtcars$mpg, data = mtcars, geom = c("boxplot", "jitter"),

fill = gear, main = "Mileage by Gear Number", xlab = "", ylab = "Miles per Gallon")


[Figure: boxplots with jittered points titled "Mileage by Gear Number"; Miles per Gallon for 3gears, 4gears, and 5gears, filled by gear.]

We can even plot spatial locations quickly and easily; for instance, here are some Texas turkey GPS locations (more on this later)...

suppressPackageStartupMessages(library(moveud))

data(rawturkey)

par(mfrow = c(2, 1))

plot(rawturkey$Lon, rawturkey$Lat, main = "Unedited Points", pch = 20, col = "red",

xlab = "Longitude", ylab = "Latitude", las = 1, cex.axis = 0.7)

newrawturkey = rawturkey[rawturkey$Lon < -98.1, ]

plot(newrawturkey$Lon, newrawturkey$Lat, main = "Edited Points", pch = 20, col = "red",

xlab = "Longitude", ylab = "Latitude", las = 1, cex.axis = 0.7)


[Figure: two-panel plot of the turkey GPS locations (Latitude against Longitude); the top panel "Unedited Points" shows all fixes, the bottom panel "Edited Points" shows only locations with Lon < -98.1.]


5 Statistical Models with R

5.1 Contingency Tables

Obviously, being able to enter data in contingency tables is a pretty useful skill. As a quick example, so you get a feel for it, let's create a quick 9 by 2 contingency table in R using some nest predator data from Dreibelbis et al. (2008). Note that two-way tables need to be matrix objects. Now, because we wanted the data entered column-wise, we used byrow=F (which would have been the default if we had not included it). You can see what happens if you don't get byrow= right by changing it from F to T. Ok, so now we have defined the object class.status, but it is just a matrix, with no column or row headings. This is important, as many times you will want to add column and row headings to your data. Easiest way: use colnames or rownames.

class.status <- matrix(c(0, 0, 2, 4, 2, 0, 2, 1, 3, 1, 1, 1, 2, 7, 3, 0, 0,

4), nrow = 9, byrow = F)

class.status

## [,1] [,2]

## [1,] 0 1

## [2,] 0 1

## [3,] 2 1

## [4,] 4 2

## [5,] 2 7

## [6,] 0 3

## [7,] 2 0

## [8,] 1 0

## [9,] 3 4

colnames(class.status) <- c("2006", "2007")

rownames(class.status) <- c("Nine-banded Armadillo", "Bobcat", "Feral hog",

"Gray fox", "Common raccoon", "Common raven", "Striped skunk", "Texas rat snake",

"Total multiple predator events")

class.status

## 2006 2007

## Nine-banded Armadillo 0 1

## Bobcat 0 1

## Feral hog 2 1

## Gray fox 4 2

## Common raccoon 2 7

## Common raven 0 3

## Striped skunk 2 0

## Texas rat snake 1 0

## Total multiple predator events 3 4

Often we will have data in some sort of a dataframe where we have 1 row for each data point in the dataset. So, let's try some examples using our earlier dataset called quaily. Now, since you have quaily loaded, let's play a bit with it using the function table. Using table() we can look at the raw counts of the number of times parasites were present, a cross-tab of parasite presence by group, and we can even look at the proportion of each count that falls in each category using the function prop.table().


head(quaily)

## worms presence mass long group

## 1 61 1 61.81 27.92 1

## 2 53 1 53.90 18.50 2

## 3 54 1 55.23 10.83 3

## 4 55 0 55.48 9.02 4

## 5 65 1 65.48 36.86 5

## 6 49 1 50.41 15.97 1

table(quaily$presence)

##

## 0 1

## 10 40

table(quaily$presence, quaily$group)

##

## 1 2 3 4 5

## 0 2 3 0 3 2

## 1 8 7 10 7 8

table.quaily <- (table(quaily$presence, quaily$group))

prop.table(table.quaily)

##

## 1 2 3 4 5

## 0 0.04 0.06 0.00 0.06 0.04

## 1 0.16 0.14 0.20 0.14 0.16
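prop.table() also takes a margin argument if you would rather have proportions within rows (margin = 1) or within columns (margin = 2) instead of over the whole table:

prop.table(table.quaily, margin = 1)   # each row sums to 1
prop.table(table.quaily, margin = 2)   # each column sums to 1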

Tables can get extremely complicated really quickly, and R can make looking at data using tables pretty easy (e.g., see ftable or xtabs as other options for looking at tabular data). But, what if we are interested in conducting some statistical evaluations on data in tables? The list of what you can do is endless, but let's do an example of a test of independent proportions and a chi-square test on a 2 by 2 contingency table. Let's assume that our data consist of the number of juvenile and adult fish that successfully survived some experimental testing done over in the Biology (nerd's) building.

fish.not.dead <- c(10, 6)

fish.total.tested <- c(20, 21)

prop.test(fish.not.dead, fish.total.tested)

##

## 2-sample test for equality of proportions with continuity

## correction

##

## data: fish.not.dead out of fish.total.tested

## X-squared = 1.179, df = 1, p-value = 0.2776

## alternative hypothesis: two.sided

## 95 percent confidence interval:

## -0.1267 0.5552

## sample estimates:

## prop 1 prop 2


## 0.5000 0.2857

So these results indicate no difference between the proportions (see the p-values and such in the output). What about a χ² test? First, we have to turn our data into a matrix as that is the required formatting (see the Arguments section under ?chisq.test). Also note that because we are running this using chisq.test, the second column of the table has to be the number of negative outcomes (failures: 10 & 15) as opposed to the totals (20 & 21) given above. For a 2 × 2 table, the results using prop.test and chisq.test are equivalent.

chi.data <- matrix(c(10, 6, 10, 15), 2)

chi.data

## [,1] [,2]

## [1,] 10 10

## [2,] 6 15

chisq.test(chi.data)

##

## Pearson's Chi-squared test with Yates' continuity correction

##

## data: chi.data

## X-squared = 1.179, df = 1, p-value = 0.2776

We can also do r × c contingency tables. Consider the data from class.status above.

class.status

## 2006 2007

## Nine-banded Armadillo 0 1

## Bobcat 0 1

## Feral hog 2 1

## Gray fox 4 2

## Common raccoon 2 7

## Common raven 0 3

## Striped skunk 2 0

## Texas rat snake 1 0

## Total multiple predator events 3 4

chisq.test(class.status)

## Warning: Chi-squared approximation may be incorrect

##

## Pearson's Chi-squared test

##

## data: class.status

## X-squared = 11.43, df = 8, p-value = 0.1787

chisq.test(class.status)$expected

## Warning: Chi-squared approximation may be incorrect

## 2006 2007

## Nine-banded Armadillo 0.4242 0.5758


## Bobcat 0.4242 0.5758

## Feral hog 1.2727 1.7273

## Gray fox 2.5455 3.4545

## Common raccoon 3.8182 5.1818

## Common raven 1.2727 1.7273

## Striped skunk 0.8485 1.1515

## Texas rat snake 0.4242 0.5758

## Total multiple predator events 2.9697 4.0303
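That "Chi-squared approximation may be incorrect" warning is R telling you that several of the expected counts are small. One option in that situation (an aside, not something we ran in the workshop) is Fisher's exact test, which takes the same matrix:

fisher.test(class.status)   # exact test for the same r x c table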

Notice that chisq.test includes more information than is printed by default. Always remember this about R: you can see what is included using some of the tricks from earlier. For example, you can use str to see what arguments chisq.test takes. But, if you want to know what is included in the object returned after calling chisq.test on our data, then you could use str on that function call and look at its contents; then it is simply a matter of identifying what you are interested in extracting and pulling it from the list. For example, you can see that we have a list containing 9 different objects, all accessible using the $ operator.

str(chisq.test)

## function (x, y = NULL, correct = TRUE, p = rep(1/length(x), length(x)),

## rescale.p = FALSE, simulate.p.value = FALSE, B = 2000)

str(chisq.test(class.status))

## Warning: Chi-squared approximation may be incorrect

## List of 9

## $ statistic: Named num 11.4

## ..- attr(*, "names")= chr "X-squared"

## $ parameter: Named int 8

## ..- attr(*, "names")= chr "df"

## $ p.value : num 0.179

## $ method : chr "Pearson's Chi-squared test"

## $ data.name: chr "class.status"

## $ observed : num [1:9, 1:2] 0 0 2 4 2 0 2 1 3 1 ...

## ..- attr(*, "dimnames")=List of 2

## .. ..$ : chr [1:9] "Nine-banded Armadillo" "Bobcat" "Feral hog" "Gray fox" ...

## .. ..$ : chr [1:2] "2006" "2007"

## $ expected : num [1:9, 1:2] 0.424 0.424 1.273 2.545 3.818 ...

## ..- attr(*, "dimnames")=List of 2

## .. ..$ : chr [1:9] "Nine-banded Armadillo" "Bobcat" "Feral hog" "Gray fox" ...

## .. ..$ : chr [1:2] "2006" "2007"

## $ residuals: num [1:9, 1:2] -0.651 -0.651 0.645 0.912 -0.93 ...

## ..- attr(*, "dimnames")=List of 2

## .. ..$ : chr [1:9] "Nine-banded Armadillo" "Bobcat" "Feral hog" "Gray fox" ...

## .. ..$ : chr [1:2] "2006" "2007"

## $ stdres : num [1:9, 1:2] -0.872 -0.872 0.891 1.328 -1.438 ...

## ..- attr(*, "dimnames")=List of 2

## .. ..$ : chr [1:9] "Nine-banded Armadillo" "Bobcat" "Feral hog" "Gray fox" ...

## .. ..$ : chr [1:2] "2006" "2007"

## - attr(*, "class")= chr "htest"

chisq.test(class.status)$observed


## Warning: Chi-squared approximation may be incorrect

## 2006 2007

## Nine-banded Armadillo 0 1

## Bobcat 0 1

## Feral hog 2 1

## Gray fox 4 2

## Common raccoon 2 7

## Common raven 0 3

## Striped skunk 2 0

## Texas rat snake 1 0

## Total multiple predator events 3 4

chisq.test(class.status)$expected

## Warning: Chi-squared approximation may be incorrect

## 2006 2007

## Nine-banded Armadillo 0.4242 0.5758

## Bobcat 0.4242 0.5758

## Feral hog 1.2727 1.7273

## Gray fox 2.5455 3.4545

## Common raccoon 3.8182 5.1818

## Common raven 1.2727 1.7273

## Striped skunk 0.8485 1.1515

## Texas rat snake 0.4242 0.5758

## Total multiple predator events 2.9697 4.0303

5.2 Linear Regression

The basic goal of this section is to get you comfortable with general approaches to regression analysis. The methods build on each other, but for the most part remain consistent. First, I will outline a simple linear regression with one response and one predictor variable, then discuss how this relates to analysis of variance. I will follow with multiple regression (≥2 predictor variables) and generalized linear models for binary and count data. Linear regression, the workhorse of statistical methodology, is used to explain the relationship between 2 variables, primarily focused on how one variable impacts the level of another variable. Just because I want to see how to put a formula in LyX, here is the basic equation for linear regression,

$$y_i = \alpha + \beta x_i + \varepsilon_i$$

You all have seen this, so we will not belabor the point. But, how do we do linear regression in R? We do it well...

lm(quaily$worms ~ quaily$mass)

##

## Call:

## lm(formula = quaily$worms ~ quaily$mass)

##

## Coefficients:


## (Intercept) quaily$mass

## -1.38 1.01

Doesn't seem like much when you do it like that, does it? I mean, R pretty much just shows us the function call and the estimated beta coefficients. Is that all R did? Why are we here again? Now, R does other things, but remember earlier when I said R would not give you things, you had to ask for them? Well, now it's time to learn how to ask. First, you can use the summary function to extract a little bit more information (you can ignore the useFancyQuotes code, it is so I could output the summary in a pdf, something screwy with Sweave and R). So, what did we get using summary? Well, our call to lm() created a model object (just as the chi-square test we used earlier did) consisting of several parts. First, we have a repeat of the function call, then a summary of the distribution of the residuals, then the model coefficients are printed, followed by various information on model fit.

options(useFancyQuotes = FALSE)

summary(lm(quaily$worms ~ quaily$mass))

##

## Call:

## lm(formula = quaily$worms ~ quaily$mass)

##

## Residuals:

## Min 1Q Median 3Q Max

## -1.0903 -0.4342 -0.0487 0.3223 1.5277

##

## Coefficients:

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) -1.38248 0.60507 -2.28 0.027 *

## quaily$mass 1.00987 0.00915 110.36 <2e-16 ***

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

##

## Residual standard error: 0.578 on 48 degrees of freedom

## Multiple R-squared: 0.996,Adjusted R-squared: 0.996

## F-statistic: 1.22e+04 on 1 and 48 DF, p-value: <2e-16

Pretty cool, huh. Now, what if we just wanted to extract the coefficients without all the other stuff? Remember earlier I told you we would be using some stuff later? Here goes: first you can use names to see what is contained within the model object.

example.regression <- lm(quaily$worms ~ quaily$mass)

names(example.regression)

## [1] "coefficients" "residuals" "effects" "rank"

## [5] "fitted.values" "assign" "qr" "df.residual"

## [9] "xlevels" "call" "terms" "model"

Notice there is one in there called coefficients, so we can probably get those out in a couple of other ways,


example.regression$coefficients

## (Intercept) quaily$mass

## -1.382 1.010

coef(example.regression)

## (Intercept) quaily$mass

## -1.382 1.010

summary(example.regression)$coefficients

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) -1.382 0.605072 -2.285 2.678e-02

## quaily$mass 1.010 0.009151 110.359 2.060e-59

str(summary(example.regression)$coefficients)

## num [1:2, 1:4] -1.38248 1.00987 0.60507 0.00915 -2.28482 ...

## - attr(*, "dimnames")=List of 2

## ..$ : chr [1:2] "(Intercept)" "quaily$mass"

## ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"

summary(example.regression)$coefficients[2, ]

## Estimate Std. Error t value Pr(>|t|)

## 1.010e+00 9.151e-03 1.104e+02 2.060e-59
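While we are extracting things, confidence intervals for the coefficients come out just as easily with confint() (output omitted here):

confint(example.regression)               # 95% intervals by default
confint(example.regression, level = 0.9)  # or whatever level you want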

Now, while you saw this example earlier, it is probably worthwhile to redo it here to show how you can also build plots based off of your linear regression analysis simply and efficiently.

plot(mass, worms, las = 1, main = "This is a Wormy Quaily Graph", ylab = "Quaily Worms",

xlab = "Quaily Fatness", pch = 19, col = "red", xlim = c(40, 90), ylim = c(40,

90))

abline(example.regression, col = "blue", lwd = 2)

fit.quaily <- fitted(example.regression)

segments(mass, fit.quaily, mass, worms, col = "blue")


[Figure: the Wormy Quaily scatterplot again, with the regression line and residual segments drawn from example.regression.]

Ok, so what if we wanted to see the fitted values used for developing this plot, or the residuals (the differences between observed and fitted values)?

fitted(example.regression)

## 1 2 3 4 5 6 7 8 9 10 11 12

## 61.04 53.05 54.39 54.65 64.74 49.53 71.68 68.94 56.23 66.51 67.76 65.24

## 13 14 15 16 17 18 19 20 21 22 23 24

## 64.55 84.20 75.64 63.67 66.66 52.94 55.72 62.35 67.64 73.74 52.34 56.78

## 25 26 27 28 29 30 31 32 33 34 35 36

## 62.06 73.07 61.26 68.76 71.33 67.13 68.29 79.19 79.77 47.94 65.83 62.05

## 37 38 39 40 41 42 43 44 45 46 47 48

## 60.47 71.42 78.45 80.69 73.37 67.80 45.33 57.34 59.45 65.97 52.09 78.48

## 49 50

## 64.46 67.01

resid(example.regression)

## 1 2 3 4 5 6 7

## -0.037812 -0.049711 -0.392843 0.354689 0.255952 -0.525251 -0.681881

## 8 9 10 11 12 13 14

## 0.064976 -0.230813 -0.511328 -0.763571 0.761113 -0.552172 -0.204315

## 15 16 17 18 19 20 21

## 0.359414 0.326418 -0.662809 1.061375 0.284223 -0.350648 -0.642386

## 22 23 24 25 26 27 28

## 0.257977 -0.342799 -0.776145 0.942216 -0.065408 0.740016 1.236654

## 29 30 31 32 33 34 35

## 0.671575 -0.127350 -0.288705 -0.185243 0.229031 0.060250 -0.834712

## 36 37 38 39 40 41 42

## -0.047686 1.527717 0.580686 -0.448035 0.310045 -0.368370 0.196034

## 43 44 45 46 47 48 49

## -0.334276 0.658326 0.547690 0.034004 -1.090331 -0.478331 -0.461284

## 50

## -0.006166
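Before leaning on any of these numbers too hard, it is also worth a quick look at the standard diagnostic plots that every lm object carries around:

par(mfrow = c(2, 2))       # 2 x 2 panel layout
plot(example.regression)   # residuals vs fitted, Q-Q, scale-location, leverage
par(mfrow = c(1, 1))       # reset the layout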


You can obviously extend your linear regression to multiple predictor variables following the same approach as above; here are a main effects model and a model with an interaction.

multi.regression <- lm(quaily$worms ~ quaily$mass + quaily$long)

summary(multi.regression)

##

## Call:

## lm(formula = quaily$worms ~ quaily$mass + quaily$long)

##

## Residuals:

## Min 1Q Median 3Q Max

## -1.1364 -0.4598 -0.0564 0.3147 1.4977

##

## Coefficients:

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) -1.0906 0.7499 -1.45 0.15

## quaily$mass 1.0016 0.0154 64.96 <2e-16 ***

## quaily$long 0.0071 0.0107 0.67 0.51

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

##

## Residual standard error: 0.582 on 47 degrees of freedom

## Multiple R-squared: 0.996,Adjusted R-squared: 0.996

## F-statistic: 6.02e+03 on 2 and 47 DF, p-value: <2e-16

multi2.regression <- lm(quaily$worms ~ quaily$mass * quaily$long)

summary(multi2.regression)

##

## Call:

## lm(formula = quaily$worms ~ quaily$mass * quaily$long)

##

## Residuals:

## Min 1Q Median 3Q Max

## -1.1329 -0.3818 -0.0712 0.3784 1.4549

##

## Coefficients:

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) -2.169294 1.710012 -1.27 0.21

## quaily$mass 1.019177 0.029387 34.68 <2e-16 ***

## quaily$long 0.038525 0.045980 0.84 0.41

## quaily$mass:quaily$long -0.000491 0.000698 -0.70 0.49

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

##

## Residual standard error: 0.585 on 46 degrees of freedom

## Multiple R-squared: 0.996,Adjusted R-squared: 0.996

## F-statistic: 3.97e+03 on 3 and 46 DF, p-value: <2e-16

You can do lots with basic regression, see ?lm for more details.

5.3 Generalized Linear Models

GLM's are characterized by the use of a 'link' function which provides the relationship between the predictor variables and the expected value of the response variable. Probably the most common GLM is logistic regression, but a GLM with a Gaussian family (identity link) would give the same results as linear regression with lm. In R, there is a nice little function called glm for running generalized linear models. As an example, here is some bird count data we can use for doing some logistic regression analysis.

bird.data <- read.table("F:/BretResearch/Workshops/TxTWS_RWorkshop/birddata.txt",

header = TRUE, colClasses = c("numeric", "numeric", "factor", "numeric"))

str(bird.data)

## 'data.frame': 154 obs. of 4 variables:

## $ present: num 0 0 0 0 1 1 1 0 0 1 ...

## $ area : num 7.83 2.7 10.44 2.7 8.44 ...

## $ reg : Factor w/ 4 levels "5","6","7","8": 3 3 3 3 3 3 2 2 2 2 ...

## $ canopy : num 30.6 48.9 35.9 39.3 46.5 ...

head(bird.data)

## present area reg canopy

## 1 0 7.83 7 30.59

## 2 0 2.70 7 48.92

## 3 0 10.44 7 35.92

## 4 0 2.70 7 39.33

## 5 1 8.44 7 46.52

## 6 1 17.83 7 46.86

So, we have a simple dataset from some bird surveys on which presence or absence was measured, and we want to see if presence/absence is influenced by either the area of habitat (in hectares), the region of the state (a factor variable with 4 levels), or the percentage of canopy cover (range 0-100). Now, glm has a trick to it you have to remember, although if you do ?glm you would see it in the help file. When you specify glm, you have to define a value for 'family', which tells R which distribution and link function from the exponential family to use to relate the predictors to the response variable. Since we are dealing with binary data, we will use binomial to define family.

bird.model <- glm(bird.data$present ~ bird.data$area, family = "binomial")

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

summary(bird.model)

##

## Call:

## glm(formula = bird.data$present ~ bird.data$area, family = "binomial")

##

## Deviance Residuals:

## Min 1Q Median 3Q Max

## -2.396 -1.266 0.867 1.027 1.108

##

## Coefficients:

## Estimate Std. Error z value Pr(>|z|)

## (Intercept) 0.15627 0.21467 0.73 0.467

## bird.data$area 0.00256 0.00126 2.03 0.042 *


## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

##

## (Dispersion parameter for binomial family taken to be 1)

##

## Null deviance: 201.89 on 153 degrees of freedom

## Residual deviance: 187.35 on 152 degrees of freedom

## AIC: 191.3

##

## Number of Fisher Scoring iterations: 8
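Since this is a logistic regression, exponentiating the slope turns it into an odds ratio, i.e., the multiplicative change in the odds of presence per 1-ha increase in area (roughly exp(0.00256), or about 1.0026, here):

exp(coef(bird.model))             # coefficients on the odds-ratio scale
exp(confint.default(bird.model))  # Wald confidence intervals on the same scale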

Maybe we want to see how the predicted probability of presence changes with area. Looking at the summary above, we can see that the intercept 0.156270 and slope 0.002556 are both positive, so we would expect a positive impact of area on presence. We can show that in several ways, but probably using a graphic would be best.

plot(bird.data$area, fitted(glm(bird.data$present ~ bird.data$area, family = "binomial")),

xlab = "Area (ha)", ylab = "Probability present")

[Figure: fitted probability of presence plotted against Area (ha) at each observed area value.]

But, this plot is really not that pretty, what with all the dots and stuff. What say we try another way to clean it up a bit. First, we do a bit of data manipulation so that we can use the predict function in R, which is a pretty useful little function. Now, if you look at the above figure you see that we are pretty much plotting the predicted response (presence probability) for each level of area for which we have data. But, what if we wanted to know what the prediction looked like for area sizes we did not collect? Well, that is pretty simple to do. First, we do a bit of data manipulation where we define a new variable for area (Area) which ranges from 0 to 10,000 (it could have been any range: 100, 10,000, etc.) and then we use predict to predict the estimated presence probability for each value of Area. I used head below to show the first 6 values of the predictions.


attach(bird.data)

bird.predict <- glm(present ~ area, family = "binomial")

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Area <- seq(0, 10000, 1)

new.area <- data.frame(area = Area)

predicted.bird <- predict(bird.predict, new.area, type = "resp")

head(predicted.bird)

## 1 2 3 4 5 6

## 0.5390 0.5396 0.5403 0.5409 0.5415 0.5422

plot(predicted.bird ~ Area, type = "l", las = 1, ylab = "Probability present")

[Figure: predicted probability of presence plotted as a line against Area from 0 to 10,000.]

Well, what do you know, a prettier graph. But, what would a journal editor say? "Where are the confidence limits?" This takes a bit of tweaking and there are several ways to do it, but here is the one I tend to like: a little data manipulation using predict. Notice that I changed the type to "link"; there is a reason. Above, when I used type="resp" I was predicting on the 'real' scale, or the actual predicted probabilities for each level of Area. But, when you build your confidence intervals based on the real-scale values, you can get estimates >1, which cannot happen. This is my crude hack that I use all the time to build confidence intervals when I am working on logistic regression models. So, first, I pull out the estimates on the link (logit) scale for each value of Area, and I build a confidence interval for each level.

pred.cl <- predict(bird.predict, new.area, interval = c("confidence"), level = 0.95,

type = "link", se.fit = TRUE)

uppercl <- pred.cl$fit + 1.96 * pred.cl$se.fit

lowercl <- pred.cl$fit - 1.96 * pred.cl$se.fit

Now, plotting these is pretty simple using the lines function. Note I am extending the y-axis here so that I can add data to the graph showing the spread of the sites with and without detections (shown using the points statement below; in green on the graphic).

plot(predicted.bird ~ Area, type = "l", las = 1, ylab = "Probability present",

ylim = c(0, 1))

lines(plogis(uppercl), col = "blue")

lines(plogis(lowercl), col = "red")

points(area, present, col = "green")

[Figure: predicted probability of presence against Area with upper (blue) and lower (red) confidence limits, and the observed presence/absence data added as green points at 0 and 1.]

In case you're wondering, the reason the confidence intervals are not symmetric around the line, like you are probably used to, is because they are built on the logit scale.

But, wait a minute, what the heck is plogis()? It must be important, right? Yes, it is, as it keeps your values bounded between 0 and 1 (see ?plogis) and it's really useful. First, remember that a basic logistic regression looks like this:

$$\frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}$$

so, if we have estimates for β0 and β1, then we can actually use plogis to predict each probability. For example,

coef(bird.predict)

## (Intercept) area

## 0.156270 0.002556

then we have an estimate for the intercept and the slope. Say our interest was in predicting the probability of presence given an area estimate of 10. Well, using the above logistic regression formula, it would look like

$$\frac{e^{0.156270254 + 0.002555695 \times 10}}{1 + e^{0.156270254 + 0.002555695 \times 10}} = 0.545332$$


or, we can get at this a couple of ways,

plogis(0.156270254 + 0.002555695 * 10)

## [1] 0.5453

plogis(summary(bird.predict)$coefficients[1] + summary(bird.predict)$coefficients[2] *

10)

## [1] 0.5453

Say we wanted predictions for area estimates of 25:50?

plogis(summary(bird.predict)$coefficients[1] + summary(bird.predict)$coefficients[2] *

25:50)

## [1] 0.5548 0.5555 0.5561 0.5567 0.5573 0.5580 0.5586 0.5592 0.5599 0.5605

## [11] 0.5611 0.5618 0.5624 0.5630 0.5636 0.5643 0.5649 0.5655 0.5662 0.5668

## [21] 0.5674 0.5680 0.5687 0.5693 0.5699 0.5705
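plogis() also has an inverse, qlogis(), which takes you from a probability back to the logit (linear predictor) scale; it is handy for checking yourself:

qlogis(0.545332)      # back to the linear predictor, approximately 0.1818
plogis(qlogis(0.7))   # a round trip gets you back where you started
## [1] 0.7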

I find that plogis() is a generous friend and I use it every day!


6 Writing Functions in R

6.1 Functions

One of the basics of R is that users can contribute code to conduct various analyses. In R, the standard contribution is a function, or something that the end user uses on their data to get some result. For instance, consider the simple function below to add 2 user-supplied values together:

addTwo = function(a, b) {

out = a + b

return(out)

}

addTwo(2, 2)

## [1] 4

addTwo(2, 4)

## [1] 6

This function works with any 2 numerical values. Functions can be more complex, like the below function that creates summary output:

my.summary = function(x) {

my.n = length(x)

my.mean = mean(x, na.rm = TRUE)

my.var = var(x, na.rm = TRUE)

my.sd = sd(x, na.rm = TRUE)

my.median = median(x, na.rm = TRUE)

out = list(SampleSize = my.n, Mean = my.mean, Variance = my.var, StdDev = my.sd,

Median = my.median)

return(out)

}

sum.data = rnorm(10)

my.summary(sum.data)

## $SampleSize

## [1] 10

##

## $Mean

## [1] 0.01348

##

## $Variance

## [1] 0.3399

##

## $StdDev

## [1] 0.583

##

## $Median

## [1] -0.05622

Functions can do a ton of work for you, so I am barely (and I mean that, barely) scratching the surface. If, for instance, you wanted to see what the function bbmm.polygon() from the


moveud package looks like (and I know exactly what it looks like because I wrote it), then you can just type the name of the function into R and out it pops. bbmm.polygon creates the utilization distribution contours based on bbmm.contour from package BBMM and exports the created contour lines as a polygon shapefile for further analysis in ArcMap (or GIS program of choice). Effectively, it imports a dataframe, reprojects it to UTM, uses brownian.bridge() to create a BBMM and exports the contour lines via bbmm.contour, creates a raster, transforms that raster to a spatial polygon data frame, adds a couple of variables to the data frame, and writes the output to a shapefile appropriate for reading into ArcMap. Not too complicated...

bbmm.polygon

## function (x, crs.current, crs.utm, lev, plot = FALSE, path, indID)

## {

## coordinates(x) = ~Lon + Lat

## proj4string(x) = CRS(crs.current)

## x = data.frame(spTransform(x, CRS(crs.utm)))

## out.bbmm = brownian.bridge(x = x$Lon, y = x$Lat, time.lag = x$tl[-1],

## location.error = 15, cell.size = 20, max.lag = 180)

## contours = bbmm.contour(out.bbmm, levels = lev, locations = x,

## plot = plot)

## probs <- data.frame(x = out.bbmm$x, y = out.bbmm$y, z = out.bbmm$probability)

## out.raster <- rasterFromXYZ(probs, crs = CRS(crs.utm), digits = 5)

## raster.contour <- rasterToContour(out.raster, levels = contours$Z)

## raster.contour <- spChFIDs(raster.contour, paste(lev, "% Contour Line",

## sep = ""))

## out = spTransform(raster.contour, CRS(crs.utm))

## out = SpatialLines2PolySet(out)

## out = PolySet2SpatialPolygons(out)

## out = as(out, "SpatialPolygonsDataFrame")

## out$UDlevels = paste(rev(lev))

## out$BandID = paste(indID)

## setwd(path)

## writeOGR(obj = out, dsn = ".", layer = paste(indID), driver = "ESRI Shapefile")

## }

## <environment: namespace:moveud>

Or, if for instance, you want to do a little simulation to look at the impacts of detection heterogeneity in deer spotlight survey count data and see how many times you would expect to overestimate, underestimate, or be correct (within 10% error bounds) about how many deer were near the road you were driving down (even though this is fraught with errors), then you could use:

deer.sim = function(survey, reps) {

x = replicate(reps, {

pr = rnorm(survey, 0.413692, 0.12322998)

pr[pr < 0] = 0

x = survey/pr

lower = x[x < mean(x) - 0.1 * mean(x)]

upper = x[x > mean(x) + 0.1 * mean(x)]

c(length(lower), length(upper))/length(x)

})


ml = mean(x[1, ])

mu = mean(x[2, ])

constant = 1 - ml - mu

x = cbind(Decreased = ml, Constant = constant, Increased = mu, Count = survey)

return(x)

}

And then we can use the function call to estimate how many times we might be too low, too high, or just right based on those numbers (although some would argue about it anyway because deer spotlight surveys are sacrosanct and inviolable in their eyes...).

deer.sim(100, 100)

## Decreased Constant Increased Count

## [1,] 0.5429 0.2129 0.2442 100

That about does it for functions; we could spend time on lexical scoping and such, but that is way beyond this class...


7 Wildlife-Specific Methods

7.1 Capture-Recapture Analysis

The most comprehensive software package for analysis of capture-recapture data is the program MARK (White and Burnham 1999). While it is unparalleled in the range of models, quality of the user documentation (http://www.phidot.org/software/mark/docs/book/), and active base of user-driven support (http://www.phidot.org/forum/index), the interface for building models can be limiting for large data sets and complex models. While there is some capability for automatic model creation in MARK, most models are built manually with a graphical user interface to specify the parameter structures and design matrices. Manual model creation can be useful during the learning process, but eventually it becomes a time-consuming and sometimes frustrating exercise that may add an unnecessary source of error in the analysis. Finally, for those that analyze data from on-going monitoring programs, there is no way to extend the capture history in MARK, which necessitates manual recreation of all models as data from future sampling occasions are collected.

RMark is an R package that provides a formula-based interface for MARK. RMark has been available since 2005 and is on the Comprehensive R Archive Network (CRAN) (http://cran.r-project.org). RMark contains functionality to build models for MARK from formulas, run the model with MARK, extract the output, and summarize and display the results with automatic labeling. RMark also has functions for model averaging, prediction, variance components, and exporting models back to the MARK interface. In addition, all of the tools in R are available, which enables a completely scripted analysis from data to results and inclusion into a document with Sweave (Leisch 2002) and LaTeX to create a reproducible manuscript such as this one. The report which represents the appropriate citation (effective 2013) for RMark can be found at http://www.afsc.noaa.gov/Publications/ProcRpt/PR2013-01.pdf and is included in the workshop notes as well. I have not included the

Here we are going to provide an overview of the RMark package and how it can be used to benefit MARK users. For more detailed documentation, refer to the online documentation at http://www.phidot.org/software/mark/rmark/ and the help within the RMark package. And, just to be fair, a significant portion of these course notes came from various documents Jeff created while explaining or documenting RMark for teaching purposes, and to a lesser extent from some notes I have put together for students at A&M.

Background

RMark does not fit models to data; rather, RMark is an R package that was designed to provide an alternative user interface to MARK and its GUI. RMark uses the R language to construct models, create the input file (.inp), then call MARK, which fits the model(s) to the data, extracts the results from the output file created by MARK, and allows the user to manipulate (via R or some other program) the resultant model output. Thus, RMark is an R interface to MARK, not a stand-alone capture-recapture modeling environment. That said, if results you got using MARK do not match the results you got when you used RMark, then you have made a mistake in one or the other.

Where to find help?


Currently, or at least as best we can tell, MARK supports ≥ 140 different modeling options. At present, RMark does not fully replicate every option available in MARK. Although new models are added to RMark fairly regularly, not every model in MARK is available in RMark, and some things you can do in MARK, such as data bootstrapping or computing median c-hat values, are not available through the RMark interface. For a list of models available in RMark, you can use something like system.file("MarkModels.pdf", package="RMark"), which will provide you with a PATH statement telling you where you can access the pdf file containing the list of MARK models available in RMark, along with the appropriate code, parameter, and help file names (or, if you have a specific R_LIBS path where your R packages are installed locally, just go to the RMark folder there and the file MarkModels.pdf will be found there). First, it is important to remember that RMark needs MARK, so without an understanding of MARK, you will be limited in your ability to use RMark. So, your first stop should always be the "MARKBOOK", authored/edited by Evan Cooch and Gary White, with contributions from a wide variety of others. The MARKBOOK is freely available (all 1000+ pages of it) at http://www.phidot.org/software/mark/docs/book/. Unequivocally, this is the primary desk reference for capture-recapture modeling approaches supported by MARK (although you should never cite it in a manuscript; see the MARK FAQ at http://www.phidot.org/forum/index.php). Details on RMark are found in Appendix C. Additionally, there is a very active community of ecologists who use MARK regularly and are willing to provide expertise to folks across a wide variety of capture-recapture modeling techniques, and an online forum (managed by Evan Cooch) is available at http://www.phidot.org/forum/index.php. The user group of the phidot.org forum is typically extremely helpful, given you have read the MARKBOOK and have searched the archives. If you are not already a member, sign up. Finally, RMark operates just like any other R package; if you need the help/reference file for a particular function within RMark, you can access it using "?" followed by the name of the function you are interested in (e.g., ?mark).
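Both of those look-ups are one-liners in R, so here they are in one place:

system.file("MarkModels.pdf", package = "RMark")  # path to the pdf listing the MARK models supported in RMark
?mark                                             # help file for the mark() function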

Advantages/Disadvantages

So, why would one want to use RMark as an interface to MARK rather than MARK's GUI?Reasons abound, some are valid, some are not, lots of it is just individual point of view orproject-speci�c needs. We think that there are some convincing reasons to use a scriptedapproach for your MARK analysis, but in the end it becomes a personal choice (one I think itis obvious that Je� and I have already made). A few of the primary reasons we like to useRMark are (but not limited to):

1. RMark provides the user with the ability to automate analysis of monitoring data sets even as monitoring occasions are added. This is a significant benefit that RMark brings to MARK users, as script generation of the PIM and DM allows you to create the script once and, as monitoring data are collected, typically no changes to the script are needed. You just re-run the script with the new datafile.

2. Design matrix creation. RMark uses a formula-based approach, which is faster and typically less error-prone (although not entirely error-proof). Thus, there is less need to manually create the PIMs or DM. But an understanding of what the DM should look like is still necessary.


3. PIM simplification. RMark automatically creates the simplest PIM structure for each model, as opposed to MARK, which uses the full DM even when reduced models are created. This will speed up model evaluation.

4. Collaborative development: MARK and RMark play well together, so you can move analyses back and forth fairly cleanly using functions such as export.MARK() and convert.inp().

5. Entire analyses can be scripted. Although this is related to No. 1 above, the scripting of analyses and the ability to use the functionality that comes along with R for additional computational support, publication-quality graphing, among other things, is quite beneficial.

6. Reproducible analysis and documentation. Nearly all MARK analyses are reproducible so long as one keeps the .inp/.dbf/.fpt files and documents what was done. One thing that RMark excels at is that the documentation support capabilities of R are widely applicable to MARK analyses. Thus, complete data sets and analyses, with metadata and detailed documentation, can be developed as R packages, or data/code can be seamlessly integrated into LaTeX-style manuscripts and documents (although Evan does a pretty good job with the MARKBOOK). We find it really useful that the entirety of a dataset and analysis can be documented cleanly in one place (see ?dipper for an example). Obviously, good data management protocols for reproducible analyses using only MARK are equally good, so this is more of a personal preference.

Ok, so let's jump in with a quick example. As with most R packages, to access the functionality in RMark you type library(RMark) and R will respond with its appropriate version number and relevant information (I have it in .Rprofile on my system, so no output will be shown below when I do it). For a quick example, we will use the ubiquitous European dipper (Cinclus cinclus) capture-recapture data from many examples in the MARKBOOK and a variety of manuscripts (it is included as a datafile in RMark). For the dipper example, if we look at the structure of the dataset, we can see that it is a dataframe with 2 fields. The first field is the encounter history, which has a required column heading name of 'ch' and must be a character (chr) variable. The field label ch is required for all MARK analyses; typically a field identifying the number of individuals with that specific encounter history (denoted 'freq') is included, while additional fields are all optional. In this example, the field "sex" specifies group structure (e.g., whether an individual is male or female) and is identified as a factor variable (Factor) with values 1=Female and 2=Male, as ordering is alphabetic and ignores the ordering of the columns in the dipper.inp file, which we can see using levels(). Finally, we can run a simple CJS analysis using the default of constant survival and constant recapture probabilities for the dipper data using the simple code mark(dipper).

library(RMark)

## This is RMark 2.1.7

data(dipper)

str(dipper)

## 'data.frame': 294 obs. of 2 variables:

## $ ch : chr "0000001" "0000001" "0000001" "0000001" ...

## $ sex: Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 1 ...


levels(dipper$sex)

## [1] "Female" "Male"

ex = mark(dipper)

##

## Output summary for CJS model

## Name : Phi(~1)p(~1)

##

## Npar : 2

## -2lnL: 666.8

## AICc : 670.9

##

## Beta

## estimate se lcl ucl

## Phi:(Intercept) 0.2421 0.1020 0.0422 0.4421

## p:(Intercept) 2.2263 0.3251 1.5891 2.8635

##

##

## Real Parameter Phi

##

## 1 2 3 4 5 6

## 1 0.5602 0.5602 0.5602 0.5602 0.5602 0.5602

## 2 0.5602 0.5602 0.5602 0.5602 0.5602

## 3 0.5602 0.5602 0.5602 0.5602

## 4 0.5602 0.5602 0.5602

## 5 0.5602 0.5602

## 6 0.5602

##

##

## Real Parameter p

##

## 2 3 4 5 6 7

## 1 0.9026 0.9026 0.9026 0.9026 0.9026 0.9026

## 2 0.9026 0.9026 0.9026 0.9026 0.9026

## 3 0.9026 0.9026 0.9026 0.9026

## 4 0.9026 0.9026 0.9026

## 5 0.9026 0.9026

## 6 0.9026

Importing and Manipulating Data

Now that we have RMark up and running (and we know that it works), the first thing we all want to do is load our data and do some analysis! RMark has several options/ways for one to create or load data for analysis in MARK. As most are familiar with the .inp file structure used by MARK, let's start with the approach that converts an encounter history .inp file to a dataframe for use in RMark. For this demonstration, we will use the dipper.inp file, which on my 64-bit system is located in "C:\Program Files (x86)\MARK\Examples", and the RMark function convert.inp(). Conversion of a .inp file to a dataframe using convert.inp() requires that we specify the input file location and name, group and optional covariate names, and, if the .inp file has commented areas (/* and */ in MARK parlance), that we let


RMark know. So you don't have to go look (or you can look above): the structure of dipper.inp is pretty straightforward; the encounter history has 7 encounter occasions, does include the freq column giving the number of individuals with each specific encounter history, and has 2 groups (columns) representing either Male or Female (1 or 0). Because Males are in the first column and Females are in the second column, when we define group.df= that will be the order we use. So, converting the dipper.inp data would work as follows:

dipper.convert = convert.inp("C:/Program Files (x86)/MARK/Examples/dipper.inp",

group.df = data.frame(sex = c("Male", "Female")))

When we look at the structure of the newly created file dipper.convert, we will see that it is now an R dataframe with 3 fields. The first field is the capture history (ch), which is a character value; the second field is the frequency variable (freq), or the number of individuals with that unique encounter history (a numeric value); and the third field is the grouping variable sex, which is a factor variable with 2 levels and can be shown using levels().

str(dipper.convert)

## 'data.frame': 294 obs. of 3 variables:

## $ ch : chr "1111110" "1111000" "1100000" "1100000" ...

## $ freq: num 1 1 1 1 1 1 1 1 1 1 ...

## $ sex : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...

levels(dipper.convert$sex)

## [1] "Female" "Male"

Once your data is in R as a dataframe, there are some handy options for manipulating data using standard R functions. A simple example is to add a numeric column representing some covariate (weight is typically used) to the newly created dipper.convert dataframe.

dipper.convert$weight = rnorm(nrow(dipper.convert), mean = 11, sd = 3)

summary(dipper.convert$weight)

## Min. 1st Qu. Median Mean 3rd Qu. Max.

## 1.21 9.46 11.10 11.30 13.50 19.30

Processing Data

Many of you will be familiar with the MARK model specification window, as it is where you identify the dataset you want to use for analysis, choose the model type specific for your analysis, and provide details on the various descriptors for your dataset such as the number of encounter occasions, name and number of groups, and individual covariates.


RMark (read: Jeff when he wrote it) takes care of some of these specifications, such as number of occasions, group labels, and individual covariate names (drawn from the input file column names), by setting these for you. However, some of the options, such as titles, number of mixtures, time intervals, among others, are all argument options for the function process.data(), which takes the place of the model specification window from MARK. process.data() does exactly what it sounds like: it processes the specified input data file and creates an R list structure that includes the original dataframe, all the required attribute data, and what model the dataset should be analyzed with:

dipper.proc = process.data(dipper.convert, model = "CJS", groups = "sex", begin.time = 1980)

str(dipper.proc)

## List of 15

## $ data :'data.frame': 294 obs. of 5 variables:

## ..$ ch : chr [1:294] "1111110" "1111000" "1100000" "1100000" ...

## ..$ freq : num [1:294] 1 1 1 1 1 1 1 1 1 1 ...

## ..$ sex : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...

## ..$ weight: num [1:294] 12.78 9.98 8.12 12.78 11.34 ...

## ..$ group : Factor w/ 2 levels "1","2": 2 2 2 2 2 2 2 2 2 2 ...

## $ model : chr "CJS"

## $ mixtures : num 1

## $ freq :'data.frame': 294 obs. of 2 variables:

## ..$ sexFemale: num [1:294] 0 0 0 0 0 0 0 0 0 0 ...

## ..$ sexMale : num [1:294] 1 1 1 1 1 1 1 1 1 1 ...

## $ nocc : num 7

## $ nocc.secondary : NULL

## $ time.intervals : num [1:6] 1 1 1 1 1 1

## $ begin.time : num 1980

## $ age.unit : num 1

## $ initial.ages : num [1:2] 0 0


## $ group.covariates:'data.frame': 2 obs. of 1 variable:

## ..$ sex: Factor w/ 2 levels "Female","Male": 1 2

## $ nstrata : num 1

## $ strata.labels : NULL

## $ counts : NULL

## $ reverse : logi FALSE

So, we can see above that the processed data now consists of a list with different elements that include the capture-recapture data as a dataframe, define what model we are using, how many encounter occasions there are, and what year each data point is collected, plus a host of other information that is either 1) not required for the particular model (e.g., strata.labels) or 2) added to the data by RMark (e.g., age.unit). We can look at specific values of the processed dataset, for instance, look at the first 10 records of the dataset, or determine how many encounter occasions we have:

dipper.proc$data[1:10, ]

## ch freq sex weight group

## 1:1 1111110 1 Male 12.781 2

## 1:3 1111000 1 Male 9.982 2

## 1:6 1100000 1 Male 8.122 2

## 1:7 1100000 1 Male 12.782 2

## 1:8 1100000 1 Male 11.343 2

## 1:9 1100000 1 Male 15.302 2

## 1:12 1010000 1 Male 10.737 2

## 1:14 1000000 1 Male 14.024 2

## 1:15 1000000 1 Male 9.855 2

## 1:16 1000000 1 Male 13.805 2

dipper.proc$nocc

## [1] 7

In general, once the data has been processed, not much else needs to be done with it. The primary exceptions would be if there are changes to the original dataframe, such as addition of new individuals or a new encounter history, or perhaps strata are added or grouped differently.

Design Data

Design data is likely an unfamiliar concept for users of MARK. However, design data underlies how data are associated with the various parameters that are estimated by MARK; thus, understanding the intricacies of how design data are created by RMark, how they equate to values in MARK, and how to create or manipulate design data is one of the most important aspects of using RMark. Thus, we are going to spend a good bit of time focused on detailing design data and how it transfers to MARK in the form of a PIM, so the link between them will be clearer.

Underlying design data creation in RMark are several R functions, the primary one being make.design.data; other important ones that will regularly be used are add.design.data


and merge_design.covariates. In addition, basic R data manipulation methods can be used to add, remove, create, or manipulate design data in RMark. We will get to all of this shortly, but first we need to outline what design data is and how it links to parameters in MARK.

First, let's turn the RMark processed data frame dipper.proc into design data using make.design.data. The result from make.design.data is an R list, with list headings being the specific parameters (Phi, p, Psi, etc.) used in MARK by the model chosen in process.data. Note that we have adopted Jeff's naming nomenclature here and named the new object dipper.ddl, where the .ddl stands for design data list, as the object is design data and is a list. For our dipper example, the top-level headings of the resultant data list are shown using names and, as expected, are Phi and p (the 2 parameters used by a CJS model in MARK).

dipper.ddl = make.design.data(dipper.proc)

names(dipper.ddl)

## [1] "Phi" "p" "pimtypes"

However, there is another list heading, pimtypes, that is likely unexpected by everyone. pimtypes defines what type of PIM is to be used as the base: either all (all PIM values different), time (values in each PIM are specific to columns), or constant (all PIM values the same). For instance, when the default of 'all' is used in make.design.data, the PIM values for the parameters Phi and p are all different. So, using a little code trick so that we can see the PIMs without actually running MARK, we can use the PIMS function to see what the all-different PIMs look like for our dipper example:

PIMS(mark(dipper.proc, dipper.ddl, invisible = FALSE, run = FALSE), "Phi", simplified = F)

## group = sexFemale

## 1980 1981 1982 1983 1984 1985

## 1980 1 2 3 4 5 6

## 1981 7 8 9 10 11

## 1982 12 13 14 15

## 1983 16 17 18

## 1984 19 20

## 1985 21

## group = sexMale

## 1980 1981 1982 1983 1984 1985

## 1980 22 23 24 25 26 27

## 1981 28 29 30 31 32

## 1982 33 34 35 36

## 1983 37 38 39

## 1984 40 41

## 1985 42

Thus, we can see that for this case we have the 'all different' PIM structure, defined by the grouping (factor) variable sex, for the parameter of interest, Phi (although the same could be done for the p parameter as well). Note that the rows and columns are labeled based on the dates we provided in process.data; the rows represent the cohort, which is the time of year individuals were first captured, while the columns are the initial date for the


beginning of the survival period. Thus, parameter 14 represents survival for the period 1984-1985 of female dippers captured for the first time in 1982.
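If you want to convince yourself of that link, you can pull out the design-data row behind that PIM cell by its par.index (just an illustrative look-up; the par.index column is described in detail just below):

dipper.ddl$Phi[dipper.ddl$Phi$par.index == 14, c("group", "cohort", "time", "age")]
## should show the Female group, 1982 cohort, 1984 interval, age 2 row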

When using make.design.data on a processed data frame, RMark, for those parameters with the default of triangular PIMs, by default creates design data representing time, age, and cohort, in addition to any grouping (factor) variables that were defined in process.data. So, sticking with our example PIM from the dipper dataset, let's look at the structure of the Phi portion of the R list dipper.ddl using:

str(dipper.ddl$Phi)

## 'data.frame': 42 obs. of 11 variables:

## $ par.index : int 1 2 3 4 5 6 7 8 9 10 ...

## $ model.index: num 1 2 3 4 5 6 7 8 9 10 ...

## $ group : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 1 ...

## $ cohort : Factor w/ 6 levels "1980","1981",..: 1 1 1 1 1 1 2 2 2 2 ...

## $ age : Factor w/ 6 levels "0","1","2","3",..: 1 2 3 4 5 6 1 2 3 4 ...

## $ time : Factor w/ 6 levels "1980","1981",..: 1 2 3 4 5 6 2 3 4 5 ...

## $ occ.cohort : num 1 1 1 1 1 1 2 2 2 2 ...

## $ Cohort : num 0 0 0 0 0 0 1 1 1 1 ...

## $ Age : num 0 1 2 3 4 5 0 1 2 3 ...

## $ Time : num 0 1 2 3 4 5 1 2 3 4 ...

## $ sex : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 1 ...

First, you can see that the portion we extracted using dipper.ddl$Phi is a data frame having 11 variables with 42 observations. For CJS, the dataframe for Phi contains 42 records: 21 for females and 21 for males. There are five factor variables (group, sex, cohort, age, and time) and five numeric variables: Cohort, Time, and Age, which are continuous versions of the respective factor variables with the lower-case letter; model.index, which is a unique index across all parameters, as is par.index, which is an integer; and occ.cohort, which is a continuous numeric value equivalent to cohort with a start value of 1. The group variable is a composite of the values of all factor variables used to define the groups (sex in this example), but when there is a single factor variable, as in this case, it is redundant. So, let's have a look at the design data for Phi (note that I removed the values for model.index and the variable sex, which I used as a check to make sure everything lined up but is not needed, as group represents the same value in this example). Looking at the design data for the female group of the dipper data, we can see that the par.index values match up exactly with the PIM values for the fully time-dependent model we looked at earlier (if we dig down into the MARK output a little bit), providing a simple link showing us the relationship between the PIM and the design data for the first group (female) that we specified in our process.data call.

dipper.ddl$Phi[(1:21), -c(2, 11)]

## par.index group cohort age time occ.cohort Cohort Age Time

## 1 1 Female 1980 0 1980 1 0 0 0

## 2 2 Female 1980 1 1981 1 0 1 1

## 3 3 Female 1980 2 1982 1 0 2 2

## 4 4 Female 1980 3 1983 1 0 3 3

## 5 5 Female 1980 4 1984 1 0 4 4

## 6 6 Female 1980 5 1985 1 0 5 5


## 7 7 Female 1981 0 1981 2 1 0 1

## 8 8 Female 1981 1 1982 2 1 1 2

## 9 9 Female 1981 2 1983 2 1 2 3

## 10 10 Female 1981 3 1984 2 1 3 4

## 11 11 Female 1981 4 1985 2 1 4 5

## 12 12 Female 1982 0 1982 3 2 0 2

## 13 13 Female 1982 1 1983 3 2 1 3

## 14 14 Female 1982 2 1984 3 2 2 4

## 15 15 Female 1982 3 1985 3 2 3 5

## 16 16 Female 1983 0 1983 4 3 0 3

## 17 17 Female 1983 1 1984 4 3 1 4

## 18 18 Female 1983 2 1985 4 3 2 5

## 19 19 Female 1984 0 1984 5 4 0 4

## 20 20 Female 1984 1 1985 5 4 1 5

## 21 21 Female 1985 0 1985 6 5 0 5

out = mark(dipper.proc, dipper.ddl, invisible = FALSE, output = FALSE)

out$pims$Phi[[1]]

## $pim

## [,1] [,2] [,3] [,4] [,5] [,6]

## [1,] 1 2 3 4 5 6

## [2,] 0 7 8 9 10 11

## [3,] 0 0 12 13 14 15

## [4,] 0 0 0 16 17 18

## [5,] 0 0 0 0 19 20

## [6,] 0 0 0 0 0 21

##

## $group

## [1] 1

Thus, cohort represents the row, time the columns, age is along the diagonal, and group is specified by the collection of parameters in a PIM. Based on that knowledge, we can easily relate the design data to the PIM values. For example, knowing that par.index indexes the values in the PIM, consider the design data columns 'cohort' and 'Cohort'. For cohort, the first 6 rows of the design data represent the 1980 release cohort, or all those individuals who were released in 1980 and recaptured in subsequent capture occasions. Cohort (capital C) provides the same information via a numeric, as opposed to factor, representation. Thus, for the release cohort of 1985, because recapture could only occur in 1986, we can only estimate the survival parameter for the period 1985-1986, so there is a single value for cohort and Cohort, both of which relate back to the single value of par.index (21). For those that are interested, the above was just shown as an example; you can use the PIMS function to look at the output in a prettier format (which is preferable to some) that has the appropriate row and column labels and is labeled by group, as we saw earlier.

PIMS(mark(dipper.proc, dipper.ddl, invisible = FALSE, run = FALSE), "Phi", simplified = F)

## group = sexFemale

## 1980 1981 1982 1983 1984 1985

## 1980 1 2 3 4 5 6

## 1981 7 8 9 10 11

## 1982 12 13 14 15


## 1983 16 17 18

## 1984 19 20

## 1985 21

## group = sexMale

## 1980 1981 1982 1983 1984 1985

## 1980 22 23 24 25 26 27

## 1981 28 29 30 31 32

## 1982 33 34 35 36

## 1983 37 38 39

## 1984 40 41

## 1985 42

Obviously, if we wanted to see the PIM structure for the recapture parameter p, it is equally simple. Just to relate it to the above, you can see that the recapture parameters are +1 sampling occasion from the survival parameters, exactly as they should be. Important note: the time and age parameters for the interval parameters (Phi) are labeled based on the time or age value at the beginning of the interval, while the occasion parameters (p) are labeled by the occasion time or age. So, there will be a little bit of difference in the design data between Phi and p in our example.

PIMS(mark(dipper.proc, dipper.ddl, invisible = FALSE, run = FALSE), "p", simplified = F)

## group = sexFemale

## 1981 1982 1983 1984 1985 1986

## 1980 43 44 45 46 47 48

## 1981 49 50 51 52 53

## 1982 54 55 56 57

## 1983 58 59 60

## 1984 61 62

## 1985 63

## group = sexMale

## 1981 1982 1983 1984 1985 1986

## 1980 64 65 66 67 68 69

## 1981 70 71 72 73 74

## 1982 75 76 77 78

## 1983 79 80 81

## 1984 82 83

## 1985 84
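A quick way to confirm that labeling difference straight from the design data list we built above:

levels(dipper.ddl$Phi$time)  # expect interval start years, "1980" through "1985"
levels(dipper.ddl$p$time)    # expect occasion years, "1981" through "1986"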

Given the default structure of the design data, one can specify a significant number of explanatory models. However, we are not restricted to just using the available structure of the default design data for model development and analysis. RMark, via the R interface, allows the user to add any number of fields to the design data, such as bins based on age, time, or cohort, that the user can use to constrain parameters to be the same within a specific bin. New variables can be defined, and new data can be merged into the design data easily and efficiently. There is significantly more detail on manipulating design data in Appendix C of the MARKBOOK, but for brevity we will just show a couple of quick examples here. For instance, let's use the ubiquitous dipper example of adding flood years to the design data for the 2 periods (the 1981-1982 and 1982-1983 intervals) for the apparent survival parameter Phi.


dipper.ddl$Phi$Flood = 0

dipper.ddl$Phi$Flood[dipper.ddl$Phi$time == 1981 | dipper.ddl$Phi$time == 1982] = 1

dipper.ddl$Phi[(1:21), -c(2, 11)]

## par.index group cohort age time occ.cohort Cohort Age Time Flood

## 1 1 Female 1980 0 1980 1 0 0 0 0

## 2 2 Female 1980 1 1981 1 0 1 1 1

## 3 3 Female 1980 2 1982 1 0 2 2 1

## 4 4 Female 1980 3 1983 1 0 3 3 0

## 5 5 Female 1980 4 1984 1 0 4 4 0

## 6 6 Female 1980 5 1985 1 0 5 5 0

## 7 7 Female 1981 0 1981 2 1 0 1 1

## 8 8 Female 1981 1 1982 2 1 1 2 1

## 9 9 Female 1981 2 1983 2 1 2 3 0

## 10 10 Female 1981 3 1984 2 1 3 4 0

## 11 11 Female 1981 4 1985 2 1 4 5 0

## 12 12 Female 1982 0 1982 3 2 0 2 1

## 13 13 Female 1982 1 1983 3 2 1 3 0

## 14 14 Female 1982 2 1984 3 2 2 4 0

## 15 15 Female 1982 3 1985 3 2 3 5 0

## 16 16 Female 1983 0 1983 4 3 0 3 0

## 17 17 Female 1983 1 1984 4 3 1 4 0

## 18 18 Female 1983 2 1985 4 3 2 5 0

## 19 19 Female 1984 0 1984 5 4 0 4 0

## 20 20 Female 1984 1 1985 5 4 1 5 0

## 21 21 Female 1985 0 1985 6 5 0 5 0

In addition, we can also check what the PIM looks like for the model that uses our newly created Flood parameter by defining a new formula (more on that soon) and looking at the PIMs:

PIMS(mark(dipper.proc, dipper.ddl, model.parameters = list(p = list(formula = ~time),

Phi = list(formula = ~Flood)), invisible = FALSE, run = FALSE), "Phi", simplified = TRUE)

## group = sexFemale

## 1980 1981 1982 1983 1984 1985

## 1980 1 2 2 1 1 1

## 1981 2 2 1 1 1

## 1982 2 1 1 1

## 1983 1 1 1

## 1984 1 1

## 1985 1

## group = sexMale

## 1980 1981 1982 1983 1984 1985

## 1980 1 2 2 1 1 1

## 1981 2 2 1 1 1

## 1982 2 1 1 1

## 1983 1 1 1

## 1984 1 1

## 1985 1

We can also use some handy RMark functions to manipulate and create new design data. For instance, let's say that we want to create age intervals for survival (young, sub-adult,


adult); we can use add.design.data to create the new design data. Also, note the use of right=FALSE and replace=TRUE, which are used to define what is included in, or excluded from, each interval. And, just for consistency, we will output the PIM structure as well.

dipper.ddl = add.design.data(dipper.proc, dipper.ddl, parameter = "Phi", type = "age",

bins = c(0, 1, 3, 6), right = FALSE, replace = TRUE, name = "newages")

dipper.ddl$Phi[(1:21), -c(2, 11)]

## par.index group cohort age time occ.cohort Cohort Age Time Flood

## 1 1 Female 1980 0 1980 1 0 0 0 0

## 2 2 Female 1980 1 1981 1 0 1 1 1

## 3 3 Female 1980 2 1982 1 0 2 2 1

## 4 4 Female 1980 3 1983 1 0 3 3 0

## 5 5 Female 1980 4 1984 1 0 4 4 0

## 6 6 Female 1980 5 1985 1 0 5 5 0

## 7 7 Female 1981 0 1981 2 1 0 1 1

## 8 8 Female 1981 1 1982 2 1 1 2 1

## 9 9 Female 1981 2 1983 2 1 2 3 0

## 10 10 Female 1981 3 1984 2 1 3 4 0

## 11 11 Female 1981 4 1985 2 1 4 5 0

## 12 12 Female 1982 0 1982 3 2 0 2 1

## 13 13 Female 1982 1 1983 3 2 1 3 0

## 14 14 Female 1982 2 1984 3 2 2 4 0

## 15 15 Female 1982 3 1985 3 2 3 5 0

## 16 16 Female 1983 0 1983 4 3 0 3 0

## 17 17 Female 1983 1 1984 4 3 1 4 0

## 18 18 Female 1983 2 1985 4 3 2 5 0

## 19 19 Female 1984 0 1984 5 4 0 4 0

## 20 20 Female 1984 1 1985 5 4 1 5 0

## 21 21 Female 1985 0 1985 6 5 0 5 0

## newages

## 1 [0,1)

## 2 [1,3)

## 3 [1,3)

## 4 [3,6]

## 5 [3,6]

## 6 [3,6]

## 7 [0,1)

## 8 [1,3)

## 9 [1,3)

## 10 [3,6]

## 11 [3,6]

## 12 [0,1)

## 13 [1,3)

## 14 [1,3)

## 15 [3,6]

## 16 [0,1)

## 17 [1,3)

## 18 [1,3)

## 19 [0,1)

## 20 [1,3)

## 21 [0,1)

PIMS(mark(dipper.proc, dipper.ddl, model.parameters = list(p = list(formula = ~time),


Phi = list(formula = ~newages)), invisible = FALSE, run = FALSE), "Phi",

simplified = TRUE)

## group = sexFemale

## 1980 1981 1982 1983 1984 1985

## 1980 1 2 2 3 3 3

## 1981 1 2 2 3 3

## 1982 1 2 2 3

## 1983 1 2 2

## 1984 1 2

## 1985 1

## group = sexMale

## 1980 1981 1982 1983 1984 1985

## 1980 1 2 2 3 3 3

## 1981 1 2 2 3 3

## 1982 1 2 2 3

## 1983 1 2 2

## 1984 1 2

## 1985 1

As an alternative option, we can use merge_design.covariates with the .ddl and a user-defined dataframe to add data to the .ddl for temporal covariates or for a group-specific variable (see ?merge_design.covariates for additional details).

df = data.frame(time = c(1980:1986), covar = c(4, 5, 6, 7, 8, 9, 10))

dipper.ddl$Phi = merge_design.covariates(dipper.ddl$Phi, df, bytime = TRUE)

dipper.ddl$Phi[(1:21), -c(1:2, 11)]

## model.index group cohort age occ.cohort Cohort Age Time Flood newages

## 1 1 Female 1980 0 1 0 0 0 0 [0,1)

## 2 2 Female 1980 1 1 0 1 1 1 [1,3)

## 3 3 Female 1980 2 1 0 2 2 1 [1,3)

## 4 4 Female 1980 3 1 0 3 3 0 [3,6]

## 5 5 Female 1980 4 1 0 4 4 0 [3,6]

## 6 6 Female 1980 5 1 0 5 5 0 [3,6]

## 7 7 Female 1981 0 2 1 0 1 1 [0,1)

## 8 8 Female 1981 1 2 1 1 2 1 [1,3)

## 9 9 Female 1981 2 2 1 2 3 0 [1,3)

## 10 10 Female 1981 3 2 1 3 4 0 [3,6]

## 11 11 Female 1981 4 2 1 4 5 0 [3,6]

## 12 12 Female 1982 0 3 2 0 2 1 [0,1)

## 13 13 Female 1982 1 3 2 1 3 0 [1,3)

## 14 14 Female 1982 2 3 2 2 4 0 [1,3)

## 15 15 Female 1982 3 3 2 3 5 0 [3,6]

## 16 16 Female 1983 0 4 3 0 3 0 [0,1)

## 17 17 Female 1983 1 4 3 1 4 0 [1,3)

## 18 18 Female 1983 2 4 3 2 5 0 [1,3)

## 19 19 Female 1984 0 5 4 0 4 0 [0,1)

## 20 20 Female 1984 1 5 4 1 5 0 [1,3)

## 21 21 Female 1985 0 6 5 0 5 0 [0,1)

## covar

## 1 4

## 2 5


## 3 6

## 4 7

## 5 8

## 6 9

## 7 5

## 8 6

## 9 7

## 10 8

## 11 9

## 12 6

## 13 7

## 14 8

## 15 9

## 16 7

## 17 8

## 18 9

## 19 8

## 20 9

## 21 9

Model Formula and Design Matrix

RMark relies on a formula-based approach to development of the design matrix structure that is passed to MARK. For those of you that use R or most any other command-line interface for inference, this will make sense, but for the strict MARK user this may be a new experience. Depending on your point of view, the formula-driven interface for creating model structure and data in RMark is either the best thing since sliced bread, or a bastardisation of all things MARK (you can see regular snarky comments to the second point by some faceless person whose user name rhymes with 'hooch' and who likes to troll around on http://www.phidot.org/forum/index.php).

For what it's worth, the approach of manual model creation used by MARK is a fantastic learning tool, but it can be frustrating and extremely time-consuming as model complexity increases, or as additional data are gathered. Both Jeff and I are obvious proponents of starting with the basics when learning about how model structures are designed, but Jeff developed RMark such that 1) he could have a formula-based interface to model creation, 2) he could simplify his own work, as many of his projects were monitoring focused, so new data were added regularly, requiring him to re-develop the MARK structure each year, and 3) he could support making his work more reliable/reproducible. Obviously RMark does not negate errors, and I have surely made them when using it, but it does simplify some aspects of model development.

Most of what we have seen so far has been showing the relationship between the design data list and the PIM structure. However, the underlying approach that RMark uses is to create a design matrix that is then passed to MARK for evaluation. The crux of this is the R function model.matrix. As an example, let's consider our dipper data and assume we want a model that has time-varying survival. If we had read the MARKBOOK (Chapter 6 to be exact), we would know that you can specify an effect of time using diagonal values of 1 in the design matrix. Not surprisingly, we can do this in RMark as well (first 21 rows shown here for the first time, otherwise these might get a bit tedious page-wise):


dm = model.matrix(~time, dipper.ddl$Phi)

head(dm, 21)

## (Intercept) time1981 time1982 time1983 time1984 time1985

## 1 1 0 0 0 0 0

## 2 1 1 0 0 0 0

## 3 1 0 1 0 0 0

## 4 1 0 0 1 0 0

## 5 1 0 0 0 1 0

## 6 1 0 0 0 0 1

## 7 1 1 0 0 0 0

## 8 1 0 1 0 0 0

## 9 1 0 0 1 0 0

## 10 1 0 0 0 1 0

## 11 1 0 0 0 0 1

## 12 1 0 1 0 0 0

## 13 1 0 0 1 0 0

## 14 1 0 0 0 1 0

## 15 1 0 0 0 0 1

## 16 1 0 0 1 0 0

## 17 1 0 0 0 1 0

## 18 1 0 0 0 0 1

## 19 1 0 0 0 1 0

## 20 1 0 0 0 0 1

## 21 1 0 0 0 0 1

We can use any formula to create the design matrix from the design data fields, assuming they are correct. For instance, we can create and show the design matrix for a model having effects of sex and Flood,

dm = model.matrix(~sex + Flood, dipper.ddl$Phi)

head(dm, 6)

## (Intercept) sexMale Flood

## 1 1 0 0

## 2 1 0 1

## 3 1 0 1

## 4 1 0 0

## 5 1 0 0

## 6 1 0 0

or perhaps models with a sex*Flood effect,

dm = model.matrix(~sex * Flood, dipper.ddl$Phi)

head(dm, 6)

## (Intercept) sexMale Flood sexMale:Flood

## 1 1 0 0 0

## 2 1 0 1 0

## 3 1 0 1 0

## 4 1 0 0 0

## 5 1 0 0 0

## 6 1 0 0 0


So, what is happening here? Well, RMark takes the formulas ~time or ~sex + Flood, creates the design matrix for each parameter we define a model structure for using model.matrix, and then pastes the resultant matrices together to create a single design matrix that works for MARK. Which, for our simple example, would look like this:

\[
\begin{bmatrix}
\text{Phi design matrix} & 0 \\
0 & \text{p design matrix}
\end{bmatrix}
\]
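Just to make that block structure concrete, here is a minimal sketch (this is not RMark's internal code, just an illustration using model.matrix on the design data we already have):

dm.Phi = model.matrix(~sex + Flood, dipper.ddl$Phi)
dm.p = model.matrix(~time, dipper.ddl$p)
full.dm = rbind(cbind(dm.Phi, matrix(0, nrow(dm.Phi), ncol(dm.p))),
    cbind(matrix(0, nrow(dm.p), ncol(dm.Phi)), dm.p))
dim(full.dm)
## 84 rows (42 for Phi plus 42 for p), with columns from both formulas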

Formulas in RMark are not limited to only those fields in the design data because, if you noticed, individual covariate data are not included in the design data list (which you can see using str(dipper.ddl)). Rather, individual covariate data are housed in the processed data dataframe. Because MARK uses the names of the individual covariates in the design matrix, which model.matrix cannot handle, RMark inserts dummy data into the design data for any individual covariates used in the formula, then calls model.matrix and inserts the covariate names as needed to construct the full design matrix for MARK.
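A quick illustrative check of where an individual covariate such as weight (added back in the importing section) actually lives:

"weight" %in% names(dipper.ddl$Phi)    # FALSE -- not part of the design data
"weight" %in% names(dipper.proc$data)  # TRUE  -- carried along with each capture history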

Fitting Models

Now that we have a processed data frame and the appropriate design data list set up, we can proceed with specifying the model we want to run by defining the parameter specifications we want to use and then calling MARK to fit the model. These next steps tend to have a workflow in concert with each other, so they will be detailed together here; however, that does not mean that they cannot be separated. The workflow pattern is simple: create the parameter specification lists and insert those into the mark call with whatever optional arguments are needed at both steps (detailed in ?mark and hence not discussed here) and, poof, MARK and RMark make some magic.

Parameter specifications are pretty straightforward for folks used to CLI approaches to modeling. In a nutshell, the parameter specification is where you define what variable you want to affect what model parameter. For instance (and perhaps easiest), we can create an object for each parameter specification we want to use, giving each object a specific (unique) name (although it is fine to use the same specification in multiple models), following:

Phi.1 = list(formula = ~time)

Phi.2 = list(formula = ~Time)

Phi.3 = list(formula = ~sex * Flood)

p.1 = list(formula = ~1)

p.2 = list(formula = ~time)

The values used in the parameter specification are similar to those used elsewhere in R in some cases: a constant model (intercept only, so all Phi are equal) is ~1, while ~time is equivalent to the t model and ~Time is the linear trend over time in MARK. Then, once your parameter specifications are set up, you can fit MARK models by including one of the object names in the model.parameters argument of mark. There are several ways to do this, but to keep it simple, first off we will run a model using the Phi.1 and p.1 parameter specifications defined above. So, what do we see? First, we can see that the output summary


tells us that we used a CJS model, which is what we specified in process.data(), so that's good. Also, we get the name of the model, the number of parameters (Npar), model fit criteria, and the model's resultant beta and real estimates. A couple of things to notice: the beta parameters are labeled specifically to the time period of interest, and the real parameter estimates are inserted into a diagonal matrix, labeled by the grouping variable (sex), and exported as part of the RMark output. Now, if you want to just see the MARK output file (the marknnn.out one would typically see by retrieving the model results in MARK), you can just type the model name (model.1) into R and it will open the .txt file in whatever editor you have selected.

model.1 = mark(dipper.proc, dipper.ddl, model.parameters = list(Phi = Phi.1,

p = p.1))

##

## Output summary for CJS model

## Name : Phi(~time)p(~1)

##

## Npar : 7

## -2lnL: 659.7

## AICc : 674

##

## Beta

## estimate se lcl ucl

## Phi:(Intercept) 0.514392 0.4768 -0.4201 1.4489

## Phi:time1981 -0.698142 0.5537 -1.7834 0.3872

## Phi:time1982 -0.600937 0.5301 -1.6399 0.4381

## Phi:time1983 -0.006107 0.5335 -1.0517 1.0395

## Phi:time1984 -0.075713 0.5277 -1.1099 0.9585

## Phi:time1985 -0.178064 0.5266 -1.2101 0.8540

## p:(Intercept) 2.220395 0.3289 1.5758 2.8650

##

##

## Real Parameter Phi

## Group:sexFemale

## 1980 1981 1982 1983 1984 1985

## 1980 0.6258 0.4542 0.4784 0.6244 0.6079 0.5833

## 1981 0.4542 0.4784 0.6244 0.6079 0.5833

## 1982 0.4784 0.6244 0.6079 0.5833

## 1983 0.6244 0.6079 0.5833

## 1984 0.6079 0.5833

## 1985 0.5833

##

## Group:sexMale

## 1980 1981 1982 1983 1984 1985

## 1980 0.6258 0.4542 0.4784 0.6244 0.6079 0.5833

## 1981 0.4542 0.4784 0.6244 0.6079 0.5833

## 1982 0.4784 0.6244 0.6079 0.5833

## 1983 0.6244 0.6079 0.5833

## 1984 0.6079 0.5833

## 1985 0.5833

##

##

## Real Parameter p


## Group:sexFemale

## 1981 1982 1983 1984 1985 1986

## 1980 0.9021 0.9021 0.9021 0.9021 0.9021 0.9021

## 1981 0.9021 0.9021 0.9021 0.9021 0.9021

## 1982 0.9021 0.9021 0.9021 0.9021

## 1983 0.9021 0.9021 0.9021

## 1984 0.9021 0.9021

## 1985 0.9021

##

## Group:sexMale

## 1981 1982 1983 1984 1985 1986

## 1980 0.9021 0.9021 0.9021 0.9021 0.9021 0.9021

## 1981 0.9021 0.9021 0.9021 0.9021 0.9021

## 1982 0.9021 0.9021 0.9021 0.9021

## 1983 0.9021 0.9021 0.9021

## 1984 0.9021 0.9021

## 1985 0.9021

One of the real benefits of RMark is the ability to script analyses, especially for those folks who are involved in continual mark-recapture-based monitoring programs. Jeff developed RMark specifically because it simplified how he could approach re-analysis of monitoring data without having to re-create the MARK analysis each year as one more column of data was added. The ability to easily script analyses of monitoring data is one of the best reasons to learn enough RMark. As an example, let's assume, using our dipper dataset, that we have a set of models which we run after each year's data collection.

dipper.monitoring = function() {

Phi.1 = list(formula = ~1)

p.1 = list(formula = ~1)

cml = create.model.list("CJS")

mark.wrapper(cml, data = dipper.proc, ddl = dipper.ddl, output = FALSE)

}

dummy = capture.output(dipper.res <- dipper.monitoring())

dipper.res$model.table

## Phi p model npar AICc DeltaAICc weight Deviance

## 1 ~1 ~1 Phi(~1)p(~1) 2 670.9 0 1 84.36

OK, so let's add a year's worth of information typical of a monitoring program (for simplicity we will just assume none of the critters were captured during this new encounter occasion and that we are only using constant models for Phi and p) and then re-initiate the modeling process. First, let's load the original data, add an additional encounter occasion to that dataset, and have a look at both.


dipper.nextyr = convert.inp("C:/Program Files (x86)/MARK/Examples/dipper.inp",

group.df = data.frame(sex = c("Male", "Female")))

head(dipper.nextyr)

## ch freq sex

## 1:1 1111110 1 Male

## 1:3 1111000 1 Male

## 1:6 1100000 1 Male

## 1:7 1100000 1 Male

## 1:8 1100000 1 Male

## 1:9 1100000 1 Male

nextyr = rep(0, nrow(dipper.nextyr))

dipper.nextyr$ch = paste(dipper.nextyr$ch, nextyr, sep = "")

head(dipper.nextyr)

## ch freq sex

## 1:1 11111100 1 Male

## 1:3 11110000 1 Male

## 1:6 11000000 1 Male

## 1:7 11000000 1 Male

## 1:8 11000000 1 Male

## 1:9 11000000 1 Male

Next, let's process the new capture history data and re-run the analysis in full.

dipper.next = process.data(dipper.nextyr, model = "CJS", groups = "sex", begin.time = 1980)

dipper.ddlnext = make.design.data(dipper.next)

dipper.nextmonitoring = function() {

Phi.1 = list(formula = ~1)

p.1 = list(formula = ~1)

cmlnext = create.model.list("CJS")

mark.wrapper(cmlnext, data = dipper.next, ddl = dipper.ddlnext, output = FALSE)

}

dummy = capture.output(dipper.resnext <- dipper.nextmonitoring())

dipper.resnext$model.table

## Phi p model npar AICc DeltaAICc weight Deviance

## 1 ~1 ~1 Phi(~1)p(~1) 2 789.7 0 1 203.2

And quickly look at the resultant real estimates for each model.

dipper.res[[1]]$results$real

## estimate se lcl ucl fixed note

## Phi gFemale c1980 c1 a0 t1980 0.5602 0.02513 0.5105 0.6088

## p gFemale c1980 c1 a1 t1981 0.9026 0.02859 0.8305 0.9460


dipper.resnext[[1]]$results$real

## estimate se lcl ucl fixed note

## Phi gFemale c1980 c1 a0 t1980 0.4759 0.02448 0.4283 0.524

## p gFemale c1980 c1 a1 t1981 0.8723 0.03607 0.7836 0.928

As you can see, this is one of the primary strengths of RMark for MARK-style analyses: the simplicity of using a script to re-evaluate updated datasets. Additionally, this is the first place we have actually extracted the information that folks typically want from an analysis, the real estimates. So, to see where stuff is in a MARK object, look at the structure of the list for model.extract below. What you can see is that the MARK model results object is a list with 16 objects, which include the model fit criteria (lnl = log likelihood, npar = number of parameters, etc.), as well as the beta and real estimates and their associated measures of precision (SE, CL). In addition, there are a few values (e.g., derived, covariate.values) that are NULL in this example, primarily because they are not used with model="CJS". Looking at the structure of a mark() object is one of the quickest ways to identify the parameters from the model.

model.extract = mark(dipper.proc, dipper.ddl, model.parameters = list(p = list(formula = ~time),

Phi = list(formula = ~time)), invisible = FALSE, output = FALSE)

##

## Note: only 11 parameters counted of 12 specified parameters

## AICc and parameter count have been adjusted upward

str(model.extract$results)

## List of 16

## $ lnl : num 657

## $ deviance : num 74.5

## $ deviance.df : num 30

## $ npar : int 12

## $ npar.unadjusted : num 11

## $ n : int 426

## $ AICc : num 682

## $ AICc.unadjusted : num 680

## $ beta :'data.frame': 12 obs. of 4 variables:

## ..$ estimate: num [1:12] 0.935 -1.198 -1.023 -0.42 -0.536 ...

## ..$ se : num [1:12] 0.769 0.871 0.805 0.809 0.803 ...

## ..$ lcl : num [1:12] -0.571 -2.905 -2.6 -2.006 -2.11 ...

## ..$ ucl : num [1:12] 2.442 0.508 0.555 1.166 1.038 ...

## $ real :'data.frame': 12 obs. of 6 variables:

## ..$ estimate: num [1:12] 0.718 0.435 0.478 0.626 0.599 ...

## ..$ se : num [1:12] 0.1555 0.0688 0.0597 0.0593 0.0561 ...

## ..$ lcl : num [1:12] 0.361 0.308 0.364 0.505 0.486 ...

## ..$ ucl : num [1:12] 0.92 0.571 0.594 0.733 0.702 ...

## ..$ fixed : Factor w/ 1 level " ": 1 1 1 1 1 1 1 1 1 1 ...

## ..$ note : Factor w/ 1 level " ": 1 1 1 1 1 1 1 1 1 1 ...

## $ beta.vcv : num [1:12, 1:12] 0.591 -0.635 -0.591 -0.591 -0.591 ...

## $ derived :'data.frame': 0 obs. of 0 variables

## $ derived.vcv : NULL


## $ covariate.values: NULL

## $ singular : int 6

## $ real.vcv : NULL

So, if we want to see the real estimates for this model, simple enough to do:

model.extract$results$real[-c(5:6)]

## estimate se lcl ucl

## Phi gFemale c1980 c1 a0 t1980 0.7182 0.15555 3.610e-01 0.9200

## Phi gFemale c1980 c1 a1 t1981 0.4347 0.06883 3.075e-01 0.5711

## Phi gFemale c1980 c1 a2 t1982 0.4782 0.05971 3.644e-01 0.5943

## Phi gFemale c1980 c1 a3 t1983 0.6261 0.05927 5.048e-01 0.7334

## Phi gFemale c1980 c1 a4 t1984 0.5985 0.05605 4.855e-01 0.7019

## Phi gFemale c1980 c1 a5 t1985 0.7656 54.28402 1.080e-257 1.0000

## p gFemale c1980 c1 a1 t1981 0.6962 0.16576 3.303e-01 0.9142

## p gFemale c1980 c1 a2 t1982 0.9231 0.07288 6.161e-01 0.9890

## p gFemale c1980 c1 a3 t1983 0.9130 0.05818 7.141e-01 0.9779

## p gFemale c1980 c1 a4 t1984 0.9008 0.05383 7.360e-01 0.9673

## p gFemale c1980 c1 a5 t1985 0.9324 0.04580 7.685e-01 0.9829

## p gFemale c1980 c1 a6 t1986 0.6931 49.14193 5.138e-197 1.0000

model.extract$results$beta[-c(5:6)]

## estimate se lcl ucl

## Phi:(Intercept) 0.93546 0.7685 -0.5709 2.4418

## Phi:time1981 -1.19828 0.8707 -2.9048 0.5082

## Phi:time1982 -1.02283 0.8049 -2.6005 0.5548

## Phi:time1983 -0.41986 0.8092 -2.0058 1.1661

## Phi:time1984 -0.53610 0.8031 -2.1103 1.0381

## Phi:time1985 0.24814 302.4842 -592.6208 593.1171

## p:(Intercept) 0.82928 0.7837 -0.7068 2.3654

## p:time1982 1.65563 1.2914 -0.8755 4.1867

## p:time1983 1.52210 1.0729 -0.5808 3.6250

## p:time1984 1.37675 0.9885 -0.5607 3.3142

## p:time1985 1.79509 1.0689 -0.2999 3.8901

## p:time1986 -0.01475 231.0123 -452.7988 452.7693

model.extract$results$beta$estimate[1]

## [1] 0.9355

Just so it makes sense how the beta and real estimates are linked, we can recover the apparent survival probability for the 1981 period (0.4347) by extracting the appropriate beta estimates and using the logistic distribution function plogis() to transform the betas back to the real scale.

round(plogis(model.extract$results$beta$estimate[1] + model.extract$results$beta$estimate[2] *

1), 7)

## [1] 0.4347
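The same logic extends to any occasion: add the intercept to the appropriate time effect on the logit scale and back-transform with plogis(). As a quick check against the real-estimate table above (a sketch reusing the beta estimates already stored in model.extract):

# intercept + time1982 effect, back-transformed; should match the real Phi
# estimate for the 1982 interval (0.4782) shown above
round(plogis(model.extract$results$beta$estimate[1] + model.extract$results$beta$estimate[3]), 4)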


Plotting Results

One of the benefits of RMark, probably nearly as useful as model scripting, is that once we have run a set of models, we can use all the tools available in R to further manipulate the results from the fitted model(s). There are quite a few ways you can manipulate the RMark output in R, but the one of primary interest will likely be how to build plots from your data. Sticking with the dipper example, I am going to recreate from scratch, using the dipper data housed in RMark, a sequence of models, and then use covariate.predictions() to build what has become the standard example plot for showing how RMark works. So, using the standard example:

data(dipper)

dipper$weight = rnorm(nrow(dipper), mean = 10, sd = 2)

dipper.proc = process.data(dipper, model = "CJS", groups = "sex", begin.time = 1980)

dipper.ddl = make.design.data(dipper.proc)

dipper.ddl$Phi$flood = ifelse(dipper.ddl$Phi$time %in% 1982:1983, 1, 0)

dipper.analysis = function() {

Phi.1 = list(formula = ~time)

Phi.2 = list(formula = ~-1 + time, link = "sin")

Phi.3 = list(formula = ~sex + weight)

Phi.4 = list(formula = ~flood)

p.1 = list(formula = ~1)

p.2 = list(formula = ~time)

p.3 = list(formula = ~Time)

p.4 = list(formula = ~sex)

cml = create.model.list("CJS")

mark.wrapper(cml, data = dipper.proc, ddl = dipper.ddl, output = FALSE)

}

dummy = capture.output(dipper.results <- dipper.analysis())
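mark.wrapper() returns a marklist, so before building plots it is worth a quick glance at the AICc model-selection table it assembles for the 16 fitted models (a minimal sketch; the columns printed can vary a little across RMark versions):

# AICc-ranked model-selection table for the model set
dipper.results$model.table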

Here, we use the model for sex and weight detailed above as an example of how to get predictions of survival across a range of weight values for each sex (see the figure below).

minmass = min(dipper$weight)

maxmass = max(dipper$weight)

mass.values = minmass + (0:30) * (maxmass - minmass)/30

PIMS(dipper.results[[11]], "Phi", simplified = FALSE)

## group = sexFemale

## 1980 1981 1982 1983 1984 1985

## 1980 1 2 3 4 5 6

## 1981 7 8 9 10 11

## 1982 12 13 14 15

## 1983 16 17 18

## 1984 19 20

## 1985 21

## group = sexMale

## 1980 1981 1982 1983 1984 1985

## 1980 22 23 24 25 26 27

## 1981 28 29 30 31 32


## 1982 33 34 35 36

## 1983 37 38 39

## 1984 40 41

## 1985 42

par(mfrow = c(2, 1))

Phibymass = covariate.predictions(dipper.results[[11]], data = data.frame(weight = mass.values),

indices = c(1))

plot(Phibymass$estimates$covdata, Phibymass$estimates$estimate, type = "l",

lwd = 2, xlab = "Mass(g)", ylab = "Female Survival", ylim = c(0, 1), las = 1)

lines(Phibymass$estimates$covdata, Phibymass$estimates$lcl, lty = 2)

lines(Phibymass$estimates$covdata, Phibymass$estimates$ucl, lty = 2)

# Compute and plot survival values for males

Phibymass = covariate.predictions(dipper.results[[11]], data = data.frame(weight = mass.values),

indices = c(22))

plot(Phibymass$estimates$covdata, Phibymass$estimates$estimate, type = "l",

lwd = 2, xlab = "Mass(g)", ylab = "Male Survival", ylim = c(0, 1), las = 1)

lines(Phibymass$estimates$covdata, Phibymass$estimates$lcl, lty = 2)

lines(Phibymass$estimates$covdata, Phibymass$estimates$ucl, lty = 2)
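Because covariate.predictions() just returns a data frame of predictions, the usual data frame tools apply as well; for example, to peek at the values behind the male panel instead of plotting them (a sketch using the object created above):

head(Phibymass$estimates[, c("covdata", "estimate", "lcl", "ucl")])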


[Figure: predicted female (top) and male (bottom) survival against mass (g), with dashed 95% confidence limits, from covariate.predictions().]

7.2 Distance Sampling

The primary analysis engine for estimating density from point- or line-transect data is the program Distance (Buckland et al. 2001). Similar to MARK, Distance has a very active user base, detailed and high-quality user documentation (http://www.ruwpa.st-and.ac.uk/distance.book/), and an active base of user-driven support (http://www.ruwpa.st-and.ac.uk/distance/distancelist.html). While there is some capability for automatic model creation in Distance, most models are built manually with a graphical user interface to specify the parameter structures and design matrices. Thus, manual model creation can be useful during the learning process, but eventually it becomes a time-consuming and sometimes frustrating exercise that may add an unnecessary source of error to the analysis.


mrds is an R package that provides a formula-based interface for conducting distance sampling. mrds is on the Comprehensive R Archive Network (CRAN) (http://cran.r-project.org). mrds contains functionality to build distance sampling models from formulas, run the model without interacting with the program Distance, extract the output, and summarize and display the results with automatic labeling. mrds also has functions for model averaging, prediction, and variance components, and it supports the current Distance sampling GUI. In addition, all of the tools in R are available, which enables a completely scripted analysis from data to results and inclusion into a document with Sweave (Leisch 2002) and LaTeX to create a reproducible manuscript such as this one.

Here we are going to provide an overview of the mrds package and how it can be used to benefit Distance users. For more detailed documentation, refer to the online documentation at http://www.ruwpa.st-and.ac.uk/distance/ and the help within the mrds package. And, just to be fair, this is a set of notes that Jeff Laake worked up real quick for teaching purposes as I ran out of time this week; he saved my rear so I did not have to work it up from scratch.

library(mrds)

## This is mrds 2.1.4

## Built: R 3.0.2; ; 2013-12-03 23:07:22 UTC; windows

# get data and put ds_data into format for ddf

data(stake77)

ds_data = stake77[stake77$Obs1 == 1, ]

ds_data$distance = ds_data$PD

ds_data$observer = 1

ds_data$object = 1:nrow(ds_data)

ds_data$Sample.Label = 1

ds_data$Region.Label = 1

First, we can fit a standard half-normal detection function to the stake data, look at the output, and evaluate the goodness-of-fit test statistic.

# fit half-normal detection function to stakes observed along a 1 km line, 20

# meters to each side

detection_model = ddf(dsmodel = ~cds(key = "hn"), data = ds_data, meta.data = list(width = 20))

detection_model

##

## Distance sampling analysis object

##

## Summary for ds object

## Number of observations : 81

## Distance range : 0 - 20

## AIC : 460

##

## Detection function:

## Half-normal key function

##

## Detection function parameters

## Scale Coefficients:

## estimate se


## (Intercept) 2.212 0.1077

##

## Estimate SE CV

## Average p 0.5562 0.05009 0.09006

## N in covered region 145.6316 16.97653 0.11657

ddf.gof(detection_model)

[Figure: goodness-of-fit plot from ddf.gof(), empirical versus fitted cdf.]

##

## Goodness of fit results for ddf object

##

## Chi-square tests

## [0,2.22] (2.22,4.44] (4.44,6.67] (6.67,8.89] (8.89,11.1]

## Observed 13.0000 17.0000 14.00000 12.00000 11.0000

## Expected 16.0230 15.1073 13.42952 11.25562 8.8942

## Chisquare 0.5703 0.2371 0.02423 0.04923 0.4985

## (11.1,13.3] (13.3,15.6] (15.6,17.8] (17.8,20] Total

## Observed 5.0000 3.0000 3.000000 3.0000 81.000

## Expected 6.6263 4.6555 3.082338 1.9262 81.000

## Chisquare 0.3992 0.5887 0.002199 0.5985 2.968

##

## P =0.88794 with 7 degrees of freedom

##

## Distance sampling Kolmogorov-Smirnov test

## Test statistic = 0.082692 P = 0.63688

##


## Distance sampling Cramer-von Mises test(unweighted)

## Test statistic = 0.051489 P = 0.86719

And of course we can plot the detection function.

plot(detection_model)

[Figure: detection function plot, detection probability versus distance (0-20 m) for the fitted half-normal model.]

We can also easily tell R to estimate density and abundance using the handy function dht() that Jeff wrote.

# compute density and abundance - here units are sq meters

sample_table = data.frame(Sample.Label = 1, Region.Label = 1, Effort = 1000)

region_table = data.frame(Region.Label = 1, Area = 40000)

dht(detection_model, sample.table = sample_table, region.table = region_table)

##

## Summary statistics:

## Region Area CoveredArea Effort n k ER se.ER cv.ER

## 1 1 40000 40000 1000 81 1 0.081 0 0

##

## Abundance:

## Label Estimate se cv lcl ucl df

## 1 Total 145.6 17.82 0.1224 105.7 200.6 4.676

##

## Density:

## Label Estimate se cv lcl ucl df

## 1 Total 0.003641 0.0004456 0.1224 0.002644 0.005014 4.676
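The dht() output is easy to sanity-check by hand: density is just the abundance estimate divided by the covered area, which here is in square meters.

# abundance / covered area (sq meters) = density per sq meter
145.6/40000

## [1] 0.00364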


And we can change the units of measurement to fit our estimation needs.

# compute density and abundance - here units are sq kilometers for area, km

# for length and m for distance

sample_table = data.frame(Sample.Label = 1, Region.Label = 1, Effort = 1)

region_table = data.frame(Region.Label = 1, Area = 0.04)

# convert distance measurements in meters to km

dht(detection_model, sample.table = sample_table, region.table = region_table,

options = list(convert.units = 0.001))

##

## Summary statistics:

## Region Area CoveredArea Effort n k ER se.ER cv.ER

## 1 1 0.04 0.04 1 81 1 81 0 0

##

## Abundance:

## Label Estimate se cv lcl ucl df

## 1 Total 145.6 17.82 0.1224 105.7 200.6 4.676

##

## Density:

## Label Estimate se cv lcl ucl df

## 1 Total 3641 445.6 0.1224 2644 5014 4.676

Obviously, there are significantly more complex analyses that can be conducted using distance-based methods, but to be honest you have to study them pretty intensely as they are not intuitive, primarily because you have to match your R dataframe to what the Distance program data structure looks like, so there are multiple tables that need to be created. But if you get through that, then mrds can be very handy: you can do analyses such as conventional distance sampling, multiple-covariate distance sampling, mark-recapture distance sampling, and all the variants in between. I have a good example on mark-recapture distance sampling for avian point count samples that needs to be included in mrds; holler if you want it.
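As a small taste of the multiple-covariate distance sampling mentioned above, here is a minimal sketch that invents an observation-level covariate for the stake data (glare is made up purely to illustrate the mcds() formula interface) and puts it on the scale parameter of the half-normal key:

# hypothetical covariate, invented only for illustration
ds_data$glare = rnorm(nrow(ds_data))

# half-normal key with the made-up covariate on the scale parameter (MCDS)
mcds_model = ddf(dsmodel = ~mcds(key = "hn", formula = ~glare), data = ds_data,
    meta.data = list(width = 20))

summary(mcds_model)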

7.3 Spatial Models

Not surprisingly, the collection and analysis of GPS data, focused on animal movements, habitat selection or avoidance, and evaluating the impacts of management, has become one of the foremost areas of scientific interest for wildlife and fisheries ecologists. Historically, most research using GPS technology has focused on large mammals, both terrestrial and aquatic, but recent technological advances that created lightweight GPS units for avifauna have opened some interesting opportunities for evaluating how reproductive phenology impacts animal movements, or how habitats are selected across a reproductive cycle.

Because I like turkeys (and I already have a vignette I wrote using knitr and LyX), we will start with a simple example using data from an adult male Rio Grande wild turkey captured and tagged in south Texas (Guthrie et al. 2011, Byrne et al. 2014). So, first we load the package I created to do the analysis, called moveud, and take a quick look at the raw data file (aptly named) rawturkey. Now, this file is pretty much entirely unedited: it includes all the missed fixes (so there are missing values, or NAs, in the dataframe), it is in lat/long, and the date and time are not in a standard format (which will need to be changed later). But, rather than import a clean file, we assume that most GPS datasets will be anything but clean, so we have decided to embed the data management actions into this document in the hope that they help other folks out with similar issues.


library(moveud)

data(rawturkey)

head(rawturkey)

## Try FixDate FixTime Lat Lon Sats HDOP

## 1 1 3/10/2009 17:26:33 27.83 -98.02 6 1.2

## 2 2 3/10/2009 17:27:09 27.83 -98.02 4 3.4

## 3 3 3/11/2009 6:01:03 27.91 -98.38 8 1.1

## 4 4 3/11/2009 6:02:09 27.91 -98.38 6 1.3

## 5 5 3/11/2009 6:33:09 27.91 -98.38 5 1.7

## 6 6 3/11/2009 7:02:57 27.91 -98.38 7 1.4
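As a preview of the date/time fix mentioned above, FixDate and FixTime can be combined into a proper POSIXct timestamp in one line (a sketch assuming the month/day/year format shown in head(); the same conversion is used again in the Brownian bridge example later):

rawturkey$dt = as.POSIXct(strptime(paste(rawturkey$FixDate, rawturkey$FixTime),
    "%m/%d/%Y %H:%M:%S"))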

Then, because we are smart, we decide to plot the data and see whether or not it actually makes sense. Well, it doesn't; have a look at the first figure below. Wonder what that one point is doing way down south there in the unedited locations plot? Well, that is where my graduate student activated the GPS unit in the morning before he put it out. So, obviously, that is a junk location and we need to remove it from our data file before we proceed. So, using a little code, we created a new dataset called newrawturkey that effectively removes that weird point, and replotted the GPS locations for this individual in the edited points panel (Figure 1).

suppressPackageStartupMessages(library(moveud))

data(rawturkey)

par(mfrow = c(2, 1))

plot(rawturkey$Lon, rawturkey$Lat, main = "Unedited Points", pch = 20, col = "red",

xlab = "Longitude", ylab = "Latitude", las = 1, cex.axis = 0.7)

newrawturkey = rawturkey[rawturkey$Lon < -98.1, ]

plot(newrawturkey$Lon, newrawturkey$Lat, main = "Edited Points", pch = 20, col = "red",

xlab = "Longitude", ylab = "Latitude", las = 1, cex.axis = 0.7)


[Figure: unedited (top) and edited (bottom) GPS locations plotted as longitude versus latitude; the single far-south activation point is gone from the edited panel.]

Ok, so now we know that we have a set of locations that actually represents where the turkey was when it was carrying the GPS unit. Let's see what we can do from here! Now, here is where R is more awesome than ArcMap. First, as we know, our data set is just a simple flat file, so we can operate on it just like we have done before; for instance, maybe we want to extract and plot the points for a single day, March 12th for this example.

March12 = subset(newrawturkey, newrawturkey$FixDate == "3/12/2009")

plot(March12$Lon, March12$Lat, xlab = "Longitude", ylab = "Latitude", las = 1,

cex.axis = 0.7, main = "Turkey on 12 March 2009")


[Figure: locations for this turkey on 12 March 2009, longitude versus latitude.]

Now, someone is saying: but wait, we are just operating on the raw dataset; what about 'spatial' data, since all we seem to be doing is plotting a dataframe? Well, you can convert (in a wide variety of ways) regular data to spatial data. For instance, I will use the R package sp to convert the raw dataframe to class SpatialPoints.

library(sp)  # SpatialPoints() and friends live in sp

newrawturkey = na.omit(newrawturkey)

coords = cbind(Lon = newrawturkey$Lon, Lat = newrawturkey$Lat)

sp = SpatialPoints(coords)

str(sp)

## Formal class 'SpatialPoints' [package "sp"] with 3 slots

## ..@ coords : num [1:3361, 1:2] -98.4 -98.4 -98.4 -98.4 -98.4 ...

## .. ..- attr(*, "dimnames")=List of 2

## .. .. ..$ : NULL

## .. .. ..$ : chr [1:2] "Lon" "Lat"

## ..@ bbox : num [1:2, 1:2] -98.4 27.9 -98.3 27.9

## .. ..- attr(*, "dimnames")=List of 2

## .. .. ..$ : chr [1:2] "Lon" "Lat"

## .. .. ..$ : chr [1:2] "min" "max"

## ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots

## .. .. ..@ projargs: chr NA
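Note that the proj4string slot above is NA because we never declared a coordinate reference system; you can attach one when the object is built (a sketch, assuming the unit records plain longitude/latitude on WGS84):

sp.ll = SpatialPoints(coords, proj4string = CRS("+proj=longlat +datum=WGS84"))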

Or, as an alternative, because we all know how awesome it is to work with dataframes, and because all those data frame operations we used to manipulate data also work on spatial data frames, we can do this (one way):


coordturkey = newrawturkey

coordinates(coordturkey) = cbind(Lon = coordturkey$Lon, Lat = coordturkey$Lat)

str(coordturkey)

## Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots

## ..@ data :'data.frame': 3361 obs. of 7 variables:

## .. ..$ Try : int [1:3361] 3 4 5 6 7 8 9 10 11 12 ...

## .. ..$ FixDate: Factor w/ 82 levels "3/10/2009","3/11/2009",..: 2 2 2 2 2 2 2 2 2 2 ...

## .. ..$ FixTime: Factor w/ 3025 levels "0:00:15","0:00:33",..: 2655 2658 2673 2696 2730 2776 2830 2889 2957 89 ...

## .. ..$ Lat : num [1:3361] 27.9 27.9 27.9 27.9 27.9 ...

## .. ..$ Lon : num [1:3361] -98.4 -98.4 -98.4 -98.4 -98.4 ...

## .. ..$ Sats : int [1:3361] 8 6 5 7 6 6 5 6 5 8 ...

## .. ..$ HDOP : num [1:3361] 1.1 1.3 1.7 1.4 1.6 2.7 2.5 2.1 2.4 1.4 ...

## .. ..- attr(*, "na.action")=Class 'omit' Named int [1:2] 2444 2928

## .. .. .. ..- attr(*, "names")= chr [1:2] "NA" "NA.1"

## ..@ coords.nrs : num(0)

## ..@ coords : num [1:3361, 1:2] -98.4 -98.4 -98.4 -98.4 -98.4 ...

## .. ..- attr(*, "dimnames")=List of 2

## .. .. ..$ : chr [1:3361] "3" "4" "5" "6" ...

## .. .. ..$ : chr [1:2] "Lon" "Lat"

## ..@ bbox : num [1:2, 1:2] -98.4 27.9 -98.3 27.9

## .. ..- attr(*, "dimnames")=List of 2

## .. .. ..$ : chr [1:2] "Lon" "Lat"

## .. .. ..$ : chr [1:2] "min" "max"

## ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots

## .. .. ..@ projargs: chr NA

Cool, it's a spatial data frame, which means we can operate on it like it was a data frame, and then plot the results (which should look pretty familiar to you):

turkey.subset = coordturkey[coordturkey$FixDate == "3/12/2009", ]

# str(turkey.subset)

par(mfrow = c(2, 1))

plot(turkey.subset, pch = 1)

box()

plot(March12$Lon, March12$Lat, xlab = "Longitude", ylab = "Latitude", las = 1,

cex.axis = 0.7, main = "Turkey on 12 March 2009")


[Figure: the 12 March 2009 subset of the SpatialPointsDataFrame plotted with plot() (top) and the same day plotted from the flat data frame (bottom); the point patterns match.]

Ok, so, while this is all cool and fun, working with spatial data in R is a bit more complicated once you get past making a quick plot. For instance, let's create a minimum convex polygon (MCP) for these data and estimate how much area there is within the MCP boundaries, using areas from, say, a 5% MCP to a 100% MCP. Now, what is important to note, and hence the reason for this example, is that not all R packages use and manipulate spatial data the same way. For instance, here I am going to use adehabitatHR to create the MCPs, and I am intentionally going to show you how easy it is to get the wrong answer if you don't pay attention.


So, first I am going to do this the quick way, without paying much attention: creating a new dataframe from the turkey dataset, making it into class SpatialPoints, and then running it through the mcp() routine in adehabitatHR.

suppressPackageStartupMessages(library(adehabitatHR))

xy.mcp = cbind(Lon = newrawturkey$Lon, Lat = newrawturkey$Lat)

xy.mcpsp = SpatialPoints(xy.mcp)

mcp.out = mcp(xy.mcpsp, percent = 100)

plot(xy.mcp, xlab = "Longitude", ylab = "Latitude", las = 1, cex.axis = 0.7,

main = "Turkey 100% MCP")

plot(mcp.out, add = TRUE)

[Figure: Turkey 100% MCP; edited GPS locations (longitude versus latitude) with the 100% minimum convex polygon overlaid.]

str(mcp.out)

## Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots

## ..@ data :'data.frame': 1 obs. of 2 variables:

## .. ..$ id : Factor w/ 1 level "a": 1

## .. ..$ area: num 1.05e-07

## ..@ polygons :List of 1

## .. ..$ :Formal class 'Polygons' [package "sp"] with 5 slots

## .. .. .. ..@ Polygons :List of 1

## .. .. .. .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots

## .. .. .. .. .. .. ..@ labpt : num [1:2] -98.4 27.9

## .. .. .. .. .. .. ..@ area : num 0.00105

## .. .. .. .. .. .. ..@ hole : logi FALSE


## .. .. .. .. .. .. ..@ ringDir: int 1

## .. .. .. .. .. .. ..@ coords : num [1:9, 1:2] -98.4 -98.3 -98.4 -98.4 -98.4 ...

## .. .. .. .. .. .. .. ..- attr(*, "dimnames")=List of 2

## .. .. .. .. .. .. .. .. ..$ : chr [1:9] "963" "3208" "2191" "128" ...

## .. .. .. .. .. .. .. .. ..$ : chr [1:2] "Lon" "Lat"

## .. .. .. ..@ plotOrder: int 1

## .. .. .. ..@ labpt : num [1:2] -98.4 27.9

## .. .. .. ..@ ID : chr "a"

## .. .. .. ..@ area : num 0.00105

## ..@ plotOrder : int 1

## ..@ bbox : num [1:2, 1:2] -98.4 27.9 -98.3 27.9

## .. ..- attr(*, "dimnames")=List of 2

## .. .. ..$ : chr [1:2] "x" "y"

## .. .. ..$ : chr [1:2] "min" "max"

## ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots

## .. .. ..@ projargs: chr NA

mcp.area(xy.mcpsp, percent = seq(5, 100, by = 5), unin = c("m"), unout = c("ha"),

plotit = TRUE)

[Figure: mcp.area() plot of home-range size against home-range level; note the y-axis values are on the order of 1e-08 because the coordinates are still in decimal degrees.]

## a

## 5 5.343e-10

## 10 1.459e-09

## 15 2.741e-09

## 20 3.029e-09

## 25 3.063e-09


## 30 3.082e-09

## 35 3.138e-09

## 40 3.345e-09

## 45 5.348e-09

## 50 6.850e-09

## 55 1.241e-08

## 60 2.106e-08

## 65 2.920e-08

## 70 3.819e-08

## 75 4.073e-08

## 80 4.505e-08

## 85 4.700e-08

## 90 5.396e-08

## 95 5.827e-08

## 100 1.053e-07

So, what do we see? Well, first we get an MCP that looks reasonable; at 100% it encompasses all the points like it should. But if you look at the area estimate shown in str(mcp.out), it gives an estimate of MCP area of 0.0000001??? That cannot be right? Well, that's because adehabitatHR requires coordinates to be in UTM to estimate the MCP area (like it says in the help files). Not a problem: R has some pretty cool methods for projecting spatial data, as well as for transforming data between coordinate reference systems. As an example, let's see if we can get this right this time:

data(rawturkey)

x = rawturkey

x = na.omit(x)

x = x[x$Lon < -98.1, ]

coordinates(x) <- c("Lon", "Lat")

proj4string(x) <- CRS("+proj=longlat +zone=14 +datum=NAD83")

x = spTransform(x, CRS("+proj=utm +zone=14 +datum=NAD83"))

x@data = data.frame(x@data, ID = "Turkey 3057")

plot(x)

plot(mcp(x, percent = 100), add = TRUE)


suppressWarnings(mcp.area(x, percent = seq(5, 100, by = 5), unin = c("m"), unout = c("ha"),

plotit = TRUE))

[Figure: mcp.area() plot of home-range size (ha) against home-range level for the UTM-projected locations.]

## a

## 5 5.897

## 10 15.385

## 15 26.473

## 20 31.087

## 25 31.741

## 30 32.055


## 35 32.857

## 40 33.452

## 45 55.697

## 50 71.828

## 55 137.912

## 60 230.350

## 65 309.643

## 70 405.773

## 75 439.740

## 80 496.709

## 85 514.768

## 90 556.796

## 95 637.771

## 100 1148.030
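As a quick cross-check on the units, the 100% MCP area can also be computed directly from the polygon with rgeos (a sketch; gArea() returns area in squared map units, here square meters, so dividing by 10,000 gives hectares and should land near the 1148 ha reported above):

library(rgeos)

gArea(mcp(x, percent = 100))/10000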

Ok, we can look at the points from a GPS unit, create an MCP, and estimate area. Cool. Next step: how about a kernel estimate of the utilization distribution?

out.kernel = kernelUD(x)

## Warning: xy should contain only one column (the id of the animals)

## id ignored

ud.out = getvolumeUD(out.kernel)

val = getverticeshr(ud.out, percent = 95)

plot(val)


kernel.area(out.kernel, percent = seq(5, 95, by = 5), unin = c("m"), unout = c("ha"))

## 5 10 15 20 25 30 35 40 45

## 3.074 6.149 9.223 15.371 21.520 27.668 36.891 49.188 61.485

## 50 55 60 65 70 75 80 85 90

## 79.931 98.376 122.970 150.639 184.455 224.421 273.609 332.020 421.173

## 95

## 587.183
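If you want to take the 95% UD polygon back into a GIS, the object from getverticeshr() is just a SpatialPolygonsDataFrame, so it can be written to a shapefile with rgdal (a sketch; the dsn and layer names are arbitrary):

library(rgdal)

writeOGR(val, dsn = ".", layer = "turkey_ud95", driver = "ESRI Shapefile")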

Next step: let's have a look at a graphic for a simple Brownian bridge object (a fancy kernel home range) based on the methods described in Horne et al. (2007) and Sawyer et al. (2009), so we can see exactly what we are creating from beginning to end. We use package BBMM and do a little data manipulation, because BBMM requires UTMs and rawturkey is in lat/long, and because we have to define time.lag= in brownian.bridge().

library(BBMM)  # Brownian bridge movement models

df = rawturkey

df = na.omit(df)

df$dt = as.POSIXct(strptime(paste(df$FixDate, df$FixTime), "%m/%d/%Y %H:%M:%S"))

df$tl = c(NA, (diff(df$dt)/60))

df = df[df$Lon < -98.1, ]

coordinates(df) = ~Lon + Lat

proj4string(df) = CRS("+proj=longlat +datum=WGS84")

df = data.frame(spTransform(df, CRS("+proj=utm +zone=14 +datum=WGS84")))

out = brownian.bridge(x = df$Lon, y = df$Lat, time.lag = df$tl[-1], location.error = 15,

cell.size = 20, max.lag = 180)

contours = bbmm.contour(out, levels = c(50, 95), locations = df, plot = TRUE)

[Figure: Brownian bridge 50% and 95% contours plotted over the UTM (zone 14) locations.]


8 Literature To Look At!

8.1 Here is a pretty short list of good books to get on your shelves.

• Modern Applied Statistics with S (Venables and Ripley)

• Regression Modeling Strategies (Harrell)

• Introductory Statistics with R (Dalgaard)

• Ecological Models and Data in R (Bolker)

• Data Manipulation with R (Spector)

• S Programming (Venables and Ripley)

• Mixed Effects Models in S and S-Plus (Pinheiro and Bates)

• R Graphics (Murrell)

• Programming with Data (Chambers)

8.2 R packages that I use regularly and a few websites that will make your life easier

Not annotated, and not complete, but each is worth looking up/digging into:

• reshape - for data manipulation

• sqldf - SQL query language operations for R data frames

• plyr - for data manipulation

• knitr - for dynamic document creation

• http://cran.r-project.org/web/views/Spatial.html

• http://cran.r-project.org/web/views/Environmetrics.html

• http://cran.r-project.org/web/packages/emdbook/index.html

• mrds - all things distance sampling: conventional, multiple-covariate, and mark-recapture distance sampling, plus some capture-recapture data

• mrpt - point-based mark-recapture distance sampling code from Laake et al. (2010)

• RMark - R interface to MARK

• marked - R mark-recapture models, no MARK interface; tread lightly, it's not easy

• DSpat - spatial distance sampling

• adehabitatHR (and others) - basic modeling for spatial data

• sp - spatial statistics package (this is the granddaddy spatial package)

• rgdal - spatial statistics package (fantastic in my opinion)

• raster - spatial raster file analysis
