
Vegetation Modeling


Page 2: Vegetation Modeling

2

Outline

Model types
Predictive models
Predictor data
Predictive model types (parametric/nonparametric)
Model example (tree-based/Random Forests)

Modeling dataset
• Response/Predictor data
• Discussion – scale of predictors
• Predictor data extraction

Data exploration
• Summary statistics/NA values
• Attaching data frame
• Predictor variables
• Response variables (binary, continuous)

Model generation
• Tree-based models
• Random Forests (classification/regression trees)
• Variable importance/proximity

Model prediction
• Polygon data
• Predictor data (clipping, stacking)
• Apply model and display map

ModelMap
• Build model
• Model diagnostics
• Make map

Page 3: Vegetation Modeling

3

## In general, there are 3 reasons we model vegetation:
# 1. Explanatory: to explain why something is happening… find a pattern or a cause.
# 2. Descriptive: to see an association between variables, not aimed at prediction.
# 3. Predictive: to predict an occurrence based on known attributes.

## Examples:
Is the height of a tree related to its diameter?
Did the last 20 years of drought affect tree mortality rates?
Are forest disturbances changing the carbon cycle?
Can we predict the distribution of vegetation 100 years from now based on climate models?
Can we predict the current distribution of vegetation across the landscape from the spectral signature of remotely-sensed data?

Vegetation Modeling

## An article on different reasons for modeling:
Shmueli, G. 2010. To Explain or to Predict? Statistical Science 25(3):289-310.

Page 4: Vegetation Modeling

4

## Predictive vegetation models predict the distribution of vegetation across the landscape from available maps of environmental variables, such as geology, topography, and climate, and from spectral data in remotely-sensed imagery or products.

## Statistical models are built by finding relationships between vegetation data and the available digital data, and are then applied across given digital landscapes.

Predictive Vegetation Models

Integrate forest inventory data with available digital data (satellite imagery, DEMs, soils), using flexible statistical models and GIS tools, and generate maps of forest attributes.

y = f(x1 + x2 + … + xn)

Page 5: Vegetation Modeling

5

## Digital environmental data and remotely-sensed products are increasingly available at many different scales.

Remotely-sensed data and derived products
• Snapshot of what is on the ground
• Caution: represents current vegetation distributions based on reflectance patterns, but does not explain potential occurrences of vegetation.

Topography variables (elevation, aspect, slope)
• No direct physiological influence on vegetation
• Caution: these variables are local to the model domain; be careful when extrapolating over space or time.

Climate variables (temperature, precipitation)
• Directly related to physiological responses of vegetation
• Surrogates for topographical variables
• Caution: these variables are better for extrapolating over space and time, but recognize other limitations, such as current location and dispersal range; species competition may also change through time.

Other surrogates (geology, soils, soil moisture availability, solar radiation)
• Resources directly used by plants for growth
• Caution: similar to climate variables

Predictor Data

Page 6: Vegetation Modeling

6

Model Objective:
Find a relationship between vegetation data and predictor data for prediction purposes.

Predictive Model Types

The model, in its simplest form, looks like the following:

y ~ f (x1 + x2 + x3 + …) + ε,
where y is a response variable, the x's are predictor variables, and ε is the associated error.

There are many different types of models.

Parametric models – make assumptions about the underlying distribution of the data.
• Maximum likelihood classification
• Discriminant analysis
• General linear models (ex. linear regression)
• . . .

Nonparametric models – make no assumptions about the underlying data distributions.
• Generalized additive models
• Machine-learning models
• Artificial neural networks
• . . .
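As a rough illustration of the two model families, the sketch below fits a parametric straight line and a nonparametric smoother to the same curved data. The data are simulated and hypothetical (not the workshop data), and loess is used only as a convenient base-R stand-in for a nonparametric model.

```r
# Simulated data (hypothetical): a curved relationship plus random error
set.seed(42)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.2)

# Parametric: assumes y = b0 + b1*x + error
fit.lm <- lm(y ~ x)

# Nonparametric: local regression smoother with no fixed functional form
fit.lo <- loess(y ~ x, span = 0.3)

# Compare root mean squared error of the fitted values
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
rmse.lm <- rmse(y, fitted(fit.lm))
rmse.lo <- rmse(y, fitted(fit.lo))
c(linear = rmse.lm, loess = rmse.lo)
```

The smoother follows the curve more closely here because the straight-line assumption does not hold for these data; with truly linear data the parametric model would be the more efficient choice.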

Page 7: Vegetation Modeling

7

Predictive Model Types
Parametric Models

Parametric models – make assumptions about the underlying distribution of the data.

Assumptions:
• The distribution of the underlying population is bell-shaped (normal).
• Errors are independent.
• Errors are normally distributed with a mean of 0 and a constant variance.

Advantages:
• Easy to interpret
• High power with low sample sizes

Disadvantages:
• If sample data are not from a normally-distributed population, the model may lead to incorrect conclusions.

Examples:
• Maximum likelihood classification
• Discriminant analysis
• Linear regression
• Multiple linear regression
• Generalized linear models (parametric/nonparametric)

Page 8: Vegetation Modeling

8

Predictive Model Types
Parametric Models – Regression Example

Previous example: Using regression to fill in missing values.

## Import data and subset for sp19 only
path <- "C:/Peru/Rworkshop"   # Where workshop material is
setwd(path)
tree <- read.csv("PlotData/tree.csv", stringsAsFactors=FALSE)
sp19 <- tree[( tree$SPCD==19 & !is.na(tree$DIA) ),]

# Start off with a scatter plot
par(mfrow=c(1,2))
plot(sp19$DIA, sp19$HT, xlab="Diameter", ylab="Height", main="Subalpine Fir")
abline(lm(sp19$HT ~ sp19$DIA))

# We saw some heteroscedasticity (unequal variance), and transformed data to log scale.
plot(log(sp19$DIA), log(sp19$HT), xlab="log(Diameter)", ylab="log(Height)", main="Subalpine Fir, log scale")
abline(lm(log(sp19$HT) ~ log(sp19$DIA)))

# We looked at the model summaries and saw lower residual error and a higher R2 value for the log-scale model.
r.mod <- lm(HT~DIA, data=sp19)
summary(r.mod)

r.mod.log.ht.dia <- lm(log(HT)~log(DIA), data=sp19)
summary(r.mod.log.ht.dia)

Page 9: Vegetation Modeling

9

Previous example cont.

# Then we looked at the residuals versus the fitted values and normal Q-Q plots.
par(mfrow=c(2,2))
plot(r.mod$fitted, r.mod$residuals, xlab="Fitted", ylab="Residuals", main="Fitted versus Residuals")
abline(h=0)
qqnorm(r.mod$residuals, main="Normal Q-Q Plot")
qqline(r.mod$residuals)

plot(r.mod.log.ht.dia$fitted, r.mod.log.ht.dia$residuals, xlab="Fitted", ylab="Residuals", main="Fitted versus Residuals, log scale")
abline(h=0)
qqnorm(r.mod.log.ht.dia$residuals, main="Normal Q-Q Plot, log scale")
qqline(r.mod.log.ht.dia$residuals)

# NOTE:
# Using transformations, we were able to fit a parametric model to data that were nonlinear
# and had an unequal variance structure.
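A model fitted on the log scale predicts log(height), so predictions need to be back-transformed before they can fill in missing heights. The sketch below shows one way this might look. The diameters, heights, and coefficients are simulated and hypothetical; they are not the subalpine fir data.

```r
# Simulated tree data (hypothetical): height grows as a power of diameter,
# with multiplicative error, so the relationship is linear on the log scale
set.seed(1)
trees.sim <- data.frame(DIA = runif(100, 5, 40))
trees.sim$HT <- 4 * trees.sim$DIA^0.7 * exp(rnorm(100, sd = 0.1))

# Fit the log-log model, as in the workshop example
mod.log <- lm(log(HT) ~ log(DIA), data = trees.sim)

# Predict for trees with known diameter but missing height
newdat <- data.frame(DIA = c(10, 20, 30))
pred.log <- predict(mod.log, newdata = newdat)  # predictions on the log scale
pred.ht  <- exp(pred.log)                       # back-transformed to height units
pred.ht
```

Note that exp() of a log-scale prediction estimates the median height rather than the mean; whether a bias correction is needed depends on how the filled-in values will be used.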

Predictive Model Types
Parametric Models – Regression Example

Page 10: Vegetation Modeling

10

Predictive Model Types
Nonparametric Models

Nonparametric models – make no assumptions about the underlying distribution of the data.

Note: Vegetation data typically are not normally distributed across the landscape; therefore, it is most often better to use a nonparametric model.

## Advantages:
# If sample data are not from a normally-distributed population, using a parametric model may lead to incorrect conclusions; a nonparametric model avoids that assumption.

## Disadvantages:
# Need larger sample sizes to have the same power as parametric statistics
# Harder to interpret
# Can overfit data

## Examples:
# Generalized additive models
# Classification and regression trees (e.g. CART)
# Artificial neural networks (ANN)
# Multivariate adaptive regression splines (MARS)
# Ensemble modeling (e.g. Random Forests)

Page 11: Vegetation Modeling

Random Forests (Breiman, 2001)

Generates a series of classification and regression tree models..
.. sampling, with replacement, from training data (bootstrap)
.. selecting predictor variables at random for each node
.. outputting the class that most frequently results
.. and calculating an out-of-bag error estimate
.. and measuring variable importance through permutation

R packages:
randomForest – Liaw & Wiener
ModelMap – Freeman & Frescino

Page 12: Vegetation Modeling

12

Modeling Example

Page 13: Vegetation Modeling

Model Example

Objective: Use Random Forests to find relationships between forest inventory data and six spatial predictor layers. The models will be used to make predictions across a continuous, pixel-based surface.

Extract data from each layer at each sample plot location.
Generate spatially explicit maps of forest attributes based on cell-by-cell predictions.

[Figure: predictor layers (Landsat TM, elevation, aspect, slope) sampled at plot locations to predict % tree crown cover]
Example predictor values at three plots:
• 60 TM B3, 8000’ Elev, 160° Aspect, 15% Slope
• 120 TM B3, 10500’ Elev, 10° Aspect, 12% Slope
• 80 TM B3, 9200’ Elev, 95° Aspect, 20% Slope
Example values of % cover: 35%, 10%, 80%

Page 14: Vegetation Modeling

14

Model Example
The model form:

y ~ f (x1 + x2 + x3 + x4 + x5 + x6) + ε,
where y is forest inventory data, and the x's are the predictor variables listed below, including satellite spectral data and topographic variables.

## For this example,
# We will look at 2 responses:
# Presence of aspen                  Binary response of 0 and 1 values (1=presence)
# Total carbon                       Continuous response

# With 6 predictor variables:
# Landsat Thematic Mapper, band 5    30-m spectral data, band 5
# Landsat Thematic Mapper, NDVI      30-m spectral data, NDVI
# Classified forest/nonforest map    250-m classified MODIS, resampled to 30 m
# Elevation                          30-m DEM
# Slope                              30-m DEM – derived
# Aspect                             30-m DEM – derived

Page 15: Vegetation Modeling

15

Utah, USA

Study Area

Page 16: Vegetation Modeling

16

## Vegetation
5 different life zones:
1. shrub-montane
2. aspen
3. lodgepole pine
4. spruce-fir
5. alpine

Model data set: Uinta Mountains, Utah, USA

Highest east-west oriented mountain range in the contiguous U.S. – up to 13,528 ft (4,123 m)

Study Area

Apply model to: High Uintas Wilderness

Page 17: Vegetation Modeling

17

Modeling Dataset

Page 18: Vegetation Modeling

18

Data for Modeling

# Load libraries
library(rgdal)            # GDAL operations for spatial data
library(raster)           # Analyzing gridded spatial data
library(rpart)            # Recursive partitioning and regression trees
library(car)              # Companion to Applied Regression (functions from the book)
library(randomForest)     # Generates Random Forest models
library(PresenceAbsence)  # Evaluates results of presence-absence models
library(ModelMap)         # Generates and applies Random Forest models

Page 19: Vegetation Modeling

19

Response Data
# Forest Inventory data (Model response)

# We have compiled this data before, let's review.
options(scipen=6)
plt <- read.csv("PlotData/plt.csv", stringsAsFactors=FALSE)
tree <- read.csv("PlotData/tree.csv", stringsAsFactors=FALSE)
ref <- read.csv("PlotData/ref_SPCD.csv", stringsAsFactors=FALSE)

## The plt table contains plot-level data, where there is 1 record (row) for each plot. This table has the coordinates (fuzzed) of each plot, which we will need later.

dim(plt)     ## Total number of plots
head(plt)    ## Display first six plot records.

## The tree table contains tree-level data, where there is 1 record (row) for each tree on a plot. We need to summarize the tree data to plot-level for modeling.

dim(tree)    ## Total number of trees
head(tree)   ## Display first six tree records.

# First, let's add the species names to the table.
# Merge species names to table using a reference table of species codes
tree <- merge(tree, ref, by="SPCD")
head(tree)

Page 20: Vegetation Modeling

20

Response Data
# Forest Inventory data (Model response)

## We have 2 responses:
# Presence of aspen     Binary response of 0 and 1 values (1=presence)
# Total carbon          Continuous response

## Let's compile presence of aspen and append to plot table.

# First, create a table of counts by species and plot.
spcnt <- table(tree[,c("PLT_CN", "SPNM")])
head(spcnt)

# For this exercise, we don't care about how many trees per species, we just want presence
# or absence; therefore, we need to change all values greater than 1 to 1.
spcnt[spcnt > 1] <- 1

# We are only interested in aspen presence, so let's join the aspen column to the plot table.
spcntdf <- data.frame(PLT_CN=row.names(spcnt), ASPEN=spcnt[,"aspen"])
plt2 <- merge(plt, spcntdf, by.x="CN", by.y="PLT_CN")
dim(plt)
dim(plt2)   # Notice there are fewer records (plots with no trees are dropped)

# Use all.x=TRUE to keep every plot; plots with no trees get NA for ASPEN.
plt2 <- merge(plt, spcntdf, by.x="CN", by.y="PLT_CN", all.x=TRUE)
dim(plt2)
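The effect of all.x=TRUE is easier to see on a tiny example. The sketch below uses hypothetical toy tables (plt.toy, tree.toy) whose column names mirror the workshop tables; the plot IDs and species are made up.

```r
# Hypothetical toy tables: 4 plots, one of which (p4) has no trees
plt.toy  <- data.frame(CN = c("p3", "p1", "p2", "p4"), stringsAsFactors = FALSE)
tree.toy <- data.frame(PLT_CN = c("p1", "p1", "p2", "p3", "p3", "p3"),
                       SPNM   = c("aspen", "aspen", "lodgepole pine",
                                  "aspen", "subalpine fir", "subalpine fir"),
                       stringsAsFactors = FALSE)

# Counts by plot and species, collapsed to presence (1) / absence (0)
spcnt <- table(tree.toy[, c("PLT_CN", "SPNM")])
spcnt[spcnt > 1] <- 1
spcntdf <- data.frame(PLT_CN = row.names(spcnt),
                      ASPEN  = as.vector(spcnt[, "aspen"]),
                      stringsAsFactors = FALSE)

# Default merge keeps only plots found in both tables (p4 is dropped)
inner <- merge(plt.toy, spcntdf, by.x = "CN", by.y = "PLT_CN")

# all.x = TRUE keeps every plot; p4 gets NA, which we then set to 0
left <- merge(plt.toy, spcntdf, by.x = "CN", by.y = "PLT_CN", all.x = TRUE)
left$ASPEN[is.na(left$ASPEN)] <- 0
left
```

Notice that merge() also sorts the result by the key column, so the rows of the merged table are not necessarily in the same order as the original plot table. This matters later if values extracted in one row order are bound to a table in another.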

Page 21: Vegetation Modeling

21

Response Data
## Forest Inventory data (Model response)

## Now, let's compile total carbon by plot and append to plot table.

# First, sum carbon (CARBON_AG) by plot.
pcarbon <- aggregate(tree$CARBON_AG, list(tree$PLT_CN), sum)
names(pcarbon) <- c("PLT_CN", "CARBON_AG")

# Carbon is stored in the FIA database in units of pounds. Let's add a new variable, CARBON_KG, with conversion to kg (1 lb = 0.453592 kg).
pcarbon$CARBON_KG <- round(pcarbon$CARBON_AG * 0.453592)

# Now we can join this column to the plot table (plt2).
plt2 <- merge(plt2, pcarbon, by.x="CN", by.y="PLT_CN", all.x=TRUE)
dim(plt2)
head(plt2)

# Change NA values to 0 values.
plt2[is.na(plt2[,"ASPEN"]), "ASPEN"] <- 0
plt2[is.na(plt2[,"CARBON_KG"]), "CARBON_KG"] <- 0
plt2$CARBON_AG <- NULL
head(plt2)
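The plot-level summary and unit conversion can be checked by hand on a small table. This is a minimal sketch with hypothetical values (three trees on two plots); it uses the formula interface of aggregate(), which is an alternative to the call above and not the workshop's own code.

```r
# Hypothetical tree-level carbon values, in pounds
tree.toy <- data.frame(PLT_CN    = c("p1", "p1", "p2"),
                       CARBON_AG = c(100, 50, 200))

# Sum carbon by plot (formula interface keeps the column names)
pcarbon <- aggregate(CARBON_AG ~ PLT_CN, data = tree.toy, FUN = sum)

# Convert pounds to kilograms and round to whole kilograms
pcarbon$CARBON_KG <- round(pcarbon$CARBON_AG * 0.453592)
pcarbon
```

Plot p1 sums to 150 lb (about 68 kg) and plot p2 to 200 lb (about 91 kg), which matches the table printed by the last line.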

Page 22: Vegetation Modeling

22

Response Data
# We need to extract data from spatial layers, so let's convert the plot table to a SpatialPointsDataFrame object in R.

## We know the projection information, so we can add it to the object.
prj4str <- "+proj=longlat +ellps=GRS80 +datum=NAD83 +no_defs"
ptshp <- SpatialPointsDataFrame(plt[,c("LON","LAT")], plt, proj4string = CRS(prj4str))

## Display the points
plot(ptshp)

Page 23: Vegetation Modeling

23

Predictor Data
## Predictor variables:
# Landsat Thematic Mapper, band 5    30-m spectral data, band 5, resampled to 90 m
# Landsat Thematic Mapper, NDVI      30-m spectral data, NDVI, resampled to 90 m
# Classified forest/nonforest map    250-m classified MODIS, resampled to 90 m
# Elevation                          30-m DEM, resampled to 90 m
# Slope                              90-m DEM – derived in the following slides
# Aspect                             90-m DEM – derived in the following slides

## Set file names
b5fn <- "SpatialData/uintaN_TMB5.img"       # Landsat TM – Band 5
ndvifn <- "SpatialData/uintaN_TMndvi.img"   # Landsat TM – NDVI
fnffn <- "SpatialData/uintaN_fnfrcl.img"    # Forest type map (reclassed)
elevfn <- "SpatialData/uintaN_elevm.img"    # Elevation (meters)

Note: If you don't have uintaN_fnfrcl.img, follow the steps on the last slide of this presentation (Appendix 1).

## Check rasters
rastfnlst <- c(b5fn, ndvifn, fnffn, elevfn)
rastfnlst
sapply(rastfnlst, raster)

Page 24: Vegetation Modeling

24

# TM Band 5 (uintaN_TMB5.img)

# TM NDVI (uintaN_TMndvi.img)

Predictor Data

Page 25: Vegetation Modeling

25

# Forest/Nonforest map (uintaN_fnf.img)

# DEM (uintaN_elevm.img)

Predictor Data

Page 26: Vegetation Modeling

26

Predictor Data
## Now, let's generate slope from the DEM and save it to your SpatialData folder.
help(terrain)
help(writeRaster)

slpfn <- "SpatialData/uintaN_slp.img"
slope <- terrain(raster(elevfn), opt=c('slope'), unit='degrees', filename=slpfn, datatype='INT1U', overwrite=TRUE)

plot(slope, col=topo.colors(6))

# Add slope file name to rastfnlst
rastfnlst <- c(rastfnlst, slpfn)
rastfnlst

Page 27: Vegetation Modeling

27

Predictor Data
## We can also generate aspect from the DEM.
help(terrain)

## This is an intermediate step, so we are not going to save it.
aspectr <- terrain(raster(elevfn), opt=c('aspect'), unit='radians')
aspectr
# Note: Make sure to use radians, not degrees

plot(aspectr, col=terrain.colors(6))

Page 28: Vegetation Modeling

28

## Aspect is a circular variable. There are a couple of ways to deal with this:

## 1. Convert the values to a categorical variable (ex. North, South, West, East)
## We derived aspect in radians. First convert radians to degrees.
aspectd <- round(aspectr * 180/pi)
aspectd

## Now, create a look-up table of reclass values.
help(reclassify)
# Without byrow=TRUE the matrix is filled by column, which scrambles the from/to pairs.
frommat <- matrix(c(0,45, 45,135, 135,225, 225,315, 315,361), 5, 2)
frommat
# Fill by row so that each row is one (from, to) interval.
frommat <- matrix(c(0,45, 45,135, 135,225, 225,315, 315,361), 5, 2, byrow=TRUE)
frommat
tovect <- c(1, 2, 3, 4, 1)   # North is split across 0-45 and 315-361

rclmat <- cbind(frommat, tovect)
rclmat

## Reclassify raster to new values.
aspcl <- reclassify(x=aspectd, rclmat, include.lowest=TRUE)
aspcl
unique(aspcl)

bks <- c(0,sort(unique(aspcl)))                      # Break points
cols <- c("dark green", "wheat", "yellow", "blue")   # Colors
labs <- c("North", "East", "South", "West")          # Labels
lab.pts <- bks[-1]-diff(bks)/2                       # Label position
plot(aspcl, col=cols, axis.args=list(at=lab.pts, labels=labs), breaks=bks)
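The same four-class idea can be tried on a plain numeric vector, without any raster. This is a sketch in base R using cut() rather than reclassify(); the aspect values are made up, and values that fall exactly on a class boundary may be assigned differently than in the raster workflow above.

```r
# Hypothetical aspect values in degrees (0 and 360 both mean north)
aspect.deg <- c(0, 10, 44, 45, 90, 135, 180, 225, 270, 315, 350, 360)

# Shift by 45 degrees and wrap at 360, so that the two northern pieces
# (315-360 and 0-45) become one continuous interval, 0-90
shifted <- (aspect.deg + 45) %% 360

# Cut the shifted values into four 90-degree classes
asp.class <- cut(shifted, breaks = c(0, 90, 180, 270, 360),
                 labels = c("North", "East", "South", "West"),
                 right = FALSE)

data.frame(aspect.deg, asp.class)
```

Shifting before cutting avoids the need for two separate "North" rows in a look-up table, at the cost of being a little less explicit about the class limits.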

Predictor Data

Page 29: Vegetation Modeling

29

## 2. Convert to a linear variable (ex. solar radiation index; Roberts and Cooper 1989)

## Converts aspect into solar radiation equivalents, with a correction of 30 degrees to reflect
## the relative heat of the atmosphere at the time the peak radiation is received.
## Max value is 1.0, occurring at 30 degrees aspect; min value is 0, at 210 degrees aspect.

# aspectr is in radians, so the 30-degree correction must also be converted to radians.
aspval <- (1 + cos(aspectr - 30*pi/180))/2   ## Roberts and Cooper 1989
aspval
plot(aspval)

## Let's multiply by 100 and round so it will be an integer (less memory)
aspval <- round(aspval * 100)
aspval
plot(aspval)

# Save this layer to file
aspvalfn <- "SpatialData/uintaN_aspval.img"
writeRaster(aspval, filename=aspvalfn, datatype='INT1U', overwrite=TRUE)

# Add aspval to rastfnlst
rastfnlst <- c(rastfnlst, aspvalfn)
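The behaviour described above (maximum of 1.0 at 30 degrees aspect, minimum of 0 at 210 degrees) can be checked on a few plain numbers. The sketch below assumes the index has the form (1 + cos(aspect - 30 degrees))/2, with everything converted to radians before calling cos(); the four test aspects are arbitrary.

```r
# A few test aspects, in degrees
aspect.deg <- c(30, 120, 210, 300)

# cos() works in radians, so convert both the aspect and the 30-degree shift
aspect.rad <- aspect.deg * pi / 180
shift.rad  <- 30 * pi / 180

# Index value between 0 and 1
aspval <- (1 + cos(aspect.rad - shift.rad)) / 2
round(aspval, 3)
```

Mixing units, for example adding 30 directly to an aspect in radians, would shift the peak to an unintended direction, so it is worth running a check like this before applying the formula to a raster.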

Predictor Data

## Roberts, D.W., and S. V. Cooper. 1989. Concepts and techniques in vegetation mapping. In Land classifications based on vegetation: applications for resource management. D. Ferguson, P. Morgan, and F. D. Johnson, editors. USDA Forest Service General Technical Report INT-257, Ogden, Utah, USA.

Page 30: Vegetation Modeling

Discussion - Scale of Predictors

## Tools in R for handling scale issues. See help on functions for further details.

# focal – applies a moving-window function across pixels without changing the resolution.
# Note: It works, but takes a lot of time on large rasters.

# aggregate – aggregates pixels to a lower (coarser) resolution.
# resample – resamples pixels to match the extent or pixel size of another raster.
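To make the idea of aggregating concrete, the sketch below averages 3 x 3 blocks of a small matrix, which is conceptually what happens when 30-m cells are aggregated to 90 m. It uses a plain base-R matrix with made-up values, not the raster package, so it only illustrates the arithmetic.

```r
# Hypothetical 6 x 6 grid of 30-m cells
m <- matrix(1:36, nrow = 6, ncol = 6)

# Aggregation factor: 3 x 3 blocks of 30-m cells make one 90-m cell
fact <- 3

# Block index of every cell (0-based), by row and by column
row.block <- (row(m) - 1) %/% fact
col.block <- (col(m) - 1) %/% fact

# Mean of each block gives the coarser 2 x 2 grid
agg <- tapply(m, list(row.block, col.block), mean)
agg
```

A moving-window (focal) summary differs in that it would return one value per original cell, keeping the 6 x 6 resolution.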

Page 31: Vegetation Modeling


## The next step is to extract the values of each raster at each sample point location.

Predictor Data Extraction

Page 32: Vegetation Modeling

32

## We need to check the projections of the rasters. If the projections are different,
## reproject the points to the projection of the rasters; it is much faster than reprojecting the rasters.

## We will use the plt2 table with LON/LAT coordinates and the response data attached.
head(plt2)

## We know the LON/LAT coordinates have the following projection:
prj4str <- "+proj=longlat +ellps=GRS80 +datum=NAD83 +no_defs"

# Check projections of each raster.
sapply(rastfnlst, function(x){ projection(raster(x)) })

## Reproject the plot coordinates to match the raster projection.
help(project)
rast.prj <- projection(raster(rastfnlst[1]))
# Use plt2 (not plt): merge() sorts rows by CN, so plt2 may be in a different row order than plt.
xy <- cbind(plt2$LON, plt2$LAT)
xyprj <- project(xy, proj=rast.prj)

Predictor Data Extraction

Page 33: Vegetation Modeling

33

## Extract values (raster package)
help(extract)

# Let's extract values from 1 layer.
tmp <- extract(raster(elevfn), xyprj)
head(tmp)

# Now, let's create a function to extract, so we can extract from all the rasters at the same time.
extract.fun <- function(rast, xy){ extract(raster(rast), xy) }

# Now, apply this function to the vector of raster file names.
rastext <- sapply(rastfnlst, extract.fun, xyprj)

# Look at the output and check the class.
head(rastext)
class(rastext)

Predictor Data Extraction

Page 34: Vegetation Modeling

34

## Extract values (raster package) cont.. change names

# Let's make the column names shorter.
colnames(rastext)

# Use the rastfnlst vector of file names to get new column names.
# First, get the base name of each raster, without the extension.
# (sub() with an escaped, anchored pattern removes only a trailing ".img".)
cnames <- sub("\\.img$", "", basename(rastfnlst))
cnames

# We could stop here, but let's make the names even shorter and remove
# 'uintaN_' from each name.
cnames2 <- substr(cnames, 8, nchar(cnames))
cnames2

# Now, add names to matrix. Because the output is a matrix, we will use colnames.
colnames(rastext) <- cnames2
head(rastext)
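The renaming steps can be tried on their own, without the rasters. This sketch uses two of the file names from the slides and an alternative route through tools::file_path_sans_ext() and sub(); it is one possible way to get the same short names, not the workshop's own code.

```r
# Two of the raster file names used in this workshop
rastfnlst <- c("SpatialData/uintaN_TMB5.img", "SpatialData/uintaN_elevm.img")

# Drop the folder, then drop the file extension
cnames <- tools::file_path_sans_ext(basename(rastfnlst))

# Remove the 'uintaN_' prefix by pattern rather than by character position
cnames2 <- sub("^uintaN_", "", cnames)
cnames2
```

Removing the prefix by pattern keeps working if a file name has a different prefix length, whereas substr() with a fixed start position would silently cut such a name in the wrong place.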

Predictor Data Extraction

Page 35: Vegetation Modeling

35

# Now, let's append this matrix to the plot table with the response data (plt2).
head(plt2)

# We just want the response variables, so let's extract these columns along with the unique identifier of the table (CN, ASPEN, CARBON_KG).
modeldat <- cbind(plt2[,c("CN", "ASPEN", "CARBON_KG")], rastext)
head(modeldat)

# Check if this is a data frame
is.data.frame(modeldat)
dim(modeldat)

# Let's also append the projected xy coordinates for overlaying with raster layers.
modeldat <- cbind(xyprj, modeldat)
head(modeldat)
colnames(modeldat)[1:2] <- c("X", "Y")
head(modeldat)

Predictor Data Extraction

Page 36: Vegetation Modeling

36

Data Exploration

Page 37: Vegetation Modeling

37

## What to look for:

# NA values
# Outliers
# Correlations
# Non-normal distributions
# Changes in variability
# Clustering
# Non-linear data structures

Model Data Exploration

Page 38: Vegetation Modeling

38

Model Data Exploration
Summary statistics

## Summary statistics
str(modeldat)
summary(modeldat)
head(modeldat)
dim(modeldat)

# We need to convert categorical variables to factors
modeldat$ASPEN <- factor(modeldat$ASPEN)
modeldat$fnfrcl <- factor(modeldat$fnfrcl)

## Now, display summary statistics again and notice changes for ASPEN and fnfrcl
str(modeldat)
summary(modeldat)
head(modeldat)   # notice head does not show which variables are factors

Page 39: Vegetation Modeling

39

Model Data Exploration
NA Values

## Check for NA values.
modeldat[!complete.cases(modeldat),]

modeldat.NA <- modeldat[!complete.cases(modeldat),]
dim(modeldat.NA)
modeldat.NA

# We can overlay plots with NA values on raster.
plot(raster(aspvalfn))
points(modeldat.NA, pch=20)

# Most R model functions will handle or remove NA values, but for this example, let's remove the plots with NA values from our dataset now.
modeldat <- modeldat[complete.cases(modeldat),]
dim(modeldat)

Page 40: Vegetation Modeling

40

Model Data Exploration
Attach data frame

## Attaching a data frame. This is useful if you are working exclusively with 1 data frame.

## Caution: the column names of the data frame must not duplicate the names of other objects in the workspace.

## Let's save the modeldat object and clean up before we attach the data frame.
save(modeldat, file="Outfolder/modeldat.Rda")

# Now, remove all objects except modeldat.
ls()[ls() != "modeldat"]
rm(list = ls()[ls() != "modeldat"])
ls()

# .. and attach modeldat
attach(modeldat)
head(modeldat)
ASPEN   # Display column vector without using $

# Notes:
# To load the saved modeldat object:
# load(file="Outfolder/modeldat.Rda")
# Make sure to detach the data frame when done using it:
# detach(modeldat)

Page 41: Vegetation Modeling

41

Model Data Exploration
Predictors

## Check for outliers and correlations among predictors.

## Let's look at an example using 2 predictors (elevm, TMB5)
preds <- cbind(elevm, TMB5)

## Correlation between predictor variables, to determine strength of relationship
cor(preds)
round(cor(preds),4)

## Scatterplots
plot(preds)

## Add a regression line
abline(lm(TMB5 ~ elevm))

## Now let's add a smoother line for more information about data trend
lines(loess.smooth(elevm, TMB5), col="red")

## Another way
library(car)
scatterplot(TMB5 ~ elevm)
help(scatterplot)

Page 42: Vegetation Modeling

42

Model Data Exploration
Predictors

## Correlation cont..
# We can now see a non-linear trend in the data, where TMB5 spectral values decrease as elevations rise up to around 3000 meters, then begin to increase as elevations continue to rise.

# This suggests a non-linear relationship.

help(cor)

# The default uses the Pearson correlation coefficient, but Spearman (rank-based) may be a better choice for non-linear data structures.

round(cor(preds, method="spearman"),4)
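The difference between the two coefficients shows up clearly on simulated data. The sketch below uses a hypothetical relationship that always increases but is strongly curved; the variables are made up and are not the workshop predictors.

```r
# Simulated data (hypothetical): y rises with x, but along a steep curve
set.seed(7)
x <- runif(200, 0, 5)
y <- exp(x) + rnorm(200, sd = 1)

# Pearson measures linear association; Spearman measures association of ranks
r.pearson  <- cor(x, y)
r.spearman <- cor(x, y, method = "spearman")
round(c(pearson = r.pearson, spearman = r.spearman), 3)
```

Spearman helps when the relationship is curved but still moves in one direction. For a trend that falls and then rises, like the TMB5 and elevation example, both coefficients can be close to zero, so a scatterplot with a smoother remains the more reliable check.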

## Let's look at all predictors at once
names(modeldat)
preds <- modeldat[-c(1:5)]
head(preds)

## Correlation between predictor variables (to minimize number of predictors)
cor(preds)
str(preds)

## Note the error: 'x' must be numeric (remove the factor variable, fnfrcl, from the analysis)
round(cor(preds[,-3], method="spearman"),4)

Page 43: Vegetation Modeling

43

Model Data Exploration
Predictors

## Scatterplots
pairs(preds)

help(pairs)   # select Scatterplot Matrices from graphics package

## In the help doc, scroll down to Examples and copy and paste the 2 functions:
# panel.hist
# panel.cor

pairs(preds, lower.panel = panel.smooth, upper.panel = panel.cor)

## Change cor function to spearman within panel.cor
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...) {
  usr <- par("usr"); on.exit(par(usr = usr))   # restore plot coordinates on exit
  par(usr = c(0, 1, 0, 1))
  r <- abs(cor(x, y, method="spearman"))
  txt <- format(c(r, 0.123456789), digits = digits)[1]
  txt <- paste0(prefix, txt)
  if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
  text(0.5, 0.5, txt, cex = cex.cor * r)
}
pairs(preds, lower.panel = panel.smooth, upper.panel = panel.cor)

## Another way
scatterplotMatrix(preds[,-3])

Page 44: Vegetation Modeling

44

Model Data Exploration
Predictors

## Check for outliers
plot(TMB5, TMndvi)
identify(TMB5, TMndvi)   # Click on outliers and press esc key to escape

## Display the outliers
modeldat[c(58,60),]
plot(raster("SpatialData/uintaN_TMB5.img"))
points(modeldat[c(58,60),], pch=20)

## Let's remove these outliers from our dataset
modeldat <- modeldat[-c(58,60),]
dim(modeldat)
length(TMB5)   # Still the old length: the attached copy is not updated

## Detach and reattach data frame
detach(modeldat)
attach(modeldat)
dim(modeldat)
length(TMB5)
save(modeldat, file="Outfolder/modeldat.Rda")   ## Resave modeldat object
# load(file="Outfolder/modeldat.Rda")

## Check scatterplot again
plot(TMB5, TMndvi)

## ..and for all predictors
preds <- modeldat[-c(1:5)]
pairs(preds, lower.panel = panel.smooth, upper.panel = panel.cor)

Page 45: Vegetation Modeling

45

Model Data Exploration
Predictors

## We can also look at the distribution of each predictor across its range of values by plotting its sorted values (an empirical cumulative distribution, as we looked at earlier). Points that follow the straight red line are spread evenly across the range; departures from the line show where values are concentrated.

## This is only meaningful for continuous predictors

plot(sort(elevm))
lines(c(1, length(elevm)), range(elevm), col=2)

## Now, let's make a little function to look at the rest of the continuous predictors
pdistn <- function(x){
  plot(sort(x))
  lines(c(1, length(x)), range(x), col=2)
}
par(ask=TRUE)   # Press enter to go to next display
pdistn(elevm)
pdistn(slp)
pdistn(TMB5)
pdistn(TMndvi)
pdistn(aspval)
par(ask=FALSE)

## For categorical predictors, let's use table to look at the distribution of samples by value.
table(fnfrcl)

Page 46: Vegetation Modeling

46

Model Data Exploration
Response

## Let's look at the distribution and amount of variability of the sample response data.

## Check for normality (bell-shaped curve)

## Histograms and density functions
par(mfrow=c(2,1))

hist(CARBON_KG)
hist(CARBON_KG, breaks=5)

## Overlay density function (smoothed) on the histogram with continuous response.
hist(CARBON_KG, breaks=10, prob=TRUE)
lines(density(CARBON_KG))

## Data with lots of 0 values tend to have this shape. This is not a normal distribution.
## Therefore, using a nonparametric model, such as Random Forests, is a good idea.

## Let's look at the distribution if there were no 0 values (plots without trees)
hist(CARBON_KG[CARBON_KG>0], breaks=10, prob=TRUE)
lines(density(CARBON_KG[CARBON_KG>0]))

## The shape of the distribution is similar, with many more higher values than lower values.

par(mfrow=c(1,1))

Page 47: Vegetation Modeling

47

Model Data Exploration
Data distributions – Binomial response

## Let's look at the distribution of the sample response data with a couple of predictors.

## We will start with presence/absence of aspen. We know this is a binomial distribution with 0 and 1 values.

## Let's explore a little more.

## Plot elevation as a function of ASPEN.
boxplot(elevm ~ ASPEN, main="Aspen Presence/Absence")

## Add names
boxplot(elevm ~ ASPEN, main="Aspen Presence/Absence", names=c("Absence", "Presence"))
# Note: the bold line is the median value

## Add points with mean values
means <- tapply(elevm, ASPEN, mean)
points(means, col="red", pch=18)

# Note: overall, presence of aspen tends to be at the lower elevations of the sample plots.
# The distributions are slightly skewed, with the median elevm values higher than the mean elevm values.

Page 48: Vegetation Modeling

48

Model Data Exploration
Data distributions – Binomial response

## Other ways to explore relationships between the response and predictors.

## We can look at the differences of presence vs absence with 1 predictor (elevation).
plot(elevm, ASPEN)

## We can look at the differences of presence vs absence with 2 predictors (elevation and slope).
par(mfrow=c(1,2))
plot(elevm[ASPEN==1], slp[ASPEN==1])
plot(elevm[ASPEN==0], slp[ASPEN==0])

## Now, let's make it more meaningful by adding labels and using the same scale for each.
xlab <- "Elevation"
ylab <- "Slope"
xlim <- c(2000, 4000)
ylim <- c(0,40)
plot(elevm[ASPEN==1], slp[ASPEN==1], xlab=xlab, ylab=ylab, xlim=xlim, ylim=ylim, main="Aspen Present")
plot(elevm[ASPEN==0], slp[ASPEN==0], xlab=xlab, ylab=ylab, xlim=xlim, ylim=ylim, main="No Aspen")

par(mfrow=c(1,1))

Page 49: Vegetation Modeling

49

Model Data Exploration
Data distributions – Binomial response cont.

## Other ways to explore relationships between the response and predictors (1 plot)

## We can also color the points based on a factor (ASPEN)
plot(slp, elevm, col=ASPEN, pch=20, xlab="Slope", ylab="Elevation")

## Add a legend (using default colors)
legend(x=35, y=2200, legend=levels(ASPEN), col=1:nlevels(ASPEN), pch=20)
help(legend)

## Now, do it again using your own color choice
palette(c("blue", "green"))   # Change color palette first
plot(slp, elevm, col=ASPEN, pch=20, xlab="Slope", ylab="Elevation")
legend(x=35, y=2200, legend=levels(ASPEN), col=1:nlevels(ASPEN), pch=20)
palette("default")            # Change color palette back to default colors

## Or.. index a color vector by the factor so each point gets the color of its class
plot(slp, elevm, col=c("red", "blue")[ASPEN], pch=20, xlab="Slope", ylab="Elevation")
legend(x=35, y=2200, legend=levels(ASPEN), col=c("red", "blue"), pch=20)

## Another way:
scatterplot(slp ~ elevm|ASPEN, data=modeldat)

## For categorical predictors, use table function again
table(ASPEN, fnfrcl)

Page 50: Vegetation Modeling

50

Model Data Exploration
Data distributions – Continuous response

## Other ways to explore relationships between the response and predictors.

## We can look at the relationship between elevation and CARBON_KG
plot(elevm, CARBON_KG, xlab="Elevation", ylab="Carbon (kg)")

## Without 0 values
plot(elevm[CARBON_KG>0], CARBON_KG[CARBON_KG>0], xlab="Elevation", ylab="Carbon (kg)")

## Add regression and smoother lines.
par(mfrow=c(2,1))

## With 0 values
plot(elevm, CARBON_KG, xlab="Elevation", ylab="Carbon (kg)")
line.lm <- lm(CARBON_KG ~ elevm)
abline(line.lm, col="red")
line.sm <- lowess(elevm, CARBON_KG)
lines(line.sm, col="blue")

## Without 0 values
plot(elevm[CARBON_KG>0], CARBON_KG[CARBON_KG>0], xlab="Elevation", ylab="Carbon (kg)")
line.lm.w0 <- lm(CARBON_KG[CARBON_KG>0] ~ elevm[CARBON_KG>0])
abline(line.lm.w0, col="red")
line.sm <- lowess(elevm[CARBON_KG>0], CARBON_KG[CARBON_KG>0])
lines(line.sm, col="blue")

par(mfrow=c(1,1))

Page 51: Vegetation Modeling

51

Model Generation

Page 52: Vegetation Modeling

Extract data from each layer at each sample plot location.

y ~ f (x1 + x2 + x3 + x4 + x5 + x6) + ε

[Figure: predictor layers (Landsat TM, elevation, aspect, slope) with example plot values, repeated from the Model Example slide]

Model Generation

Page 53: Vegetation Modeling


## Strengths of Tree-based models

# Easy to interpret
# Adaptable for handling missing values
# Handles correlated predictor variables
# Predictor variable interactions are automatically included
# Handles categorical or continuous predictor variables

## Weaknesses of Tree-based models, such as Random Forests

# Optimization is based on each split, not necessarily the overall tree model
# Continuous predictors are cut into intervals (treated as categorical), which can be inefficient
# Nonparametric, thus loses some of the power of parametric statistics
# Tendency to overfit data
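## The first weakness above (each split is optimized on its own) can be seen in a minimal sketch of one split search on one continuous predictor. This is an illustration only, not the rpart algorithm: best_split is a hypothetical helper and the x and y values are made up.

```r
# Hypothetical helper: find the single best split point on one continuous
# predictor by minimizing the residual sum of squares of the two groups.
best_split <- function(x, y) {
  xs <- sort(unique(x))
  # Candidate thresholds: midpoints between consecutive unique values
  cuts <- (xs[-1] + xs[-length(xs)]) / 2
  sse <- sapply(cuts, function(ct) {
    left <- y[x < ct]
    right <- y[x >= ct]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  list(cut = cuts[which.min(sse)], sse = min(sse))
}

# Made-up data: the response jumps when x exceeds 5
x <- 1:10
y <- c(2, 2, 2, 2, 2, 8, 8, 8, 8, 8)
split1 <- best_split(x, y)
split1$cut   # threshold chosen for this one split
split1$sse   # residual sum of squares after the split
```

## A tree repeats this search within each resulting group, so every split is the best one locally, with no guarantee that the whole tree is the best possible.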

Tree-based Models

Page 54: Vegetation Modeling


## What does this mean, in simple form..

# Using our example dataset,
load("Outfolder/modeldat.Rda")

library(rpart)

## Classification tree
asp.tree <- rpart(ASPEN ~ TMB5 + TMndvi + fnfrcl + elevm + slp + aspval,
    data=modeldat, method="class")

plot(asp.tree)
text(asp.tree, cex=0.75)

## Regression tree
carb.tree <- rpart(CARBON_KG ~ TMB5 + TMndvi + fnfrcl + elevm + slp + aspval,
    data=modeldat)
plot(carb.tree)
text(carb.tree, cex=0.75)

Tree-based Models

Page 55: Vegetation Modeling

Random Forests (Breiman, 2001)

Generates a series of classification and regression tree models..

.. sampling, with replacement, from training data (bootstrap)

.. selecting predictor variables at random for each node

.. outputting the class that most frequently results

.. and calculating an out-of-bag error estimate

.. and measuring variable importance through permutation

R packages:
randomForest – Liaw & Wiener
ModelMap – Freeman & Frescino

Random Forests

Breiman, L. 2001. Random forests. Machine Learning 45(1):5-32.
Liaw, A.; Wiener, M. 2002. Classification and regression by randomForest. R News 2(3):18-22.
Freeman, E.A. et al. 2012. ModelMap: an R package for Model Creation and Map Production. CRAN R vignette.

Page 56: Vegetation Modeling


## Strengths of Random Forests (Breiman 2001)

# Bootstrap sample – A random selection of plots, drawn with replacement, used to construct one tree.
# Boosting – Successive trees are constructed using information from previous trees, and a weighted vote is used for prediction.
# Bagging – Each tree is independently constructed, where the majority (or average) vote is used for prediction. Breiman's Random Forests uses bagging.
# Predictor selection – Each node is split using a randomly selected subset of predictors. In standard trees, all variables are considered at each split. This difference makes Random Forests more robust against overfitting.

# Two main parameters
# 1. Number of trees (bootstrap samples) to generate (ntree)
# 2. Number of variables in the random subset of predictors at each node (mtry)

# Predictions – Aggregate predictions of n trees
# For categorical response (Classification trees) – majority vote
# For continuous response (Regression trees) – average of the tree predictions

# Error rate – based on the training data left out of each bootstrap sample ('out-of-bag', or OOB)
# For each tree (bootstrap sample), the plots not in the sample are predicted with that tree
# For categorical response (Classification trees) – OOB misclassification rate
# For continuous response (Regression trees) – OOB mean square error
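## To see where the out-of-bag plots come from, here is a minimal base R sketch of bootstrap sampling. It does not call randomForest; the number of plots (n) and number of trees (ntree) are made-up values for illustration.

```r
set.seed(66)
n <- 1000      # made-up number of plots
ntree <- 50    # made-up number of trees

# For each tree, draw a bootstrap sample of plot indices (with replacement)
# and flag the plots that were left out (out-of-bag)
oob <- sapply(1:ntree, function(i) {
  inbag <- sample(n, n, replace = TRUE)
  !(1:n %in% inbag)
})

# Fraction of plots that are out-of-bag for each tree
oob.frac <- colMeans(oob)
mean(oob.frac)   # expected to be close to (1 - 1/n)^n, roughly 0.37
```

## Each tree therefore has roughly a third of the plots available as an independent check, which is what the OOB error estimate uses.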

Random Forests Model

Page 57: Vegetation Modeling


## Now, let's use the randomForest package – Classification tree

library(randomForest)
help(randomForest)

## Let's try with ASPEN binary, categorical response (presence/absence)
set.seed(66)
asp.mod <- randomForest(ASPEN ~ TMB5 + TMndvi + fnfrcl + elevm + slp + aspval,
    data=modeldat, importance = TRUE)

## Default parameters:
# ntree = 500 # Number of trees
# mtry = sqrt(p) # Number of predictors (p) randomly sampled at each node
# nodesize = 1 # Minimum size of terminal nodes
# replace = TRUE # Bootstrap samples are selected with replacement

## Look at results
asp.mod
summary(asp.mod)
names(asp.mod)

Random Forests Model
Classification tree

Page 58: Vegetation Modeling


## Classification tree - Output
names(asp.mod)
err <- asp.mod$err.rate # Out-Of-Bag (OOB) error rate (of i-th element)
head(err)
tail(err)

mat <- asp.mod$confusion # Confusion matrix
mat

Random Forests Model
Classification tree

Page 59: Vegetation Modeling


## Classification tree - Output

# Plot the number of trees by the error rate
plot(1:500, err[,"OOB"], xlab="Number of trees", ylab="Error rate")

# Note: how many trees needed to stabilize prediction

## Calculate the percent correctly classified from confusion (error) matrix
mat
pcc <- sum(diag(mat[,1:2]))/sum(mat[,1:2]) * 100 # Use only the count columns (drop class.error)
pcc
pcc <- round(pcc, 2) ## Round to 2 decimal places
pcc

library(PresenceAbsence)
pcc(mat[,1:2], st.dev=TRUE)

Kappa(mat[,1:2], st.dev=TRUE)

## The Kappa statistic summarizes all the available information in the confusion matrix.

## Kappa measures the proportion of correctly classified units after accounting for the probability of chance agreement.
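## As a worked example of the two statistics, here is a minimal base R sketch that computes percent correctly classified and Kappa by hand. The confusion matrix counts are made up for illustration and do not come from the ASPEN model.

```r
# Made-up 2 x 2 confusion matrix (rows = observed, columns = predicted)
cm <- matrix(c(40, 10,
                5, 45), nrow = 2, byrow = TRUE,
             dimnames = list(observed = c("0", "1"), predicted = c("0", "1")))

n <- sum(cm)
p.obs <- sum(diag(cm)) / n                      # observed agreement
p.exp <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
kappa.hat <- (p.obs - p.exp) / (1 - p.exp)      # Kappa statistic

p.obs * 100   # percent correctly classified: 85
kappa.hat     # 0.7
```

## Here 85% of the plots are classified correctly, but half of that agreement would be expected by chance alone, so Kappa is lower than the percent correct suggests.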

Random Forests Model
Classification tree

Page 60: Vegetation Modeling


## Now, let's use the randomForest package – regression tree

## Now, let's try with the continuous, CARBON_KG response
set.seed(66)
carb.mod <- randomForest(CARBON_KG ~ TMB5 + TMndvi + elevm + slp + aspval,
    data=modeldat, importance = TRUE)

## Default parameters:
# ntree = 500 # Number of trees
# mtry = p/3 # Number of predictors (p) randomly sampled at each node
# nodesize = 5 # Minimum size of terminal nodes
# replace = TRUE # Bootstrap samples are selected with replacement

## Look at results
carb.mod
summary(carb.mod)
names(carb.mod)

Random Forests Model

Page 61: Vegetation Modeling


## Regression tree - Output
names(carb.mod)
mse <- carb.mod$mse # Mean square error (of i-th element)
rsq <- carb.mod$rsq # Pseudo R-squared (1-mse/Var(y)) (of i-th element)

head(mse)
tail(mse)
tail(rsq)
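## The pseudo R-squared formula above, 1-mse/Var(y), can be checked by hand. This is a minimal sketch with made-up observed values and made-up predictions (not output from carb.mod). Using the divisor n for the variance is an assumption here.

```r
# Made-up observed values and hypothetical out-of-bag predictions
y <- c(10, 12, 15, 20, 23)
yhat <- c(11, 13, 14, 18, 24)

n <- length(y)
mse.ex <- mean((y - yhat)^2)        # mean square error
var.y <- var(y) * (n - 1) / n       # variance of y with divisor n (assumption)
rsq.ex <- 1 - mse.ex / var.y        # pseudo R-squared

mse.ex
rsq.ex
```

## A value near 1 means the prediction error is small relative to the spread of the response; a value near 0 (or negative) means the model does no better than predicting the mean.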

Random Forests Model
Regression tree

Page 62: Vegetation Modeling


## Regression tree - Output

# Plot the number of trees by the mse (Mean Square Error)
plot(1:500, mse, xlab="Number of trees", ylab="Mean Square Error")

# Note: how many trees needed to stabilize prediction

# Similarly, plot the number of trees by the rsq (R-Squared)
plot(1:500, rsq, xlab="Number of trees", ylab="R-Squared")

# Again: how many trees needed to stabilize prediction

Random Forests Model
Regression tree

Page 63: Vegetation Modeling


## Other information from RandomForest model (importance=TRUE)

# Variable importance (Breiman 2002)
# Estimated by how much the prediction error increases when the OOB data for that variable is permuted while all others are left unchanged.

# randomForest computes different measures of variable importance

# 1. Computed from OOB data, averaged over all trees and normalized by the standard deviation of the differences.
#    Classification trees – error rate (Mean Decrease Accuracy)
#    Regression trees – Mean Square Error (%IncMSE)

# 2. The total decrease in node impurities from splitting on the variable, averaged over all trees.
#    Classification trees – measured by Gini index (Mean Decrease Gini)
#    Regression trees – measured by residual sum of squares (IncNodePurity)
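## The permutation idea behind the first measure can be sketched in base R. This is only an illustration: it uses a linear model as a stand-in (not a random forest), held-out rows in place of OOB data, and simulated predictors x1 and x2, where only x1 affects the response.

```r
set.seed(66)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 3 * dat$x1 + rnorm(n, sd = 0.5)   # y depends on x1 only

# Stand-in model (a linear model, NOT a random forest), fit on the first half
train <- dat[1:100, ]
test <- dat[101:200, ]                     # held-out rows play the role of OOB data
fit <- lm(y ~ x1 + x2, data = train)

# Mean square error of the stand-in model on a data frame
mse.fun <- function(d) mean((d$y - predict(fit, newdata = d))^2)
base.mse <- mse.fun(test)

# Permute one predictor at a time and measure the increase in error
perm.imp <- sapply(c("x1", "x2"), function(v) {
  shuffled <- test
  shuffled[[v]] <- sample(shuffled[[v]])
  mse.fun(shuffled) - base.mse
})
perm.imp
```

## Shuffling x1 breaks its link to the response, so the error rises sharply; shuffling x2 changes little because the model barely uses it.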

Random Forests Model
Variable Importance

Page 64: Vegetation Modeling


## Variable importance – Classification tree

## Get importance table
asp.imp <- abs(asp.mod$importance)
asp.imp

## Get the number of measures (columns) and number of predictors
ncols <- ncol(asp.imp) ## Number of measures
numpred <- nrow(asp.imp) ## Number of predictors

## Plot the measures of variable importance for ASPEN presence/absence
par(mfrow=c(2,2))
for(i in 1:ncols){ ## Loop thru the different importance measures
    ivect <- sort(asp.imp[,i], decreasing=TRUE) ## Get i-th measure, descending order
    iname <- colnames(asp.imp)[i] ## Get name of measure

    # Generate histogram plot (type='h') with no x axis (xaxt='n')
    plot(ivect, type = "h", main = paste("Measure", iname), xaxt="n",
        xlab = "Predictors", ylab = "", ylim=c(0,max(ivect)))

    # Add x axis with associated labels
    axis(1, at=1:numpred, labels=names(ivect))
}

Random Forests Model
Variable Importance - Classification

Page 65: Vegetation Modeling


## Let’s make a function and plot importance values for CARBON_KG model.

plotimp <- function(itab){
    ncols <- ncol(itab) ## Number of measures
    numpred <- nrow(itab) ## Number of predictors

    ## Plot the measures of variable importance
    par(mfrow = c(ceiling(ncols/2),2))
    for(i in 1:ncols){ ## Loop thru the different importance measures
        ivect <- sort(itab[,i], decreasing=TRUE) ## Get i-th measure, sorted decreasing
        iname <- colnames(itab)[i] ## Get name of measure

        # Generate histogram plot (type='h') with no x axis (xaxt='n')
        plot(ivect, type = "h", main = paste("Measure", iname), xaxt="n",
            xlab = "Predictors", ylab = "", ylim=c(0,max(ivect)))

        # Add x axis with associated labels
        axis(1, at=1:numpred, labels=names(ivect))
    }
}

## Check function with ASPEN model
plotimp(asp.imp)

## Now, run function with CARBON_KG model
plotimp(carb.mod$importance)

Random Forests Model
Variable Importance - Regression

Page 66: Vegetation Modeling


## Other information from RandomForest model (proximity=TRUE)

# Measure of internal structure (Proximity measure)
# - The fraction of trees in which each pair of plots falls in the same terminal node.

# - Similarity measure - in theory, similar data points will end up in the same terminal node.
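## To make the proximity measure concrete, here is a minimal base R sketch that computes it from a table of terminal node IDs. The node IDs are made up (4 plots, 5 trees); this illustrates the definition only and is not how randomForest computes it internally.

```r
# Made-up terminal node IDs: rows = 4 plots, columns = 5 trees
nodes <- matrix(c(1, 2, 1, 3, 2,
                  1, 2, 1, 3, 1,
                  2, 1, 1, 2, 2,
                  3, 1, 2, 2, 2), nrow = 4, byrow = TRUE)

nplots <- nrow(nodes)
prox <- matrix(0, nplots, nplots)
for (i in 1:nplots) {
  for (j in 1:nplots) {
    # Fraction of trees in which plots i and j share a terminal node
    prox[i, j] <- mean(nodes[i, ] == nodes[j, ])
  }
}
prox
```

## Plots 1 and 2 share a terminal node in 4 of the 5 trees, so their proximity is 0.8; every plot has proximity 1 with itself.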

## Let's try adding proximity to CARBON_KG model
set.seed(66)
carb.mod <- randomForest(CARBON_KG ~ TMB5 + TMndvi + elevm + slp + aspval,
    data=modeldat, importance = TRUE, proximity = TRUE)

names(carb.mod)
carb.prox <- carb.mod$proximity

head(carb.prox)

Random Forests Model
Proximity

Page 67: Vegetation Modeling


Model Prediction

Page 68: Vegetation Modeling


## Vegetation
5 different life zones:
1. shrub-montane
2. aspen
3. lodgepole pine
4. spruce-fir
5. alpine

Model data set: Uinta Mountains, Utah, USA

Highest East-West oriented mountain range in the contiguous U.S. - up to 13,528 ft (4,123 m)

Apply model to: High Uinta Wilderness

Study Area

Page 69: Vegetation Modeling


Polygon Data

## Let's import and display the 2 polygon layers as well.

## Set dsn and polygon layer names
dsn <- "SpatialData"
aoinm <- "uintaN_aoi" # AOI boundary
wildnm <- "uintaN_wild" # Wilderness boundary (mapping AOI)

## Import polygon shapefiles
bndpoly <- readOGR(dsn=dsn, layer=aoinm, stringsAsFactors=FALSE)
wildpoly <- readOGR(dsn=dsn, layer=wildnm, stringsAsFactors=FALSE)

## Check projections of all 3 layers to see if we can display them together.
sapply(c(bndpoly, wildpoly), projection)

## Now we can display all 3 layers
par(mfrow=c(1,1))
plot(bndpoly, border="black", lwd=3)
plot(wildpoly, add=TRUE, border="red", lwd=2)

Page 70: Vegetation Modeling


Predictor Data

## Now, we need to clip the raster predictor layers to the extent of the wilderness polygon

## Set file names
b5fn <- "SpatialData/uintaN_TMB5.img" # Landsat TM–Band5
ndvifn <- "SpatialData/uintaN_TMndvi.img" # Landsat TM–NDVI
fnffn <- "SpatialData/uintaN_fnfrcl.img" # Forest type map
elevfn <- "SpatialData/uintaN_elevm.img" # Elevation (meters)
slpfn <- "SpatialData/uintaN_slp.img" # Derived slope (degrees)
aspfn <- "SpatialData/uintaN_aspval.img" # Derived aspect value

## Check rasters
rastfnlst <- c(b5fn, ndvifn, fnffn, elevfn, slpfn, aspfn)
sapply(rastfnlst, raster)

## Compare projections of rasters with projection of wilderness polygon
projection(wildpoly)

Page 71: Vegetation Modeling


Predictor Data
Clip layers

## Clip raster layers

## Let's clip the elevm raster using crop function
help(crop)
elevclip <- crop(raster(elevfn), extent(wildpoly))

## Now display the new raster.
plot(elevclip)

## Note: Notice it didn't clip to boundary, it clipped to extent of boundary
## Add polygon layer to display
plot(wildpoly, add=TRUE)

## We need to crop further by applying a mask with the polygon layer.
elevclip <- mask(elevclip, wildpoly)
plot(elevclip)

Page 72: Vegetation Modeling


Predictor Data
Clip layers

## Create a function to clip all the layers, saving to the Outfolder folder. Use rastfnlst for raster names and build new names from the list. Return the new name of the raster.
cliprast <- function(rastfn, poly){

    ## Create new name from rastfn
    rastname <- strsplit(basename(rastfn), ".img")[[1]]
    newname <- substr(rastname, 8, nchar(rastname)) # Drop the "uintaN_" prefix
    newname <- paste("Outfolder/", newname, "_clip.img", sep="")

    ## Crop raster
    rastclip <- crop(raster(rastfn), extent(poly))

    ## Mask raster and save to Outfolder with newname
    rastclip <- mask(rastclip, poly, filename=newname, overwrite=TRUE)
    print(paste("finished clipping", rastname))
    flush.console()
    return(newname)
}

## Clip each raster in the list and collect the new file names
clipfnlst <- c()
for(rastfn in rastfnlst){
    clipfn <- cliprast(rastfn, wildpoly)
    clipfnlst <- c(clipfnlst, clipfn)
}

Page 73: Vegetation Modeling


Predictor Data
Create Raster Stack

## Check clipped rasters.
sapply(clipfnlst, raster)

## For prediction, the predictor layers must be consistent. Check the following:
# dimensions – all layers must have the same number of rows and columns
# resolution – all layers must have the same resolution
# extent – all layers must have the same extent
# projection – all layers must have same projection

## Create a stack of all the predictor layers
clipstack <- stack(clipfnlst)
clipstack

## Add names to stack layers
stacknames <- unlist(strsplit(basename(clipfnlst), "_clip.img"))
names(clipstack) <- stacknames
clipstack

Page 74: Vegetation Modeling

Extract data from each layer at each sample plot location.

y ~ f(x1 + x2 + x3 + x4 + x5 + x6) + ε

Prediction (% Tree crown cover)

Predictor layers: Landsat TM, Elevation, Aspect, Slope

Generate spatially explicit maps of forest attributes based on cell by cell predictions.

Example predictor values at three sample plots:
  TM B3 = 60,  Elev = 8000',  Aspect = 160°, Slope = 15%
  TM B3 = 120, Elev = 10500', Aspect = 10°,  Slope = 12%
  TM B3 = 80,  Elev = 9200',  Aspect = 95°,  Slope = 20%

Example response values: 35% cover, 10% cover, 80% cover

Apply Model

Page 75: Vegetation Modeling


## Predict across stack pixels.
asp.predict <- predict(clipstack, asp.mod)
asp.predict
plot(asp.predict)

# Plot with color breaks
cols <- c("dark grey", "green")
plot(asp.predict, col=cols, breaks=c(0,0.5,1))
colors()

# Or, a little fancier.. create function to color categorical raster classes
colclasses <- function(rast, cols, labs){
    nc <- length(cols) # Number of classes
    minval <- cellStats(rast, 'min') # minimum value of raster
    maxval <- cellStats(rast, 'max') # maximum value of raster
    bks <- seq(minval,maxval,length.out=nc+1) # break points
    lab.pts <- bks[-1]-diff(bks)/2 # label points
    plot(rast, col=cols, axis.args=list(at=lab.pts, labels=labs),
        breaks=bks)
}

colclasses(asp.predict, cols, labs=c("Absence", "Presence"))
par()
help(par)
par(omi=c(0,0,0,0.5)) # Changes outside margins
colclasses(asp.predict, cols, labs=c("Absence", "Presence"))

Apply Model & Display Map
ASPEN

Page 76: Vegetation Modeling


## Predict across stack pixels.
carb.predict <- predict(clipstack, carb.mod)

## Plot with heat color ramp
plot(carb.predict, col=heat.colors(10))

## Plot with grey scale
plot(carb.predict, col=grey(256:0/256))

## Plot with green scale
my.colors <- colorRampPalette(c("white", "dark green"))
plot(carb.predict, col=my.colors(10))

Apply Model & Display Map
CARBON_KG

Page 77: Vegetation Modeling

library(ModelMap)

help(package=ModelMap)

predList <- stacknames
#predList <- c("elevm", "TMndvi")

## Build random forest model
asp.mod2 <- model.build(
    model.type = "RF", qdata.trainfn = modeldat, folder = "Outfolder",
    predList = predList, predFactor = "fnfrcl", response.name = "ASPEN",
    response.type = "categorical", unique.rowname = "CN", seed = 66)

asp.mod2

ModelMap
Build Model

Page 78: Vegetation Modeling

asp.mod2d <- model.diagnostics(model.obj = asp.mod2,
    qdata.trainfn = modeldat, folder = "Outfolder",
    response.name = "ASPEN", prediction.type = "OOB",
    unique.rowname = "CN")

ModelMap
Model diagnostics

Page 79: Vegetation Modeling

# Add full path to beginning of each name in predList
predPath <- sapply(predList,
    function(x){paste("SpatialData/uintaN_", x, ".img", sep="")})
predPath

# Generate rastLUTfn. See help(model.mapmake) for details.
numpreds <- length(predList)
rastLUTfn <- data.frame(matrix(c(predPath, predList, rep(1,numpreds)),
    numpreds, 3), stringsAsFactors=FALSE)
rastLUTfn

## Check rasters
sapply(rastfnlst, raster)

## Make Map
a <- model.mapmake(
    model.obj = asp.mod2, folder = "Outfolder", rastLUTfn = rastLUTfn,
    make.img = TRUE, na.action = "na.omit")

ModelMap
Make Map

Page 80: Vegetation Modeling

## 1. Create a map of presence of lodgepole within the Uinta Wilderness area, using a Random Forests model and the predictors from modeldat. Hint: Load the plt and tree table again and use code from slide #20 to create a binary response variable for lodgepole presence. Also, set a seed to 66 so you can recreate the model.

## 2. How many plots in model data set have presence of lodgepole pine?

## 3. Display 1 scatterplot of the relationship of elevation and NDVI values with lodgepole pine presence and absence as different colors. Hint: see slide #49. Make sure to label plots and add a legend.

## 4. What is the variable with the highest importance value based on the Gini index?

## 5. What percentage of the total area is lodgepole pine? How does this compare with the percentage of aspen?

Exercise

Page 81: Vegetation Modeling

## Reclass raster layer to 2 categories

fnf <- raster("SpatialData/uintaN_fnf.img")

## Create raster look-up table
fromvect <- c(0,1,2,3)
tovect <- c(2,1,2,2)
rclmat <- matrix(c(fromvect, tovect), 4, 2)

## Generate raster and save to SpatialData folder
fnfrcl <- reclassify(x=fnf, rclmat, datatype='INT2U',
    filename="SpatialData/uintaN_fnfrcl.img", overwrite=TRUE)
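## What the look-up table does to each cell can be sketched in base R without the raster package. The small matrix of cell values below is made up and stands in for the fnf raster; the from/to vectors are the same as above.

```r
# Made-up cell values standing in for a small raster
vals <- matrix(c(0, 1, 2,
                 3, 1, 0,
                 1, 2, 3), nrow = 3, byrow = TRUE)

# Same look-up table as above: from -> to
fromvect <- c(0, 1, 2, 3)
tovect <- c(2, 1, 2, 2)

# match() finds the position of each cell value in fromvect,
# and that position picks the new value from tovect
newvals <- matrix(tovect[match(vals, fromvect)], nrow = nrow(vals))
newvals
table(newvals)
```

## Only cells with an original value of 1 keep the value 1; all other values (0, 2, 3) become 2, which gives the 2 categories.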

Appendix 1