
UNDERSTANDING AND PREDICTING WILDFIRES IN OREGON

Sajid Saleem, Sean Polling and Robert Krebs

BUS519 Business Analytics


CONTENTS

Executive Summary
Problem Description
Data and Model Assumptions
Review of Prior Work
Understanding Wildfires
  Correlation of Temperature and Total Acres Burned
  Correlation of Wind Speed and Total Acres Burned
  Correlation of Precipitation and Total Acres Burned
  Correlation of Humidity and Total Acres Burned
  Correlation of Population and Total Acres Burned
  General Causes for Wildfires
  Fuel Model
  Wildfires by County
  The Dashboard
Predicting Wildfire Starts
  Data Transformations
  Model Description
  Model Tuning
  Model Evaluation
  Alternative Methods
Predicting Wildfire Impact
  Model Development
  Model Evaluation
Conclusions
Appendix
  Data Preparation
  Correlation Matrix
  Logistic Regression
  Classification Tree
  Multiple Linear Regression
Sources
  Data Sources
  References


EXECUTIVE SUMMARY

Two hundred million dollars were spent per week during the summer months of 2015 to battle wildfires across the country. In Oregon and Washington alone three firefighters lost their lives while many homes were destroyed. The most effective methods of detecting wildfires can be expensive and are not accessible to rural areas where wildfires are more common. This paper uses descriptive analytics to better understand wildfire behavior in Oregon and investigates the use of predictive analytics paired with inexpensive and accessible data sources to create models to predict both wildfire ignition and spread.

In this analysis we determine that most Oregon wildfires occur in the summer months and that the most common cause is lightning. We also demonstrate that higher temperatures, lower wind speeds, thunderstorms, and fog are predictors of wildfire ignition. Wildfire spread is most affected by lower humidity, higher wind speeds, fog, and certain fuel models (vegetation types).

PROBLEM DESCRIPTION

Wildfires, also known as forest fires, can move at more than 15 miles per hour and destroy several acres of land in just a few minutes. In the US there are about 100,000 fires each year, burning up to five million acres of land. With fires burning hotter and people settling closer to fire-prone areas each year, wildfire control continues to be an important issue.

Traditional methods of surveillance are not always effective at detecting fires early enough. Automated solutions, such as satellite-based surveillance, infrared scanners, and local sensors, can help improve early fire detection but suffer from two significant flaws: they are expensive and they are not predictive.

Data mining tools can help. Weather data such as air humidity, wind speed, and temperature are relatively easy to collect in real time and at low cost, and these conditions are known to affect the occurrence of wildfires. The intent of this project is to examine the observed behavior of wildfires in Oregon and to build predictive models that classify and predict the occurrence and impact of wildfires in Oregon.

DATA AND MODEL ASSUMPTIONS

• Use data that is readily available or cheap to collect, so that the model relies only on low-cost inputs.

• The wildfire data covers only land under the jurisdiction of the Oregon Department of Forestry (ODF). This includes 16 million acres of land in Oregon.

• Restrict the data and analysis to the State of Oregon unless more data is required to meet data mining algorithm requirements and best practices.

REVIEW OF PRIOR WORK

Predicting the outbreak and behavior of a wildfire has always been a challenging task for firefighters, since many of the independent variables in modern, sophisticated prediction models are forecasts themselves. The weather models, the fire behavior models, and the interaction of both are often based on forecasts. Nevertheless, it is vital to improve the existing models since wildfires are an ongoing threat to human lives and property. An article on www.scientificamerican.com (Massey, Nathanael; November 15, 2013) describes the development of a new technique to predict wildfires that uses high-resolution satellite imagery to periodically check and revise computer simulations. The Suomi NPP, a weather satellite, uses a special infrared sensor capable of taking high-resolution images of the Earth’s surface. The satellite passes over the same spot on the Earth every 12 hours, feeding the data into a model that has been developed to identify signature patterns for wildfires.

Paulo Cortez and Aníbal Morais of the Department of Information Systems/R&D Algoritmi Centre at the University of Minho in Portugal state that these advanced fire prediction systems are costly and not accessible to every firefighting department. They therefore proposed a data mining approach that uses meteorological data to predict the burned area of forest fires. The authors tested five different data mining techniques, including Support Vector Machines (SVM) and Random Forests, and four feature selection setups (using Fire Weather Index data and meteorological data) to predict the area burned by small fires, which occur more frequently. Their final proposed solution was a regression that combined the SVM approach with only four direct weather inputs. The major drawback of their analysis is that only small fires can be predicted with sufficient accuracy. Another issue is that input variables such as firefighting intervention time and the vegetation of the relevant area were neglected. The authors also admit that direct weather condition data would be preferable to accumulated historical data to feed the models.

UNDERSTANDING WILDFIRES

To confirm our understanding of which weather data affect the ignition and spread of wildfire, our first step was to load both the fire data and the weather data into Tableau. We started by looking for correlations between the various input variables and the output variable (total acres burned). We used scatter plots to examine variables we expected to have strong relationships and zoomed the graphs to the range from 0 to 1000 acres burned, because some very high outliers around 14,000 acres burned skew the graph.

CORRELATION OF TEMPERATURE AND TOTAL ACRES BURNED

The scatterplots show a relationship between daily temperature in degrees Fahrenheit and the total area burned on a particular day. The most interesting observation was that the maximum daily temperature has the strongest relationship to total acres burned. The linear regression trend line indicates that total acres burned increases by 2.88 acres for each one-degree Fahrenheit increase in maximum temperature, with a p-value of 0.068. This puts the relationship just outside of statistical significance at an alpha of five percent.


CORRELATION OF WIND SPEED AND TOTAL ACRES BURNED

As expected, there is a positive correlation between wind speed and wildfire spread. The scatterplots show that the correlation with the total area burned on a particular day is highest for maximum daily wind speed, as opposed to mean wind speed or maximum gust speed. The linear regression trend line for acres burned and maximum daily wind speed is statistically significant with a p-value of 0.0025. Each additional mile per hour of maximum wind speed increases the area burned by an average of 13.64 acres. The full trend line equation is [Total Acres Burned] = 13.64 * [Max Wind Speed MPH] - 83.68. We anticipated that the maximum wind speed would have a higher correlation to wildfire spread than the maximum gust speed because a gust usually lasts less than 20 seconds, with the variation in wind speed between the peaks and lulls needing to be at least 9 knots. The maximum wind speed, however, is not limited in its duration, so it is more likely to be sustained more consistently and for longer periods of time.
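As a rough illustration of how the reported trend line can be read, the sketch below simply evaluates the equation from the text at a few wind speeds. The function names are hypothetical and chosen for illustration; only the slope and intercept come from the analysis.

```python
# Coefficients taken from the trend line reported in the text.
SLOPE_ACRES_PER_MPH = 13.64
INTERCEPT_ACRES = -83.68


def predicted_acres(max_wind_mph: float) -> float:
    """Evaluate the reported linear trend line at a given maximum wind speed."""
    return SLOPE_ACRES_PER_MPH * max_wind_mph + INTERCEPT_ACRES


def zero_crossing_mph() -> float:
    """Wind speed at which the trend line predicts zero acres burned."""
    return -INTERCEPT_ACRES / SLOPE_ACRES_PER_MPH


for mph in (10, 20, 30):
    print(f"{mph} mph -> {predicted_acres(mph):.2f} acres")
print(f"trend line crosses zero at {zero_crossing_mph():.2f} mph")
```

Below the zero crossing the line predicts negative acreage, which is a reminder that a linear trend of this kind is only a descriptive summary and should not be extrapolated to very low wind speeds.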

CORRELATION OF PRECIPITATION AND TOTAL ACRES BURNED

We predicted that the likelihood of wildfire would be highest when there was little or no precipitation. By drilling into the data we were able to determine that all large fires, those burning over 20,000 acres of land, started on days with zero precipitation. To get a better sense of how many fires started on days with precipitation, we grouped the data into five bins. The first bin, with no precipitation at all, contains 6754 wildfires. With only a trace of precipitation, meaning less than 0.05 inches of rain, the number of wildfires already drops by over 90% to 636. With more than 0.05 but less than 0.5 inches of rain, 1970 fires contributed to the loss of land during the relevant time period.
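The binning and the quoted percentage drop can be sketched as follows. The bin labels, the handling of a reading of exactly 0.05 inches, and the single catch-all bin for 0.5 inches or more are assumptions made for illustration, since the text describes only three of the five bins; the two fire counts are taken from the text.

```python
WILDFIRES_NO_PRECIPITATION = 6754  # count reported in the text
WILDFIRES_TRACE = 636              # count reported in the text


def precipitation_bin(inches: float) -> str:
    """Assign a daily precipitation reading (in inches) to a bin."""
    if inches < 0:
        raise ValueError("precipitation cannot be negative")
    if inches == 0:
        return "none"
    if inches < 0.05:
        return "trace"
    if inches < 0.5:
        return "0.05 to 0.5 in"
    # Assumption: the remaining bins are not described in the text,
    # so they are collapsed into one catch-all bin here.
    return "0.5 in or more"


def percent_decrease(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100


drop = percent_decrease(WILDFIRES_NO_PRECIPITATION, WILDFIRES_TRACE)
print(f"decrease from no precipitation to trace: {drop:.1f}%")
```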


CORRELATION OF HUMIDITY AND TOTAL ACRES BURNED

While precipitation is defined as the condensation of water vapor in the air in the form of water droplets and ice falling on the ground, humidity is the amount of water vapor in the air. We anticipated a negative relationship between humidity and the spread of wildfire. No matter how strong the maximum wind speed or the maximum gust speed is, if the amount of water in the air is significant, the conditions for fire ignition and growth become far less favorable. In some Southeast Asian countries, for example, humidity levels of 90-100% often coincide with very high temperatures, yet large-scale fires are rare during the same period. Accordingly, when looking at the correlations between mean humidity and total acres burned as well as maximum humidity and total acres burned, we see a negative relationship that is statistically significant with p-values of <0.0001 in both cases. An increase of one percentage point in average daily humidity decreases the average acres burned by 6.58. The trend line equation is [Total Acres Burned] = -6.58 * [Mean Humidity %] + 536.

CORRELATION OF POPULATION AND TOTAL ACRES BURNED

One of the more interesting questions we wanted to examine is whether a higher population generally correlates with more or less wildfire spread. We believed that a larger population means more opportunities to catch a wildfire before it gets too big, and more human risk if it is not stopped early. As expected, we found a negative correlation between the population of the county and the total acres burned, with a p-value of 0.00036. According to the trend line equation, [Total Acres Burned] = -0.000878 * [Population] + 237.124, an increase of 10,000 residents is associated with a decrease of 8.78 acres in total area burned. It is also possible that counties with more residents have access to more advanced technology to detect wildfires earlier. After seeing this strong relationship we decided to include population in our predictive models, which are discussed later.


GENERAL CAUSES FOR WILDFIRES

The most impactful cause of wildfires is lightning. Nearly 90 percent of the total acres burned by wildfire were due to fires caused by lightning. The graph above shows the general causes of wildfire resulting in burned acreage from 2005 through 2014. Recreationists and equipment use follow with 4.65 and 2.20 percent respectively. Interestingly, arson is responsible for only 1.22 percent of all burned acreage.

When scrutinizing the causes of the fires and the total acres burned as well as the respective months, it becomes apparent that wildfires (predominantly caused by lightning strikes) are particularly dangerous during the summer months of June, July, and August. During the time period from 2005 to 2015 only three fires burning over 5000 acres started outside of the summer months. In January and February 2009 as well as January 2010, there were some extremely large fires caused by lightning, burning approximately 50,000 acres. The graph below shows the largest fires in the past ten years. Arson, equipment use, recreationists, and miscellaneous causes are responsible for 6 major fires (>5000 acres burned), whereas lightning alone was the cause of 19 major fires.


FUEL MODEL

Another contributor to the spread of wildfire is the fuel model, which describes the specific type of vegetation, its density, and its condition. The graph shows that “open pine, grass under” is the relevant fuel model in nearly 41 percent of all cases when lightning is the cause. Unfortunately we do not have data on which fuel model is most common overall, so we cannot say whether one fuel model is disproportionately involved in the spread of wildfire, although we expect it to be a major factor.

WILDFIRES BY COUNTY

The county map shows that four counties have each lost over 100,000 acres to fires over the past ten years. Wallowa County, located in the far northeast of Oregon, has lost 277,984 acres to wildfires. The counties of Grant, Harney, and Douglas follow with over 100,000 acres each. Based on our previous findings we would expect these counties to have some combination of lower humidity levels, higher temperatures, higher wind speeds, and lower populations that correlates with the higher number of acres burned.

THE DASHBOARD

A Tableau dashboard was constructed that shows the five most important outputs of the analysis. First, it shows a map of the relevant area (Oregon in our case) and the regions with the most acres burned. This would allow the authorities to allocate more resources to those regions, knowing the additional risks, thereby reducing the time needed to start combating wildfires. Second, when considering the weather forecast, the authorities should also keep an eye on the mean and maximum humidity during the relevant time period. As mentioned before, the negative relationship between humidity and total acres burned is highly statistically significant. Finally, the dashboard shows that regions in Oregon harboring pines, conifers, and grasses are especially threatened by wildfires once lightning has hit those areas. This is most likely to happen during the second and third quarters of the year, since lightning-induced wildfires strike during those periods.


PREDICTING WILDFIRE STARTS

We would like to predict whether a wildfire is likely to occur using cheaply available weather data. We want the model to favor false positives over false negatives. In other words, we would rather predict a wildfire and be wrong than fail to predict a wildfire that does occur. This provides more opportunities for risk assessment. Ideally, the model for classifying whether a wildfire event will occur could later be combined with a model that predicts the size and spread of the wildfire to determine the level of risk and preparedness required.

DATA TRANSFORMATIONS

To predict the occurrence of wildfires we must use the combined WeatherWithFires data set so that we have access to daily weather information by area. For more detailed information on the data sets and the transformations required, see the Appendix. We remapped the variable representing total acres burned into a binary value, where 1 indicates that a wildfire started and 0 indicates that no wildfire started, for each day of weather data available for each location. Because the target is binary, we can use classification models and algorithms such as logistic regression, with selected weather and demographic fields from the data set as independent predictors.

The original WeatherWithFires data set contains weather data for 36 Oregon counties from as far back as January 2005 to as recently as December 2014, which resulted in 128,254 rows of daily weather data. Unfortunately the version of XLMiner in use only supports 65,000 rows of data. To reduce the data, the date range was filtered down to February 2010 through December 2014, which left 64,590 rows.

When partitioning the data into training and validation data sets we ran into another limitation of XLMiner. The tool would only allow a maximum of 10,000 rows to be used for the training data set. Because of this we used the automatic percentages that XLMiner calculated: 15.4823% of rows for training and 84.5177% for validation. This resulted in the maximum of 10,000 rows being used in the training set.
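The partition percentages follow directly from the row cap, as the sketch below shows. It is a generic illustration in Python, not a reproduction of XLMiner's partitioning; the function name, the use of row identifiers, and the random seed are assumptions.

```python
import random

TOTAL_ROWS = 64_590         # rows remaining after the date filter (from the text)
MAX_TRAINING_ROWS = 10_000  # training-set cap reported for the tool

training_pct = MAX_TRAINING_ROWS / TOTAL_ROWS * 100
validation_pct = 100 - training_pct
print(f"training: {training_pct:.4f}%  validation: {validation_pct:.4f}%")


def split_rows(row_ids, max_training_rows, seed=0):
    """Shuffle row identifiers and cap the training partition at a fixed size."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(row_ids)
    rng.shuffle(shuffled)
    return shuffled[:max_training_rows], shuffled[max_training_rows:]


train_ids, validation_ids = split_rows(range(TOTAL_ROWS), MAX_TRAINING_ROWS)
print(len(train_ids), len(validation_ids))
```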

MODEL DESCRIPTION

We started the logistic regression model development process by including all possible predictors and then used a combination of intuition, independent variable p-value strength, avoidance of highly correlated independent variables, and the desire to use the most easily accessible variables for prediction purposes in building the model. In general, for variable types such as temperature, humidity, and wind speed, we must pick either the minimum, mean, or maximum rather than all three values, since they tend to be very highly correlated with one another. For each variable included in the model we discuss our interpretation of the relationship below:

| Data Field | Coefficient | P-Value | Odds | Justification |
|---|---|---|---|---|
| MaxTemperatureF | 0.03972 | 1.52e-50 | 1.0405 | There are temperature requirements that must be met for ignition of a wildfire. |
| MeanWindSpeedMPH | -0.05376 | 3.48e-05 | 0.9477 | Higher mean wind speeds seem like they might lower the likelihood of fires starting even though they would increase the probability of spreading. |
| PrecipitationIn | -0.2517 | 0.390 | 0.7775 | More precipitation is assumed to lower the likelihood of fire starting, independent of lightning probability. |
| ThunderstormFlag | 1.4255 | 1.14e-18 | 4.1601 | We expect the chance of thunderstorms to significantly increase the chance of wildfire because such a high percentage of wildfires are started by lightning. |
| FogFlag | 0.3752 | 0.0075 | 1.4553 | Fog was a surprise! |
| Population | 1.0042e-06 | 6.83e-05 | 1.000001 | We included population as we would expect that more people increase the opportunities for human-caused wildfires. |
| Area | 7.2459e-05 | 5.64e-06 | 1.000072 | We included area to account for the size of risk areas. We would expect that larger areas would have a higher risk of wildfire independent of all other factors. |

The only variable in the model that is not statistically significant is precipitation. The humidity values were also not statistically significant when they were included, but we kept precipitation because it is easier to get from a weather forecast. Most of the relationships met expectations, with the exception of fog. According to the model there is a strong relationship between the presence of fog and an increased likelihood of a wildfire!

The biggest contributor to wildfire start risk appears to be thunderstorms, which makes sense since the most common cause of wildfire is lightning strike. The odds of a wildfire go up by 316% with a thunderstorm. Every additional degree Fahrenheit of maximum daily temperature increases the odds of wildfire by four percent.
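The odds figures quoted for the model are the exponentials of the logistic regression coefficients. The short sketch below reproduces that arithmetic for a few of the reported coefficients; the helper names are hypothetical, and the model intercept is not given in the text, so no probabilities are computed.

```python
import math

# Coefficients as reported for the logistic regression model.
coefficients = {
    "MaxTemperatureF": 0.03972,
    "MeanWindSpeedMPH": -0.05376,
    "PrecipitationIn": -0.2517,
    "ThunderstormFlag": 1.4255,
    "FogFlag": 0.3752,
}


def odds_ratio(coefficient: float) -> float:
    """Multiplicative change in the odds for a one-unit increase in the predictor."""
    return math.exp(coefficient)


def percent_change_in_odds(coefficient: float) -> float:
    """The same change expressed as a percentage increase or decrease in the odds."""
    return (math.exp(coefficient) - 1) * 100


for name, coefficient in coefficients.items():
    print(
        f"{name}: odds ratio {odds_ratio(coefficient):.4f}, "
        f"change in odds {percent_change_in_odds(coefficient):+.1f}%"
    )
```

Note that these percentages describe changes in the odds of a wildfire start, not in its probability; the two are close only when the underlying probability is small.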

MODEL TUNING

The initial modeling attempts used the default cut-off value of 0.5 and resulted in an overall prediction error of only 6.4% on the validation data. However, the model had a 97.8% error rate on days when a wildfire actually occurred. This means the model would predict only 2.2% of the wildfire events!

To make the model useful we need it to be better at predicting when a wildfire event will occur, even if doing so increases the false positive rate (the error rate at which the model predicts a wildfire but one does not occur). A high rate of false alarms could result in wasted resources, but missed chances to be prepared carry the higher cost and risk.

We tried several different cut-off values and settled on 0.1, which results in a 52.7% chance of not predicting a wildfire that actually occurs. The cost of correctly predicting more of the wildfires that occur is that we must accept a high false positive, or false alarm, rate: there is an 18.7% chance of predicting a wildfire start when there is none.
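The trade-off between missed wildfires and false alarms can be illustrated with a small sketch that scores the same predicted probabilities at two cut-off values. The probabilities and outcomes below are invented toy values for illustration only, not results from the model, and the function name is hypothetical.

```python
def error_rates(probabilities, actuals, cutoff):
    """Return miss rate, false alarm rate, and overall error at a given cut-off."""
    misses = false_alarms = positives = negatives = 0
    for probability, actual in zip(probabilities, actuals):
        predicted = 1 if probability >= cutoff else 0
        if actual == 1:
            positives += 1
            if predicted == 0:
                misses += 1          # wildfire occurred but was not predicted
        else:
            negatives += 1
            if predicted == 1:
                false_alarms += 1    # wildfire predicted but none occurred
    return {
        "miss_rate": misses / positives,
        "false_alarm_rate": false_alarms / negatives,
        "overall_error": (misses + false_alarms) / (positives + negatives),
    }


# Toy data (assumed): predicted probabilities and whether a wildfire started.
toy_probabilities = [0.02, 0.05, 0.08, 0.12, 0.15, 0.30, 0.04, 0.09, 0.20, 0.60]
toy_actuals = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

for cutoff in (0.5, 0.1):
    print(cutoff, error_rates(toy_probabilities, toy_actuals, cutoff))
```

Lowering the cut-off in this toy example reduces the miss rate while raising the false alarm rate and the overall error, which is the same direction of trade-off described above.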

MODEL EVALUATION

Using a cut-off value of 0.1, the model does not appear to be overfitting the training data. The training data showed 20.90% error compared to 20.94% error on the validation data. The breakdown of the error numbers is contained in the appendix for further review.


The lift charts show that the model performs particularly well in the top ten percent of predicted wildfire probabilities. The model is particularly strong where it outputs a predicted probability of 0.5 or higher. Over those first 102 sorted validation observations the model has a lift of 11.6, meaning it performs more than eleven times better than a random model.
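For readers unfamiliar with lift, the sketch below shows one common way to compute it: sort the observations by predicted probability, take the top n, and divide the share of actual wildfires in that group by the share in the whole data set. The data are invented toy values and the function name is hypothetical; this is not the XLMiner implementation.

```python
def lift_at_top(probabilities, actuals, top_n):
    """Lift of the top_n highest-scored observations over the overall base rate."""
    ranked = sorted(zip(probabilities, actuals), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(actual for _, actual in ranked[:top_n]) / top_n
    base_rate = sum(actuals) / len(actuals)
    return top_rate / base_rate


# Toy data (assumed): 10 observations, 2 of which are actual wildfire starts.
toy_probabilities = [0.90, 0.80, 0.30, 0.25, 0.20, 0.15, 0.10, 0.08, 0.05, 0.02]
toy_actuals = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

print(lift_at_top(toy_probabilities, toy_actuals, top_n=2))
```

In this toy case both of the top two observations are real wildfire starts while the base rate is 20%, so the lift is about 5; a lift of 11.6 is read the same way.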

ALTERNATIVE METHODS

A classification tree was also attempted using all possible variables. The resulting model was weaker than the logistic regression model described above and used far fewer variables. The overall error was 9.5%, with a 70.9% chance of not predicting a wildfire when there is one and a lower 5.2% chance of predicting a wildfire when there is none. The model results, error rates, and lift charts are listed in the appendix. In short, the classification tree was less successful at predicting when wildfires would occur but better at avoiding false positives.

PREDICTING WILDFIRE IMPACT

Now that we have found the probabilities of wildfire based on the weather data, we turn our attention to the impact of a wildfire in terms of how far it will spread.

MODEL DEVELOPMENT

To predict the area of land burned by a wildfire, we use the FireWithWeather data set. It differs from the WeatherWithFires data set in that it contains only the weather data for the counties at the times when a fire was reported.

To determine which variables to use, we relied on two factors:

1. Correlations computed in Excel and visual relationships observed in Tableau.

2. Intuition about the theoretical relationship; for example, when there is precipitation the wildfire spread should be smaller.

| Data Field | Data Type | Expectation |
|---|---|---|
| Max Temperature | Quantitative | High temperatures will allow conditions for continued burning. |
| Max Humidity | Quantitative | Higher humidity should decrease the spread of wildfire due to more moisture in the air. |
| Mean Humidity | Quantitative | Higher humidity should decrease the spread of wildfire due to more moisture in the air. |
| Max Wind Speed MPH | Quantitative | Higher wind speeds should increase the chance of wildfire spread. |
| Precipitation In | Quantitative | Precipitation should decrease the chance of wildfire spread. |
| Max Gust Speed MPH | Quantitative | Higher wind speeds should increase the chance of wildfire spread. |
| Rain Flag | Binary | Rain should decrease the chance of wildfire spread. |
| Thunderstorm Flag | Binary | Thunderstorms should increase the chance of large wildfires because multiple smaller fires may be more likely to join. |
| Fog Flag | Binary | Fog should decrease the chance of wildfire spread. |
| Snow Flag | Binary | Snow should decrease the chance of wildfire spread. |
| Hail Flag | Binary | Hail should decrease the chance of wildfire spread. |
| Population | Quantitative | Higher population should decrease the opportunity for wildfires to grow due to the impact and likelihood of getting noticed. |
| Fuel Model | Categorical (Dummy Variable Used) | There will likely be some fuel models that have a higher correlation to wildfire spread based on flammability. |

The weather data gathered had missing values in some records. These records were not significant in number compared to the total number of records (over 10,000). Thus, to prevent any distortion of results, we chose to delete the records with missing values of relevant variables. As a result we were left with 9231 records. This is a substantial number of records for the 13 variables (26 including all the dummy variables).

Training Data Scoring - Summary Report

| Total sum of squared errors | RMS Error | Average Error |
|---|---|---|
| 3504824 | 25.1546021 | -6.56793E-16 |

Validation Data Scoring - Summary Report

| Total sum of squared errors | RMS Error | Average Error |
|---|---|---|
| 2541204 | 26.23547353 | 0.011335704 |

The data was then partitioned after selecting just the relevant variables and creating dummies, with 60% assigned to the training set and 40% to the validation set. We decided to use a multiple regression model to predict the spread of fire based on the weather data available.

The initial model attempt included all of the variables above, including those that were highly correlated with one another such as maximum daily humidity and mean daily humidity. The initial model results are listed in the appendix. Not only were some variables highly correlated, but when we looked at the total acres burned visually there were some very large outliers in the dependent variable. We were concerned that these very large values were skewing the model significantly, since the model minimizes the sum of squared errors, which is strongly influenced by outliers. By removing 94 outliers (possible errors) we were able to improve the R squared from 0.007 to 0.02 and improve several of the predictor p-values.

After removing the highly correlated mean humidity and gust speed variables, the final model appeared:

Even though our errors have gone up marginally, we now have 10 independent variables which all have p-values less than 0.2.

MODEL EVALUATION

The sum of squared errors is better for the validation data than for the training data, and the RMS error is in the same ballpark. From

the regression measures it appears the model is not overfitting the training data. The lift chart shows that the model is performing

better than an average model. (Other error metrics are presented in the appendix)

Input Variables      Coefficient    P-Value
Intercept            14.26904       0.000474
MaxTemperatureF      0.023542       0.364864
MaxHumidity          -0.18363       1.47E-09
MaxWindSpeedMPH      0.141422       0.022753
PrecipitationIn      -0.7239        0.695158
RainFlag             0.878086       0.340045
ThunderstormFlag     -1.67444       0.229963
FogFlag              2.855736       0.016898
SnowFlag             1.853758       0.441588
HailFlag             -2.938         0.86946
Population           -1.2E-05       0.000648
Fuel Model_A         2.758355       0.08322
Fuel Model_B         2.884547       0.504479
Fuel Model_C         2.612847       0.106094
Fuel Model_F         2.466192       0.186664
Fuel Model_G         -0.5031        0.818683
Fuel Model_H         1.782308       0.274225
Fuel Model_I         1.6966         0.498202
Fuel Model_J         8.038211       3.73E-05
Fuel Model_K         0.262648       0.910736
Fuel Model_L         2.369435       0.156805
Fuel Model_R         -1.10268       0.660198
Fuel Model_T         16.25123       2.57E-07
Fuel Model_U         1.221496       0.734262

Residual DF            5515
R²                     0.024966
Adjusted R²            0.020899
Std. Error Estimate    25.20928
RSS                    3504824
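As a cross-check on the summary statistics of the final model, the adjusted R² and the standard error of the estimate can be recomputed from the other reported figures. The sketch below uses the standard textbook formulas; the training row count n = 5539 is inferred (residual DF 5515 plus 24 estimated coefficients) rather than quoted from the report.

```python
import math


def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n rows and p predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def std_error_estimate(rss, residual_df):
    """Standard error of the estimate from the residual sum of squares."""
    return math.sqrt(rss / residual_df)


n = 5539  # inferred training row count (assumption)
p = 23    # 10 weather/population predictors plus 13 fuel model dummies
print(round(adjusted_r2(0.024966, n, p), 6))
print(round(std_error_estimate(3504824, n - p - 1), 5))
```

Both results agree with the reported values to the precision shown in the table.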


The ROC graph has a very small area over the curve, which indicates that the model is effective.

CONCLUSIONS

Nearly 80% of the total acres burned from Oregon wildfires have occurred between June and August from 2005 to 2014.

40% of the overall acres burned were from fires started in July.

Lightning is the cause of almost 90% of the total acres burned from Oregon wildfires.

The chance of wildfire increases with higher temperature, lower wind speeds, thunderstorms, fog, and in larger areas with

higher population.

The impact of a wildfire increases with lower humidity, higher wind speeds, fog, and in areas with lower population. Several

fuel models have a statistically significant impact as well.

Predicting the occurrence and impact of wildfire is difficult. Even when the right conditions are present and the models indicate a very high probability, we see cases where no wildfire occurs. Because of this we have adjusted our classification model to be more likely to give false positives rather than fail to predict a wildfire when one does actually occur.

[Figure: Lift chart (validation dataset). x-axis: # Cases; y-axis: Cumulative. Series: Cumulative TotalAcres when sorted using predicted values; Cumulative TotalAcres using average.]

[Figure: Decile-wise lift chart (validation dataset). x-axis: Deciles; y-axis: Decile mean / Global mean.]


APPENDIX

DATA PREPARATION

Data Source - Wildfires

Wildfires List from Oregon Department of Forestry,

http://www.odf.state.or.us/DIVISIONS/protection/fire_protection/fires/FIRESlist.asp

Acquisition of Data

In order to collect the full Oregon data set the team had to use a web form to specify a Region as well as a year range from 1960 to 2015. The web form would only return a maximum of 2500 rows, so we could not simply specify All Areas as the region and then the full time range from 1960 to 2015, as the number of actual rows appeared to exceed 2500.

Instead we selected All Areas and then manually changed the yearly range in increments of 2-5 years depending on how much data was returned, ensuring each time that fewer than 2500 rows were present; otherwise we broke the yearly range into smaller buckets to ensure that we captured all of the data. We combined the results into a single Excel spreadsheet called FireData.xlsx.
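The team did this bucketing by hand. If it had to be repeated, the same idea could be automated along the following lines. This is a sketch under stated assumptions: it swaps the manual 2-5 year increments for a simple bisection, and `count_rows` is a hypothetical callback that reports how many rows the web form returns for a year range.

```python
def split_year_ranges(start, end, count_rows, limit=2500):
    """Return (start, end) year ranges that each stay below the row limit.

    count_rows(start, end) is a hypothetical callback, not part of the
    ODF web form. A single year that still reaches the limit is returned
    as-is because it cannot be split further by year.
    """
    if start == end or count_rows(start, end) < limit:
        return [(start, end)]
    mid = (start + end) // 2
    return (split_year_ranges(start, mid, count_rows, limit)
            + split_year_ranges(mid + 1, end, count_rows, limit))


# Illustration only: pretend every year returns 300 rows.
def fake_count(start, end):
    return 300 * (end - start + 1)


ranges = split_year_ranges(1960, 2015, fake_count)
print(ranges)
```

Each returned range can then be queried separately and the results appended to a single file, as was done for FireData.xlsx.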

Data Definition

Data Field       Description
Fire Year        Year that wildfire is associated to
District         ODF district associated to start of wildfire
Unit             ODF unit associated to start of wildfire
Fire Number      ODF fire identifier assigned to wildfire event
Fire Name        Label applied to wildfire event
Legal            Code used for location
Latitude         Geographic latitude in degrees, minutes, and seconds of wildfire start
Longitude        Geographic longitude in degrees, minutes, and seconds of wildfire start
Fuel Model       Coded type of vegetation fuel for wildfire:
                 A=Annual grasses (cheat)
                 B=Dense Chaparral
                 C=Open pine, grass under
                 F=Dense Brush (lighter than B)
                 G=Conifer, Old growth
                 H=Conifer, Second growth
                 I=Slash, heavy
                 J=Slash, medium
                 K=Slash, thinning, P.C, Scattrd
                 L=Grass Perennial
                 R=Hardwood, summer
                 T=Sagebrush, medium dense
                 U=Closed canopy pine
                 X=Non wildland fuel
County           Oregon county where wildfire event started
Report Date      Date and time that wildfire event was reported
General Cause    Assigned causal code for wildfire event:
                 Arson
                 Debris Burning
                 Equipment Use
                 Juveniles
                 Lightning
                 Miscellaneous
                 Railroad
                 Recreationist
                 Smoking
                 Under Invest
ODF Acres        Quantity of acres within ODF jurisdiction affected by wildfire.
Total Acres      Quantity of acres affected by wildfire

Data Transformation

Not applicable yet.

Data Source - Weather

Daily Weather Conditions, http://www.wunderground.com

Acquisition of Data

The Weather Underground website provides the capability to download a comma-delimited file (CSV) listing several historical weather factors for a given airport and date range of up to one year. The team wanted to select an airport near a city where the weather conditions could be used to represent the county. It would have been a manual and time-consuming task to look up 29 airports (representing 36 cities, which in turn represented 36 counties) and, for each airport, to look up and append daily weather data for 2005 through 2014 one year at a time. Instead a PowerShell script was developed (GetWeatherData.ps1) that would iterate over the required airports and the desired years of data to collect the daily weather data from Weather Underground into a single file.
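The script itself is not reproduced in this report. The loop it describes could look roughly like the Python sketch below, which is not a translation of GetWeatherData.ps1. The `fetch_csv` argument is a hypothetical downloader supplied by the caller, and adding an Airport column to each row is an assumption based on the data definition that follows.

```python
import csv
import io


def collect_weather(airports, years, fetch_csv):
    """Combine per-airport, per-year CSV downloads into one list of rows.

    fetch_csv(airport, year) is a hypothetical function that returns the
    CSV text for one airport and one year, or an empty string if no data
    is available.
    """
    rows = []
    for airport in airports:
        for year in years:
            text = fetch_csv(airport, year)
            if not text:
                continue  # some airports lack data for some years
            for row in csv.DictReader(io.StringIO(text)):
                row["Airport"] = airport  # record where the row came from
                rows.append(row)
    return rows


# Illustration with a stand-in downloader; KXYZ is a made-up airport code.
def fake_fetch(airport, year):
    if airport == "KXYZ" and year == 2006:
        return ""
    return "WeatherDate,MaxTemperatureF\n%d-01-01,50\n" % year


rows = collect_weather(["KTMK", "KXYZ"], range(2005, 2008), fake_fetch)
print(len(rows))
```

Passing the downloader in as a parameter keeps the iteration logic separate from the details of the website, which may change over time.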

Data Definition

Data Field                Description
WeatherDate               The date of the historical weather conditions
MaxTemperatureF           Maximum daily temperature in degrees Fahrenheit
MeanTemperatureF          Average daily temperature in degrees Fahrenheit
MinTemperatureF           Minimum daily temperature in degrees Fahrenheit
MaxDewPointF              Maximum daily dew point in degrees Fahrenheit
MeanDewPointF             Average daily dew point in degrees Fahrenheit
MinDewPointF              Minimum daily dew point in degrees Fahrenheit
MaxHumidity               Maximum daily humidity percentage
MeanHumidity              Average daily humidity percentage
MinHumidity               Minimum daily humidity percentage
MaxSeaLevelPressureIn     Maximum daily barometric pressure in inches of mercury
MeanSeaLevelPressureIn    Average daily barometric pressure in inches of mercury
MinSeaLevelPressureIn     Minimum daily barometric pressure in inches of mercury
MaxVisibilityMiles        Maximum daily visibility in miles
MeanVisibilityMiles       Average daily visibility in miles
MinVisibilityMiles        Minimum daily visibility in miles
MaxWindSpeedMPH           Maximum daily wind speed in miles per hour
MeanWindSpeedMPH          Average daily wind speed in miles per hour
MinWindSpeedMPH           Minimum daily wind speed in miles per hour
MaxGustSpeedMPH           Maximum daily gust speed in miles per hour
PrecipitationIn           Accumulation of precipitation for the day in inches.
                          T = trace of precipitation, <0.01 inches
CloudCover                Daily measure of cloud cover in oktas
Events                    Dash-delimited list of weather events. The possibilities are:
                          Fog
                          Hail
                          Rain
                          Snow
                          Thunderstorm
                          Tornado
WindDirDegrees            Wind direction in degrees
Airport                   Airport code for where weather conditions were observed

Data Characteristics

Several airports selected did not have data from 2005 to 2014. In these cases rows were not available and were not synthesized for

those locations for those particular days.

Data Transformation

The non-numeric values that appeared in PrecipitationIn as ‘T’s were converted to 0.005 to estimate a value less than 0.01 inches of

precipitation.

Columns were created called FogFlag, HailFlag, RainFlag, SnowFlag, ThunderstormFlag, and TornadoFlag to enumerate these true or false values from the Events field. If the Events field contained the substring “Fog” then FogFlag would be set to 1, otherwise it would be set to 0. The same is true for the Hail, Rain, Snow, Thunderstorm, and Tornado substrings and their associated flag fields.

268 rows contained blank temperature values. Most of these records were also missing other data fields such as wind speed, dew point, pressure, and more. Any rows that were missing data were removed. We also looked at the list of values in each field and removed any nonsensical values. For example, CloudCover should range from 0 to 8 to be valid oktas, but one row contained a value of -902.
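The transformations above can be summarised in a short Python sketch. It is not the team's actual procedure, which appears to have been done in spreadsheets. Here the function name `transform_weather_row` and the choice of `required` fields are assumptions, and only the CloudCover range check is shown as an example of a nonsensical value.

```python
EVENT_NAMES = ["Fog", "Hail", "Rain", "Snow", "Thunderstorm", "Tornado"]


def transform_weather_row(row):
    """Return a cleaned copy of a weather row, or None if it should be dropped."""
    out = dict(row)

    # A trace of precipitation ('T') is estimated as 0.005 inches.
    if out.get("PrecipitationIn") == "T":
        out["PrecipitationIn"] = 0.005

    # One 0/1 flag per event, set by substring match on the Events field.
    events = out.get("Events") or ""
    for name in EVENT_NAMES:
        out[name + "Flag"] = 1 if name in events else 0

    # Drop rows with missing data (field list is an assumption for this sketch).
    required = ["MaxTemperatureF", "CloudCover", "PrecipitationIn"]
    if any(out.get(field) in ("", None) for field in required):
        return None

    # Drop nonsensical values: valid oktas range from 0 to 8.
    out["PrecipitationIn"] = float(out["PrecipitationIn"])
    if not 0 <= float(out["CloudCover"]) <= 8:
        return None
    return out


sample = {"MaxTemperatureF": "80", "CloudCover": "3",
          "PrecipitationIn": "T", "Events": "Fog-Rain"}
cleaned = transform_weather_row(sample)
print(cleaned["PrecipitationIn"], cleaned["FogFlag"], cleaned["SnowFlag"])
```

A blank Events field is treated as "no events" rather than as missing data, since a day without any listed weather event is a valid observation.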

Data Source – Location Demographics

Oregon Counties

Acquisition of Data

Because there are only thirty-six counties in Oregon a spreadsheet was generated manually to map each Oregon County to a county

seat and to an airport where daily historical weather data could be pulled. This table is used to connect the wildfire data that is

organized by county to the daily weather data that is tied to airport. The table also contains some descriptive information such as

population and area for each county.

The airport codes were determined by looking up the county seat in http://www.wunderground.com and navigating to the historical

weather data feature to find out where that data is pulled from.

Data Definition

Data Field    Description
County        Oregon County name
City          City name of the county seat
Population    Population in quantity of people
Area          Size of county in square miles
Airport       Airport code for weather station near city observing historical weather data

Flattening the Data


In order to combine the data into a single table for the purpose of predictive analytics a Microsoft Access file was created with links to

the three main data sources, FireData.xlsx, WeatherByAirport.xlsx, and OregonCounties.xlsx. Queries were generated with Microsoft

Access and used to produce two new flattened files combining the data:

FiresByWeather.xlsx – This file takes each wildfire event and brings in the weather for the particular day and location. The data will

be useful for predicting the number of acres burned as it contains a row for each wildfire. It does not contain data pertaining to days

where no wildfires occurred.

WeatherWithFires.xlsx – This file starts with each day that we have weather data available for and then joins in the cases where

wildfires were started. So this data contains many rows where weather data was recorded for a particular location and a particular day

where no wildfire data is present. This data can be used to classify the likelihood of wildfire events by using the daily weather data.
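The two flattened files correspond to two different joins. The team built them with Microsoft Access queries; the Python sketch below only illustrates the logic. The field name `ReportDate` (assumed to be already reduced to a date that matches `WeatherDate`) and the `county_airport` lookup dictionary are assumptions made for the example.

```python
def fires_by_weather(fires, weather, county_airport):
    """One row per wildfire, with that day's weather attached (inner join)."""
    weather_by_key = {(w["Airport"], w["WeatherDate"]): w for w in weather}
    joined = []
    for fire in fires:
        key = (county_airport.get(fire["County"]), fire["ReportDate"])
        if key in weather_by_key:
            joined.append({**weather_by_key[key], **fire})
    return joined


def weather_with_fires(fires, weather, county_airport):
    """One row per weather day, flagged when a wildfire started (left join)."""
    fires_by_key = {}
    for fire in fires:
        key = (county_airport.get(fire["County"]), fire["ReportDate"])
        fires_by_key.setdefault(key, []).append(fire)
    joined = []
    for w in weather:
        matches = fires_by_key.get((w["Airport"], w["WeatherDate"]), [])
        if not matches:
            joined.append({**w, "FireFlag": 0})
        for fire in matches:
            joined.append({**w, **fire, "FireFlag": 1})
    return joined


# Tiny illustrative inputs; the values are made up.
county_airport = {"Tillamook": "KTMK"}
fires = [{"County": "Tillamook", "ReportDate": "2010-07-04", "TotalAcres": 12.0}]
weather = [
    {"Airport": "KTMK", "WeatherDate": "2010-07-04", "MaxTemperatureF": 88},
    {"Airport": "KTMK", "WeatherDate": "2010-07-05", "MaxTemperatureF": 70},
]
fbw = fires_by_weather(fires, weather, county_airport)
wwf = weather_with_fires(fires, weather, county_airport)
print(len(fbw), [row["FireFlag"] for row in wwf])
```

The first result suits the regression on acres burned because every row is a fire; the second suits classification because it also keeps the days on which no fire started.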


CORRELATION MATRIX

LOGISTIC REGRESSION

Correlation matrix of the weather, county, and fire variables (lower triangle shown; the columns follow the same order as the rows):

WeatherYear 1.000

MaxTemperatureF 0.043 1.000

MeanTemperatureF 0.036 0.955 1.000

MinTemperatureF 0.020 0.754 0.913 1.000

MaxDewPointF 0.018 0.659 0.792 0.856 1.000

MeanDewPointF 0.005 0.602 0.762 0.867 0.974 1.000

MinDewpointF -0.010 0.512 0.689 0.829 0.911 0.967 1.000

MaxHumidity -0.030 -0.434 -0.385 -0.266 0.131 0.169 0.193 1.000

MeanHumidity -0.028 -0.634 -0.497 -0.238 0.078 0.151 0.218 0.830 1.000

MinHumidity -0.031 -0.665 -0.489 -0.178 0.032 0.117 0.205 0.641 0.926 1.000

MaxSeaLevelPressureIn 0.085 -0.313 -0.377 -0.410 -0.353 -0.348 -0.320 0.121 0.122 0.102 1.000

MeanSeaLevelPressureIn 0.086 -0.243 -0.301 -0.338 -0.300 -0.287 -0.256 0.092 0.090 0.071 0.962 1.000

MinSeaLevelPressureIn 0.087 -0.178 -0.227 -0.261 -0.238 -0.220 -0.192 0.069 0.065 0.048 0.879 0.966 1.000

MaxVisibilityMiles -0.007 0.149 0.123 0.070 0.057 0.037 0.016 -0.100 -0.176 -0.210 -0.078 -0.075 -0.071 1.000

MeanVisibilityMiles -0.009 0.374 0.306 0.168 0.017 -0.008 -0.030 -0.410 -0.572 -0.574 -0.104 -0.072 -0.049 0.436 1.000

MinVisibilityMiles 0.001 0.405 0.326 0.171 -0.034 -0.047 -0.058 -0.494 -0.631 -0.599 -0.038 0.008 0.035 0.183 0.750 1.000

MaxWindSpeedMPH -0.002 0.075 0.105 0.127 0.038 0.013 -0.012 -0.209 -0.199 -0.145 -0.289 -0.355 -0.383 0.097 0.175 0.102 1.000

MeanWindSpeedMPH 0.004 0.007 0.076 0.161 -0.010 -0.010 -0.008 -0.291 -0.191 -0.080 -0.226 -0.265 -0.282 0.055 0.161 0.150 0.765 1.000

MaxGustSpeedMPH -0.032 -0.002 0.047 0.109 0.025 0.006 -0.010 -0.177 -0.104 -0.048 -0.271 -0.337 -0.366 0.062 0.098 0.029 0.876 0.677 1.000

PrecipitationIn -0.016 -0.170 -0.093 0.024 0.095 0.107 0.109 0.194 0.302 0.331 -0.175 -0.235 -0.262 -0.102 -0.289 -0.304 0.169 0.161 0.206 1.000

CloudCover -0.052 -0.554 -0.372 -0.070 0.038 0.105 0.167 0.463 0.704 0.761 -0.068 -0.111 -0.130 -0.130 -0.427 -0.506 0.016 0.027 0.062 0.349 1.000

WindDirDegrees 0.007 0.211 0.205 0.165 0.118 0.104 0.082 -0.164 -0.195 -0.192 -0.103 -0.089 -0.066 0.046 0.124 0.143 0.136 0.086 0.059 -0.072 -0.117 1.000

MaxGustSpeedMPHAdj -0.019 0.080 0.116 0.145 0.061 0.040 0.017 -0.185 -0.176 -0.121 -0.282 -0.344 -0.371 0.082 0.137 0.080 0.906 0.709 0.998 0.187 0.029 0.124 1.000

RainFlag -0.035 -0.233 -0.120 0.049 0.157 0.166 0.172 0.309 0.398 0.411 -0.182 -0.241 -0.267 -0.043 -0.231 -0.343 0.162 0.071 0.172 0.351 0.518 -0.063 0.173 1.000

ThunderstormFlag 0.014 0.133 0.143 0.134 0.123 0.102 0.077 -0.063 -0.080 -0.078 -0.111 -0.109 -0.096 0.014 0.045 -0.007 0.121 0.024 0.105 0.053 -0.033 0.035 0.116 0.122 1.000

FogFlag 0.017 -0.223 -0.209 -0.158 -0.023 -0.018 -0.009 0.313 0.368 0.323 0.110 0.097 0.080 -0.201 -0.686 -0.555 -0.175 -0.179 -0.156 0.060 0.199 -0.079 -0.141 0.083 -0.041 1.000

SnowFlag -0.027 -0.297 -0.305 -0.270 -0.248 -0.240 -0.230 0.119 0.160 0.164 -0.028 -0.082 -0.109 -0.067 -0.248 -0.253 0.040 0.033 0.045 0.093 0.184 -0.033 0.037 0.073 -0.018 0.176 1.000

HailFlag -0.005 -0.005 -0.004 -0.004 -0.005 -0.004 -0.003 0.005 0.002 -0.002 -0.001 -0.001 0.000 0.001 0.001 -0.007 0.007 0.003 0.004 0.002 0.005 0.003 0.005 0.009 0.032 -0.003 0.007 1.000

TornadoFlag -0.002 -0.001 0.001 0.003 0.003 0.003 0.003 -0.001 0.001 0.003 -0.001 -0.001 0.000 0.001 0.002 -0.003 -0.001 0.000 -0.003 0.001 0.006 0.000 -0.001 0.008 -0.001 -0.002 -0.001 0.000 1.000

Population -0.007 0.024 0.079 0.133 0.190 0.211 0.225 0.181 0.149 0.142 0.019 0.024 0.027 -0.004 -0.039 -0.049 -0.082 -0.081 -0.188 0.054 0.180 -0.066 -0.062 0.128 -0.028 0.050 -0.043 0.001 0.015 1.000

Area 0.007 0.017 -0.090 -0.231 -0.280 -0.324 -0.356 -0.187 -0.268 -0.281 0.010 -0.005 -0.016 0.026 0.110 0.109 0.077 0.003 -0.032 -0.098 -0.174 0.042 0.044 -0.116 0.046 -0.049 0.078 0.001 0.000 -0.229 1.000

FireFlag 0.054 0.172 0.167 0.135 0.114 0.100 0.075 -0.070 -0.108 -0.116 -0.050 -0.034 -0.018 0.023 0.032 0.038 0.000 -0.035 -0.030 -0.041 -0.097 0.016 0.004 -0.028 0.146 -0.002 -0.026 -0.002 -0.001 0.022 0.049 1.000

Total Acres Scrubbed 0.011 0.015 0.015 0.013 0.004 0.002 0.000 -0.018 -0.018 -0.014 -0.005 -0.004 -0.003 0.001 0.007 0.008 0.013 0.004 0.010 -0.004 -0.009 0.005 0.011 -0.002 0.022 0.000 -0.003 0.000 0.000 -0.005 0.007 0.059 1.000

Regression Model

Input Variables     Coefficient    Std. Error     Chi2-Statistic    P-Value        Odds        CI Lower    CI Upper
Intercept           -5.38627515    0.222653298    585.2195221       2.7462E-129    0.004579    0.00296     0.007084
MaxTemperatureF     0.039721854    0.002656669    223.55486         1.51707E-50    1.040521    1.035117    1.045953
MeanWindSpeedMPH    -0.05375577    0.012985249    17.13758304       3.47675E-05    0.947664    0.923849    0.972092
PrecipitationIn     -0.25171563    0.29262455     0.739944018       0.389678812    0.777466    0.438126    1.379634
ThunderstormFlag    1.425532448    0.161620716    77.79651824       1.14221E-18    4.160072    3.030603    5.710481
FogFlag             0.375224145    0.140300518    7.152584657       0.007485603    1.455318    1.105436    1.91594
Population          1.10042E-06    2.76332E-07    15.85828215       6.82671E-05    1.000001    1.000001    1.000002
Area                7.24593E-05    1.59627E-05    20.60522102       5.6442E-06     1.000072    1.000041    1.000104

Residual DF          9992
Residual Dev.        4602.507
# Iterations Used    4
Multiple R²          0.088048
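The Odds, CI Lower, and CI Upper columns follow directly from the coefficients and standard errors. The sketch below reproduces them, assuming a 95% Wald interval with z = 1.96; the report does not state the confidence level, so that value is an assumption that happens to match the reported figures.

```python
import math


def odds_ratio(coef, std_err, z=1.96):
    """Return (odds, ci_lower, ci_upper) for a logistic regression coefficient.

    z = 1.96 assumes a 95% confidence interval.
    """
    return (math.exp(coef),
            math.exp(coef - z * std_err),
            math.exp(coef + z * std_err))


thunder = odds_ratio(1.425532448, 0.161620716)  # ThunderstormFlag
max_temp = odds_ratio(0.039721854, 0.002656669)  # MaxTemperatureF
print([round(v, 3) for v in thunder])
print([round(v, 3) for v in max_temp])
```

Read this way, a day with a thunderstorm has roughly four times the odds of a wildfire starting, and each additional degree Fahrenheit of maximum temperature raises the odds by about 4%, holding the other variables fixed.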

Training Data Scoring - Summary Report

Cutoff probability value for success: 0.1

Confusion Matrix
                Predicted Class
Actual Class    1       0
1               342     353
0               1737    7568

Error Report
Class      # Cases    # Errors    % Error
1          695        353         50.79136691
0          9305       1737        18.66738313
Overall    10000      2090        20.9

Performance
Success Class           1
Precision               0.164502165
Recall (Sensitivity)    0.492086331
Specificity             0.813326169
F1-Score                0.246575342
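The performance figures in this report can be derived from the four cells of the confusion matrix. The following sketch uses the standard definitions and the training counts above; the function and key names are chosen for illustration.

```python
def classification_metrics(tp, fn, fp, tn):
    """Precision, recall, specificity, and F1 from confusion matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


# Training data at the 0.1 cutoff: actual 1 -> (342, 353), actual 0 -> (1737, 7568).
metrics = classification_metrics(tp=342, fn=353, fp=1737, tn=7568)
for name, value in metrics.items():
    print(name, round(value, 6))
```

The low precision alongside a recall near 0.49 reflects the deliberately low cutoff of 0.1, which trades extra false positives for fewer missed wildfires.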


Validation Data Scoring - Summary Report

Cutoff probability value for success: 0.1

Confusion Matrix
                Predicted Class
Actual Class    1       0
1               1678    1870
0               9562    41480

Error Report
Class      # Cases    # Errors    % Error
1          3548       1870        52.70574972
0          51042      9562        18.73359194
Overall    54590      11432       20.94156439

Performance
Success Class           1
Precision               0.149288256
Recall (Sensitivity)    0.472942503
Specificity             0.812664081
F1-Score                0.226940763


CLASSIFICATION TREE

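[Figure: classification tree diagram. The extracted node labels show splits on MeanTemperatureF at 66.5 and 73.5, Area at 2583, Population at 8.547e+ (value truncated in the source), and PrecipitationIn at 0.08. The tree layout and node counts are not recoverable from the extraction.]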

Test Data scoring - Summary Report (Using Best Pruned Tree)

Cutoff probability value for success: 0.1

Confusion Matrix
                Predicted Class
Actual Class    1       0
1               412     1006     0.70945
0               1069    19349    0.72181

Error Report
Class      # Cases    # Errors    % Error
1          1418       1006        70.94499
0          20418      1069        5.235576
Overall    21836      2075        9.502656

Performance
Success Class           1
Precision               0.27819
Recall (Sensitivity)    0.29055
Specificity             0.947644
F1-Score                0.284236


MULTIPLE LINEAR REGRESSION

Initial model

Removed outliers

Residual DF 5568

R² 0.007639959

Adjusted R² 0.003184319

Std. Error Estimate 3094.180509

RSS 53307770441

Columns: Input Variables, Coefficient, Std. Error, t-Statistic, P-Value, CI Lower, CI Upper, RSS Reduction

Intercept 1236.561 544.8552495 2.269522234 0.023275 168.4322 2304.69 158090254

MaxTemperatureF -4.572656 4.012047683 -1.13973123 0.254447 -12.4378 3.292523 41253014.86

MaxHumidity -3.17814 5.611288029 -0.56638335 0.571156 -14.1785 7.822174 109461933.3

MeanHumidity -9.098839 6.313910559 -1.44107817 0.149619 -21.4766 3.278889 30408733.58

MaxWindSpeedMPH 3.541634 15.97168029 0.221744611 0.824521 -27.7691 34.85236 10166551.77

PrecipitationIn 1.379731 49.20377299 0.02804116 0.97763 -95.0789 97.83832 11900.62063

MaxGustSpeedMPHAdj 2.325006 10.65970782 0.218111636 0.82735 -18.5722 23.22219 636555.642

RainFlag -137.1917 113.0599868 -1.21344199 0.225012 -358.833 84.44995 3138592.002

ThunderstormFlag 223.0495 166.0781577 1.343039106 0.179314 -102.529 548.6274 27155645.91

FogFlag 107.1989 149.7348341 0.715924808 0.474068 -186.34 400.7376 4858758.305

SnowFlag -53.17142 285.3094312 -0.18636404 0.852166 -612.489 506.1464 169706.098

HailFlag -51.59413 2191.314296 -0.02354483 0.981217 -4347.43 4244.237 25261.82324

Population -0.00054 0.000421497 -1.2805477 0.200406 -0.00137 0.000287 27480191.06

Fuel Model_A -23.20317 196.499996 -0.11808232 0.906007 -408.42 362.0135 7860546.175

Fuel Model_B -93.17159 524.7264323 -0.17756221 0.859073 -1121.84 935.4969 1062989.165

Fuel Model_C 138.5869 199.6299384 0.694218891 0.487574 -252.766 529.9394 4871955.574

Fuel Model_F 1.334528 232.2391453 0.005746354 0.995415 -453.945 456.6139 1121043.416

Fuel Model_G 786.8749 264.7020475 2.972681591 0.002965 267.9556 1305.794 131346533.8

Fuel Model_H -17.82082 199.1608045 -0.08947956 0.928704 -408.254 372.6121 394035.6074

Fuel Model_I -28.55546 311.929409 -0.09154464 0.927063 -640.059 582.9479 209555.0402

Fuel Model_J -24.4791 240.3829083 -0.10183378 0.918892 -495.723 446.7652 519846.5116

Fuel Model_K -80.02531 293.8633776 -0.27232147 0.785385 -656.112 496.0616 1831746.697

Fuel Model_L 94.73391 205.0273256 0.462055021 0.64406 -307.2 496.6675 5399746.683

Fuel Model_R -54.4641 305.9224255 -0.17803237 0.858704 -654.191 545.2632 74611.62428

Fuel Model_T -110.6278 374.9614261 -0.29503786 0.767976 -845.698 624.4429 697467.9889

Fuel Model_U -71.17254 442.4721067 -0.16085204 0.872216 -938.59 796.2454 247710.4992

Training Data Scoring - Summary Report

Total sum of squared errors    RMS Error    Average Error
53307770441                    3086.982     2.01864E-12

Validation Data Scoring - Summary Report

Total sum of squared errors    RMS Error    Average Error
10292488889                    1661.138     -77.8164336


Error Metrics:

CART (Prediction):

Residual DF 5513

R² 0.02351

Adjusted R² 0.019082

Std. Error Estimate 26.07203

RSS 3747467

Columns: Input Variables, Coefficient, Std. Error, t-Statistic, P-Value, CI Lower, CI Upper, RSS Reduction

Intercept 17.69732773 4.574001388 3.86911289 0.000110492 8.730481103 26.66417436 99409.75

MaxTemperatureF -0.033869112 0.033838333 -1.0009096 0.316914445 -0.10020559 0.032467365 1717.131

MaxHumidity -0.163498766 0.048450746 -3.37453559 0.000744507 -0.258481336 -0.0685162 29929.83

MeanHumidity -0.027774402 0.053782482 -0.51642099 0.605581121 -0.133209277 0.077660472 399.223

MaxWindSpeedMPH 0.017029282 0.137809494 0.12357118 0.901659317 -0.253131677 0.28719024 6408.125

PrecipitationIn -0.514039049 1.291650056 -0.39797083 0.690667095 -3.046182563 2.018104464 171.5546

MaxGustSpeedMPHAdj 0.117996193 0.093915521 1.2564078 0.20902139 -0.066115267 0.302107654 1252.731

RainFlag -0.006771199 0.961029838 -0.00704577 0.994378587 -1.890768694 1.877226297 171.5756

ThunderstormFlag -0.504207046 1.449487848 -0.34785186 0.727964719 -3.34577488 2.337360788 39.1471

FogFlag 2.451828004 1.287452173 1.904403173 0.056909697 -0.072086004 4.975742012 2831.44

SnowFlag 0.82393785 2.480077405 0.332222635 0.739733784 -4.037991963 5.685867664 239.324

HailFlag -1.093650238 18.46491974 -0.05922854 0.952772225 -37.29217517 35.1048747 13.69563

Population -9.42886E-06 3.47509E-06 -2.71326713 0.006683039 -1.62414E-05 -2.6163E-06 6331.904

Fuel Model_A 2.806015584 1.638480411 1.712571945 0.08684755 -0.406052209 6.018083377 0.505967

Fuel Model_B 0.419331057 4.461402773 0.093990854 0.925119838 -8.326777876 9.16543999 219.4555

Fuel Model_C 2.278022003 1.668843985 1.365029939 0.172299342 -0.99357037 5.549614377 703.4535

Fuel Model_F 2.917559919 1.95890901 1.489380009 0.136444532 -0.922674301 6.757794138 5.699657

Fuel Model_G 2.192067912 2.261219665 0.969418383 0.332379011 -2.240814417 6.624950241 169.9185

Fuel Model_H 1.916788103 1.666377499 1.150272434 0.250081573 -1.349968986 5.183545191 1689.793

Fuel Model_I 3.835796354 2.545833715 1.50669556 0.131946005 -1.155041759 8.826634467 8.491603

Fuel Model_J 9.170368758 2.003729171 4.576650823 4.82848E-06 5.242269348 13.09846817 12563.57

Fuel Model_K -0.021562087 2.398802742 -0.00898869 0.992828487 -4.724161506 4.681037332 1149.861

Fuel Model_L 2.953693022 1.715946481 1.721320014 0.085248871 -0.41023882 6.317624865 147.0082

Fuel Model_R -0.321406314 2.5067467 -0.12821651 0.897982311 -5.235618462 4.592805835 1528.575

Fuel Model_T 18.8621992 3.292289681 5.729203997 1.06246E-08 12.408013 25.31638539 22411.38

Fuel Model_U 1.587415804 3.742752857 0.424130544 0.67148716 -5.749855871 8.924687478 122.2782

Training Data Scoring - Summary Report

Total sum of squared errors    RMS Error      Average Error
3747466.754                    26.01077017    -5.38981E-14

Validation Data Scoring - Summary Report

Total sum of squared errors    RMS Error      Average Error
2300878.365                    24.96410405    -0.272124133

Error metrics for the final model (validation data):

Average Error    0.011336
MAD              7.235846
MAPE             161.4117
MSE              688.3001
RMSE             26.23547
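For reference, these five error metrics can be computed from actual and predicted values as sketched below. The definitions are the standard ones. The sign convention for the average error (actual minus predicted) and expressing MAPE as a percentage are assumptions about how the modelling tool reports them, and the sample numbers are made up.

```python
import math


def error_metrics(actual, predicted):
    """Average error, MAD, MAPE (percent), MSE, and RMSE for paired values.

    Assumes every actual value is non-zero so that MAPE is defined.
    """
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "average_error": sum(errors) / n,
        "mad": sum(abs(e) for e in errors) / n,
        "mape": 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
    }


# Made-up acres burned, purely to exercise the function.
result = error_metrics([10, 20, 40], [12, 18, 30])
print(result)
```

The RMSE is simply the square root of the MSE, which is why 688.3001 and 26.23547 appear together in the list above.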


Test Data scoring - Summary Report (Using Best Pruned Tree)

Total sum of squared errors    RMS Error    Average Error
844872.5                       21.3934      -0.7446691


SOURCES

DATA SOURCES

1.) Daily data on wildfires in the 14 regions of Oregon (allows us to pick data from 1960-2015). It shows the fuel model of the fire as well as the total acres burned.
http://www.odf.state.or.us/DIVISIONS/protection/fire_protection/fires/FIRESlist.asp

2.) Monthly data on average degrees in Fahrenheit as well as monthly data on average precipitation in inches.
http://www.usclimatedata.com/climate/tillamook/oregon/united-states/usor0347

3.) Monthly wind speed data for each district (this data set also provides monthly data on humidity, precipitation, and temperature).
http://www.wunderground.com/history/airport/KTMK/2006/10/11/CustomHistory.html?dayend=11&monthend=10&yearend=2015&req_city=&req_state=&req_statename=&reqdb.zip=&reqdb.magic=&reqdb.wmo=

4.) Fuel Moisture database (it provides bi-monthly data, unfortunately not for all districts; additional research will be necessary).
http://wfas.net/index.php/national-fuel-moisture-database-moisture-drought-103

REFERENCES

a) http://www.wno.org/forestfire

b) https://www.nwf.org/Wildlife/Threats-to-Wildlife/Global-Warming/Global-Warming-is-Causing-Extreme-Weather/Wildfires.aspx

c) B. Arrue, A. Ollero, and J. Martinez de Dios. An Intelligent System for False Alarm Reduction in Infrared Forest-Fire Detection. IEEE Intelligent Systems, 15(3):64–73, 2000.

d) J. Pinol, J. Terradas, and F. Lloret. Climate warming, wildfire hazard, and wildfire occurrence in coastal eastern Spain. Climatic Change, 38:345–357, 1998.

e) http://www.kgw.com/story/news/local/2015/06/15/oregon-and-washington-wildfire-updates/71264920/