UNDERSTANDING AND PREDICTING WILDFIRES IN OREGON
Sajid Saleem, Sean Polling and Robert Krebs
BUS519 Business Analytics
CONTENTS
Executive Summary
Problem Description
Data and Model Assumptions
Review of Prior Work
Understanding Wildfires
  Correlation of Temperature and Total Acres Burned
  Correlation of Wind Speed and Total Acres Burned
  Correlation of Precipitation and Total Acres Burned
  Correlation of Humidity and Total Acres Burned
  Correlation of Population and Total Acres Burned
  General Causes for Wildfires
  Fuel Model
  Wildfires by County
  The Dashboard
Predicting Wildfire Starts
  Data Transformations
  Model Description
  Model Tuning
  Model Evaluation
  Alternative Methods
Predicting Wildfire Impact
  Model Development
  Model Evaluation
Conclusions
Appendix
  Data Preparation
  Correlation Matrix
  Logistic Regression
  Classification Tree
  Multiple Linear Regression
Sources
  Data Sources
  References
EXECUTIVE SUMMARY
Two hundred million dollars were spent per week during the summer months of 2015 to battle wildfires across the country. In Oregon
and Washington alone three firefighters lost their lives while many homes were destroyed. The most effective methods of detecting
wildfires can be expensive and are not accessible to rural areas where wildfires are more common. This paper uses descriptive
analytics to better understand wildfire behavior in Oregon and investigates the use of predictive analytics paired with inexpensive and
accessible data sources to create models to predict both wildfire ignition and spread.
In this analysis we determine that most wildfires in Oregon occur in the summer months and that the most common cause is lightning. We
also demonstrate that higher temperatures, lower wind speeds, thunderstorms, and fog are predictors of wildfire ignition. Wildfire
spread is most impacted by lower humidity, higher wind speeds, fog, and certain fuel models (vegetation types).
PROBLEM DESCRIPTION
Wildfires, also known as forest fires, can move in excess of 15 miles per hour and destroy several acres of land in just a few minutes.
In the US there are about 100,000 fires each year, burning up to five million acres of land. With fires burning hotter and people
settling closer to fire prone areas each year, wildfire control continues to be an important issue.
Traditional methods of surveillance are not always effective at detecting fires early enough. Automated solutions, such as
satellite-based surveillance, infrared scanners and local sensors, can help improve early fire detection but suffer from two
significant flaws: they are costly and they are not predictive.
Using data mining tools can help. Weather data such as air humidity, wind speed, and temperature are relatively easy to collect in real
time and at low cost. These conditions are known to impact the occurrence of wildfires. The intent of this project is to look at the
observed behavior of wildfires in Oregon. We will build predictive models to classify and predict the occurrence and impact of
wildfires in Oregon.
DATA AND MODEL ASSUMPTIONS
Use data that is readily available or cheap to collect, so that the model relies only on low-cost inputs.
The wildfire data used is only land under the jurisdiction of the Oregon Department of Forestry (ODF). This includes 16
million acres of land in Oregon.
Restrict the data and analysis to the State of Oregon unless more data is required to meet data mining algorithm requirements
and best practices.
REVIEW OF PRIOR WORK
Predicting the outbreak and behavior of a wildfire has always been a challenging task for firefighters since many of the independent
variables in the modern, sophisticated prediction models are forecasts themselves. The weather models, the fire behavior models and
the interaction of both are often based on forecasts. Nevertheless it is vital to improve the existing models since wildfires are an
ongoing threat to human lives and property. An article on www.scientificamerican.com (Massay, Nathanel; November 15, 2013) talks
about the development of a new technique to predict wildfires, using high-resolution satellite imagery to periodically check and revise
computer simulations. The Suomi NPP, a weather satellite, uses a special infrared sensor capable of taking high resolution images of
the Earth’s surface. The satellite moves over the same spot on the Earth every 12 hours, feeding the data into a model that has been
developed to identify signature patterns for wildfires.
Paulo Cortez and Aníbal Morais of the Department of Information Systems/R&D Algoritmi Centre at the University of Minho in
Portugal state that these advanced fire prediction systems are not accessible to every firefighting department, and have high costs.
They therefore proposed a data mining approach, using meteorological data, to predict the burned area of forest fires. The authors tested
five different data mining techniques, including Support Vector Machines (SVM) and Random Forests, with four feature selection setups
(using Fire Weather Index data and meteorological data), to predict burned areas caused by small fires, which are occurring more
frequently. Their final proposed solution was a regression analysis that combined the SVM approach with only four direct weather
inputs. The major drawback of their analysis is that only small fires can be predicted with sufficient
accuracy. Another issue is that input variables like firefighting intervention time and the vegetation of the relevant area have been neglected.
The authors also admit that direct weather condition data would be preferable over accumulated historical data to feed the models.
UNDERSTANDING WILDFIRES
Our first step in confirming which weather variables affect the ignition and spread of wildfire was to load
both the fire data and the weather data into Tableau. We started by looking for correlations between the various input variables and the
output variable (total acres burned). We used scatter plots to examine variables we expected to have strong relationships and zoomed
into the graphs from 0 to 1000 acres burned because some very high outliers around 14,000 acres burned skew the graph.
CORRELATION OF TEMPERATURE AND TOTAL ACRES BURNED
The scatterplots show a relationship between daily temperature in Fahrenheit and the total area burned on a particular day. The most
interesting observation was that the maximum daily temperature has the strongest relationship to total acres burned. According to the
linear regression trend line, total acres burned increase by 2.88 acres for each 1 degree Fahrenheit increase in maximum temperature,
with a p-value of 0.068. This puts the relationship just outside of statistical significance at an alpha of five percent.
CORRELATION OF WIND SPEED AND TOTAL ACRES BURNED
As expected there is a positive correlation between high wind speeds and wildfire spread. The scatterplots show that we obtain the
highest correlation between maximum daily wind speed and the total area burned on a particular day as opposed to mean wind speed
or max gust speed. The linear regression trend line for acres burned and maximum daily wind speed is statistically significant with a
p-value of 0.0025. The increase in maximum wind speed by one mile per hour increases the area burned by an average of 13.64 acres.
The full trend line equation is [Total Acres Burned] = 13.64 * [Max Wind Speed MPH] – 83.68. We anticipated that the maximum
wind speed would have a higher correlation to wildfire spread than the maximum gust speed because the duration of a gust is usually
less than 20 seconds with the variation in wind speed between the peaks and lulls needing to be at least 9 knots. The maximum wind
speed, however, is not limited in its duration. That means the maximum wind speed is more likely to be reached more consistently,
and for longer periods of time.
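A trend line like the one above is an ordinary least squares fit. The sketch below shows how such a slope and intercept can be computed by hand. The wind speeds are hypothetical, and the acre values are generated directly from the reported equation, so the fit simply recovers 13.64 and -83.68; it is not the Tableau calculation itself and uses none of the project data.

```python
def fit_trend_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical maximum daily wind speeds (mph); not taken from the data set
wind_mph = [5, 10, 15, 20, 25]
# Acres generated from the reported trend line, for illustration only
acres = [13.64 * w - 83.68 for w in wind_mph]

slope, intercept = fit_trend_line(wind_mph, acres)
print(f"[Total Acres Burned] = {slope:.2f} * [Max Wind Speed MPH] + ({intercept:.2f})")
```

With real observations the points would scatter around the line, and the p-value reported above would come from a significance test on the slope, which this sketch does not compute.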
CORRELATION OF PRECIPITATION AND TOTAL ACRES BURNED
We predicted that the likelihood of wildfire would be highest when there was little or no precipitation. By drilling into the data we
were able to determine that all large fires, burning over 20,000 acres of land, started when the precipitation during that particular
day was zero. In order to get a better sense of how many fires may have started with precipitation we decided to create grouped bins.
We have five bins, the first one showing no precipitation at all, where there were 6754 wildfires. If there was only a trace of
precipitation, meaning less than 0.05 inches of rain, we already see a decrease of over 90% in the number of wildfires, to 636. If
there is less than 0.5 but more than 0.05 inches of rain, we see that 1970 fires contributed to the loss of land during the relevant
time period.
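The binning described above can be sketched as a small function. Only the first three bin edges are stated in the text, so the sketch lumps everything from 0.5 inches upward into one bin; that last bin, the bin labels, and the handling of a value of exactly 0.05 are assumptions. The percentage decrease is computed from the two reported counts.

```python
def precipitation_bin(inches):
    """Assign a daily precipitation total (inches) to a grouped bin."""
    if inches == 0:
        return "none"
    if inches < 0.05:
        return "trace (< 0.05 in)"
    if inches < 0.5:
        return "0.05 to < 0.5 in"
    # Assumption: the remaining bins are not described in the text
    return "0.5 in or more"


# Fire counts reported above for the first two bins
fires_no_precip = 6754
fires_trace = 636

# Relative decrease in fire count from "none" to "trace"
decrease_pct = (1 - fires_trace / fires_no_precip) * 100
print(f"Decrease from no precipitation to a trace: {decrease_pct:.1f}%")
```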
CORRELATION OF HUMIDITY AND TOTAL ACRES BURNED
While precipitation is defined as the condensation of water vapor in the air in the form of water droplets and ice falling on the ground,
humidity is the amount of water vapor in the air. We anticipated a negative relationship between humidity and the spread of wildfire.
No matter how strong the maximum wind speed or the maximum gust speed is, if the amount of water in the air is significant,
fire ignition and growth become far less likely. If you consider some of the Southeast Asian countries, you will
often find a humidity level of 90-100% and very high temperatures, but rarely any large scale fires during the same period. Thus,
when looking at the correlations between mean humidity and total acres burned as well as maximum humidity and total acres burned,
we see a negative relationship that is statistically significant with p-values of <0.0001 in both cases. An increase of one percent in
average daily humidity will decrease the average acres burned by 6.58. The trend line equation is [Total Acres Burned] = -6.58 *
[Mean Humidity %] + 536.
CORRELATION OF POPULATION AND TOTAL ACRES BURNED
One of the more interesting questions we wanted to examine is whether a higher population generally correlates with more or less
wildfire spread. We believe that a larger population means more opportunities to catch a wildfire before it gets too big, and more
human risk if it is not stopped early. As expected, we found a negative correlation between the population of the county and the
total acres burned, with a p-value of 0.00036. According to the trend line equation, [Total Acres Burned] = -0.000878 * [Population]
+ 237.124, an increase of 10,000 residents is associated with a decrease of 8.78 acres in total area burned. It is also possible that
counties with more residents have access to more advanced technology to detect wildfires earlier. After seeing this relationship we
decided to include population in the predictive models discussed later.
GENERAL CAUSES FOR WILDFIRES
The most impactful cause for wildfires is lightning. Nearly 90 percent of the total acres burned by wildfire were due to a wildfire
caused by lightning. The graph above shows the general causes of wildfire resulting in burned acreage from 2005 through 2014.
Recreationists and equipment use follow with 4.65 and 2.20 percent respectively. Interestingly, arson is responsible for only 1.22
percent of all burned acreage.
When scrutinizing the causes of the fires and the total acres burned as well as the respective months, it becomes apparent that wildfires
(predominantly caused by lightning strikes) are particularly dangerous during the summer months of June, July, and August. During
the time period from 2005 to 2015 only three fires burning over 5000 acres started outside of the summer months. In January and
February 2009 as well as January 2010, there were some extremely large burning fires caused by lightning, burning approximately
50,000 acres. The graph below shows the largest fires in the past ten years. Arson, equipment use, recreationists, and miscellaneous
are responsible for 6 major fires (>5000 acres burned), whereas lightning alone was the cause for 19 major fires.
FUEL MODEL
Another contributor to the spread of wildfire is the fuel model. The fuel model consists of the specific type of vegetation, its density, and
its condition. The graph shows that “open pine, grass under” is the relevant fuel model in nearly 41 percent
of all cases when lightning is the cause. Unfortunately we do not have data on how common each fuel model is overall, so we cannot say
whether one fuel model is disproportionately involved in the spread of wildfire, although we expect it to be a major
factor.
WILDFIRES BY COUNTY
The county map shows that there are four counties that have lost over 100,000 acres to fires over the past ten years. The county of
Wallowa, located in the far northeast of Oregon, has lost 277,984 acres to wildfires. The counties of Grant, Harney, and Douglas
follow with over 100,000 acres each. Based on our previous findings we would suspect these counties to have some combination of lower
humidity levels, higher temperatures, higher wind speeds, and lower populations that correlate to the higher number of acres burned
due to wildfire.
THE DASHBOARD
A Tableau dashboard was constructed that shows the five most important outputs of the analysis. First, it shows a map of the
relevant area (Oregon in our case) and the regions with the most acres burned. This would allow the authorities to allocate more
resources to those regions, knowing the additional risks, thereby reducing the time needed to start combating wildfires. Second, when
considering the weather forecast, the authorities should also keep an eye on the mean and maximum humidity during the relevant time
period. As mentioned before, the negative relationship between humidity and total acres burned is highly statistically significant. Finally, the dashboard shows that
regions in Oregon harboring pines, conifers, and grasses are especially threatened by wildfires once lightning has hit those areas. This
is most likely to happen during the second and third quarters of the year, since lightning-induced wildfires occur mostly during those
periods.
PREDICTING WILDFIRE STARTS
We would like to predict whether a wildfire is likely to occur using cheaply available weather data. We want the model to generally
favor generating false positives rather than false negatives. In other words, we would rather predict a wildfire and be wrong than
fail to predict a wildfire that is likely to occur. This gives more opportunities to assess risk in advance. Ideally the model for
classifying whether a wildfire event will occur could be combined later with a model to predict the size and spread of the wildfire to
determine the level of risk and preparedness required.
DATA TRANSFORMATIONS
In order to predict the occurrence of wildfires we must use the combined WeatherWithFires data set so we have access to daily
weather information by area. For more detailed information on the data sets and the transformations required see the Appendix. We
remapped the variable representing total acres burned into a binary value where 1 indicates a wildfire started, and 0 indicates no
wildfire started for each day of weather data available for each location. Because we have a binary value we can use classification
models and algorithms such as logistic regression using selected weather and demographic fields we have available in the data set as
independent predictors.
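The remapping described above can be sketched as follows. The row layout and field names are hypothetical, and the rule that any burned area greater than zero counts as a wildfire start is an assumption about how the binary value was derived.

```python
def wildfire_flag(total_acres_burned):
    """Return 1 if a wildfire started on that day and location, else 0.

    Assumption: a missing value (None) or zero acres means no wildfire started.
    """
    if total_acres_burned is None:
        return 0
    return 1 if total_acres_burned > 0 else 0


# Hypothetical daily rows in the spirit of the WeatherWithFires data set
rows = [
    {"county": "Wallowa", "date": "2014-07-01", "total_acres": None},
    {"county": "Wallowa", "date": "2014-07-02", "total_acres": 12.5},
    {"county": "Grant", "date": "2014-07-02", "total_acres": 0.0},
]

for row in rows:
    row["wildfire_started"] = wildfire_flag(row["total_acres"])

flags = [row["wildfire_started"] for row in rows]
print(flags)
```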
The original data set WeatherWithFires contains weather data for 36 Oregon counties with data as far back as January 2005 and as
recent as December 2014, which resulted in 128,254 rows of daily weather data. Unfortunately the version of XLMiner in use only
supports 65,000 rows of data. In order to reduce the data the date range was filtered down to February 2010 through December 2014,
which reduced the data to 64,590 rows.
When partitioning the data into a training and validation data set we ran into another limitation of XLMiner. The tool would only
allow a maximum of 10,000 rows to be used for the training data set. Because of this we used the automatic percentages that XLMiner
calculated of 15.4823% of rows for training and 84.5177% for validation. This resulted in the maximum 10,000 rows to be used in the
training set.
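The partition percentages follow directly from the row limit: 10,000 training rows out of 64,590. The sketch below reproduces that arithmetic and shows one way to draw such a split. The shuffle-based sampling and the seed are assumptions; the text does not say how XLMiner selects the rows.

```python
import random

TOTAL_ROWS = 64590          # rows after filtering to February 2010 through December 2014
MAX_TRAINING_ROWS = 10000   # training limit of the XLMiner version used

train_pct = MAX_TRAINING_ROWS / TOTAL_ROWS * 100
valid_pct = 100 - train_pct
print(f"training {train_pct:.4f}%, validation {valid_pct:.4f}%")


def partition(n_rows, n_train, seed=0):
    """Randomly split row indices into training and validation sets.

    Assumption: simple random sampling without replacement.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


train_idx, valid_idx = partition(TOTAL_ROWS, MAX_TRAINING_ROWS)
```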
MODEL DESCRIPTION
We started the logistic regression model development process by including all possible predictors and then used a combination of
intuition, independent variable p-value strength, avoidance of highly correlated independent variables, and the desire to use the most
easily accessible variables for prediction purposes in building the model. In general for variable types such as temperature, humidity,
wind speed, etc. we must pick either the minimum, mean, or maximum, rather than all three values since they tend to be very highly
correlated to one another. For each variable included in the model we discuss our interpretation of the relationship below:
Data Field          Coefficient   P-Value    Odds       Justification
MaxTemperatureF     0.03972       1.52e-50   1.0405     There are temperature requirements that must be met for ignition of a wildfire.
MeanWindSpeedMPH    -0.05376      3.48e-05   0.9477     Higher mean wind speeds seem like they might lower the likelihood of fires starting even though they would increase the probability of spreading.
PrecipitationIn     -0.2517       0.390      0.7775     More precipitation is assumed to lower the likelihood of fire starting, independent of lightning probability.
ThunderstormFlag    1.4255        1.14e-18   4.1601     We expect the chance of thunderstorms to significantly increase the chance of wildfire because such a high percentage of wildfires are started by lightning.
FogFlag             0.3752        0.0075     1.4553     Fog was a surprise!
Population          1.0042e-06    6.83e-05   1.000001   We included population as we would expect more people increases the opportunities for human caused wildfires.
Area                7.2459e-05    5.64e-06   1.000072   We included area to account for the size of risk areas. We would expect that larger areas would have a higher risk of wildfire independent of all other factors.
The only variable in the model that is not statistically significant is precipitation. The humidity values were also not
statistically significant when they were included, but we kept precipitation because it is easier to get from a weather forecast. Most of
the relationships met expectations, with the exception of fog: according to the model, the presence of fog is associated with an
increased likelihood of a wildfire!
The biggest contributor to wildfire start risk appears to be thunderstorms, which makes sense since the most common cause
of wildfire is lightning strike. The odds of a wildfire go up by 316% with a thunderstorm. Every additional degree Fahrenheit of
maximum daily temperature increases the odds of wildfire by about four percent.
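The odds figures quoted above come from exponentiating the logistic regression coefficients. The short sketch below redoes that arithmetic from the coefficients reported in the table. The model intercept is not given in the text, so no predicted probabilities are computed, and small differences from the table's Odds column are due to rounding of the printed coefficients.

```python
import math

# Coefficients as reported in the model description table
coefficients = {
    "MaxTemperatureF": 0.03972,
    "MeanWindSpeedMPH": -0.05376,
    "PrecipitationIn": -0.2517,
    "ThunderstormFlag": 1.4255,
    "FogFlag": 0.3752,
}

# exp(coefficient) is the multiplicative change in the odds per one-unit increase
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}
# The same change expressed as a percentage
pct_change = {name: (ratio - 1) * 100 for name, ratio in odds_ratios.items()}

for name in coefficients:
    print(f"{name:18s} odds x{odds_ratios[name]:.4f} ({pct_change[name]:+.1f}%)")
```

A thunderstorm multiplies the odds by roughly 4.16, which is the 316% increase quoted above, and each additional degree Fahrenheit multiplies them by roughly 1.04.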
MODEL TUNING
The initial modeling attempts used the default cut-off value of 0.5 and resulted in an overall prediction error of only 6.4% on the
validation data. However, the model had a 97.8% error rate on days when a wildfire event actually occurred. This means the model would
only predict 2.2% of the wildfire events!
In order to make the model useful we need it to be better at predicting when a wildfire event will occur, even if doing so increases
the false positive rate (the error rate at which the model predicts a wildfire but one does not occur). A
high rate of false alarms could result in wasted resources, but missed chances to be prepared will have the higher cost and risk.
We tried several different cut-off values and settled on 0.1, which results in a 52.7% chance of not predicting a wildfire that actually occurs. The
cost of correctly predicting more of the wildfires that occur is that we must accept a higher false positive, or false alarm, rate:
there is an 18.7% chance that the model predicts a wildfire start on a day when none occurs.
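The trade-off between missed wildfires and false alarms can be illustrated with a small helper that computes both error rates at a given cut-off. The probabilities and labels below are invented toy values, not model output, so the resulting rates differ from the ones reported above.

```python
def error_rates(probs, labels, cutoff):
    """Return (miss_rate, false_alarm_rate) at a probability cut-off.

    miss_rate: share of actual wildfire days predicted as no wildfire.
    false_alarm_rate: share of no-wildfire days predicted as wildfire.
    """
    misses = sum(1 for p, y in zip(probs, labels) if y == 1 and p < cutoff)
    false_alarms = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= cutoff)
    positives = sum(labels)
    negatives = len(labels) - positives
    return misses / positives, false_alarms / negatives


# Toy predicted probabilities: six no-wildfire days, then four wildfire days
probs = [0.02, 0.05, 0.08, 0.12, 0.15, 0.30, 0.07, 0.20, 0.45, 0.60]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

for cutoff in (0.5, 0.1):
    miss, false_alarm = error_rates(probs, labels, cutoff)
    print(f"cut-off {cutoff}: miss rate {miss:.2f}, false alarm rate {false_alarm:.2f}")
```

Lowering the cut-off from 0.5 to 0.1 on the toy data cuts the miss rate from 0.75 to 0.25 while raising the false alarm rate from 0.00 to 0.50, the same direction of change described above.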
MODEL EVALUATION
Using a cut-off value of 0.1 the model appears to not be overfitting the training data. The training data reported 20.90% error
compared to 20.94% error on the validation data. The breakdown of the error numbers is contained in the appendix for further review.
The lift charts show that the model performs particularly well in the top decile of predicted wildfire probabilities. The model is
particularly strong where its predicted probability is 0.5 or higher. Over those first 102 sorted validation observations the lift
is 11.6, meaning the model performs more than eleven times better than a random model.
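Lift compares the wildfire rate among the highest-scored observations with the overall wildfire rate. A minimal sketch of that calculation follows; the scores and labels are toy values chosen for illustration and are unrelated to the validation data.

```python
def lift_at_top(scores, labels, top_n):
    """Lift of the top_n highest-scored observations over the overall rate."""
    # Sort observations by predicted probability, highest first
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_labels = [label for _, label in ranked[:top_n]]
    top_rate = sum(top_labels) / top_n
    overall_rate = sum(labels) / len(labels)
    return top_rate / overall_rate


# Toy data: ten observations, four of which are actual wildfire days
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

lift = lift_at_top(scores, labels, top_n=2)
print(f"lift in the top 2 observations: {lift:.1f}")
```

Here both of the two highest-scored observations are wildfire days (rate 1.0) against an overall rate of 0.4, giving a lift of 2.5. A random ordering would give a lift near 1.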
ALTERNATIVE METHODS
A classification tree was also fitted against all possible variables. The resulting model was weaker than the
logistic regression model described above and used far fewer variables. The overall error was 9.5%, with a 70.9% chance of not
predicting a wildfire when there is one, and a lower 5.2% chance of predicting a wildfire when there is none. The model results, error
rates, and lift charts are listed in the appendix. The tree reverted to being less successful at predicting when wildfires would
occur but better at avoiding false positives.
PREDICTING WILDFIRE IMPACT
Now that we have found the probabilities of wildfire based on the weather data, we turn our attention to the impact of a wildfire in
terms of how far it will spread.
MODEL DEVELOPMENT
In order to predict the area of land burned by the wildfire, we use the FireWithWeather dataset. This dataset is different from the
WeatherWithFires dataset as it only contains weather data for the counties and days on which a fire was reported.
In order to determine which variable to use, we relied on two factors:
1. Correlation using Excel and visual relationships observed in Tableau.
2. Intuition of theoretical relationship, for example, when there is precipitation the wildfire spread should be less.
Data Field            Data Type                             Expectation
Max Temperature       Quantitative                          High temperatures will allow conditions for continued burning.
Max Humidity          Quantitative                          Higher humidity should decrease the spread of wildfire due to more moisture in the air.
Mean Humidity         Quantitative                          Higher humidity should decrease the spread of wildfire due to more moisture in the air.
Max Wind Speed MPH    Quantitative                          Higher wind speeds should increase the chance of wildfire spread.
Precipitation In      Quantitative                          Precipitation should decrease the chance of wildfire spread.
Max Gust Speed MPH    Quantitative                          Higher wind speeds should increase the chance of wildfire spread.
Rain Flag             Binary                                Rain should decrease the chance of wildfire spread.
Thunderstorm Flag     Binary                                Thunderstorms should increase the chance of large wildfires because multiple smaller fires may be more likely to join.
Fog Flag              Binary                                Fog should decrease the chance of wildfire spread.
Snow Flag             Binary                                Snow should decrease the chance of wildfire spread.
Hail Flag             Binary                                Hail should decrease the chance of wildfire spread.
Population            Quantitative                          Higher population should decrease the opportunity for wildfires to grow due to the impact and likelihood of getting noticed.
Fuel Model            Categorical (Dummy Variable Used)     There will likely be some fuel models that have a higher correlation to wildfire spread based on flammability.
The weather data gathered had missing values in some records. These records were not significant in number compared to the total
number of records (over 10,000). Thus, in order to prevent any distortion of results, we chose to delete the records with missing values
of relevant variables. As a result we came down to having 9231 records. This is a substantial number of records for the 13 variables (26
including all the dummy variables).

Training Data Scoring - Summary Report
Total sum of squared errors    RMS Error      Average Error
3504824                        25.1546021     -6.56793E-16

Validation Data Scoring - Summary Report
Total sum of squared errors    RMS Error      Average Error
2541204                        26.23547353    0.011335704
The data was then partitioned after selecting just the relevant variables and creating dummies. The partition was made into a 60% training
set and a 40% validation set. We decided to use a multiple regression model to predict the spread of fire based on the weather data
available.
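Creating dummies for the categorical fuel model means replacing it with one 0/1 column per category and leaving one category out as the baseline. The sketch below follows the "Fuel Model_A" column naming that appears in the model output. The baseline category is not named in the text, so the code "X" used here is a hypothetical placeholder, and the list of codes is an assumption built from the coefficients reported for the final model.

```python
# Fuel model codes appearing in the final model output, plus a hypothetical
# baseline code "X" (assumption: the actual reference category is not stated)
FUEL_CODES = ["A", "B", "C", "F", "G", "H", "I", "J", "K", "L", "R", "T", "U", "X"]
REFERENCE_CODE = "X"


def make_fuel_dummies(fuel_code, codes=FUEL_CODES, reference=REFERENCE_CODE):
    """One-hot encode a fuel model code, omitting the reference category."""
    return {
        f"Fuel Model_{code}": int(fuel_code == code)
        for code in codes
        if code != reference
    }


row = make_fuel_dummies("J")
print({name: value for name, value in row.items() if value == 1})
```

A record in the baseline category gets zeros in every dummy column, so each fuel model coefficient is read as a difference from that baseline.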
The initial model attempt included all of the variables above including those that were highly correlated with one another such as
maximum daily humidity and mean daily humidity. The initial model results are listed in the appendix. Not only were some variables
highly correlated, but when we visually looked at the total acres burned there were some very large outliers for the dependent variable.
We were concerned that these very large values were skewing the model significantly since it was trying to minimize the sum of
squared errors, which is significantly influenced by outliers. By removing 94 outliers (possible errors) we were able to improve the R
squared from 0.007 to 0.02 and improved several of the predictor p-values.
After removing the highly correlated mean humidity and gust speed variables, we arrived at the final model, whose coefficients are
tabulated under Model Evaluation below. Even though our errors have gone up marginally, we now have 10 independent variables which all have p-values less than 0.2.
MODEL EVALUATION
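The sum of squared errors is lower for the validation data (2,541,204) than for the training data (3,504,824), although this mainly
reflects the smaller 40% validation partition; the RMS errors, which adjust for the number of records, are in the same ballpark (26.24
versus 25.15). From the regression measures it appears the model is not overfitting the training data. The lift chart shows that the
model performs better than a naive model that always predicts the average. (Other error metrics are presented in the appendix.)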
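The RMS error is the square root of the sum of squared errors divided by the number of records, which is why it is comparable across partitions of different sizes. The sketch below checks the reported figures. The partition sizes are not stated in the text; they are inferred here from the 60%/40% split of the 9231 records, which is an assumption.

```python
import math


def rms_error(sum_squared_errors, n_records):
    """Root mean squared error from a total sum of squared errors."""
    return math.sqrt(sum_squared_errors / n_records)


TOTAL_RECORDS = 9231
# Assumption: partition sizes inferred from the 60% / 40% split
n_train = round(TOTAL_RECORDS * 0.60)
n_valid = TOTAL_RECORDS - n_train

rms_train = rms_error(3504824, n_train)   # training sum of squared errors
rms_valid = rms_error(2541204, n_valid)   # validation sum of squared errors

print(f"training:   n={n_train}, RMS error {rms_train:.4f}")
print(f"validation: n={n_valid}, RMS error {rms_valid:.4f}")
```

Under that assumption the results agree with the reported RMS errors of about 25.15 and 26.24.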
Input Variable        Coefficient    P-Value
Intercept             14.26904       0.000474
MaxTemperatureF       0.023542       0.364864
MaxHumidity           -0.18363       1.47E-09
MaxWind SpeedMPH      0.141422       0.022753
PrecipitationIn       -0.7239        0.695158
RainFlag              0.878086       0.340045
ThunderstormFlag      -1.67444       0.229963
FogFlag               2.855736       0.016898
SnowFlag              1.853758       0.441588
HailFlag              -2.938         0.86946
Population            -1.2E-05       0.000648
Fuel Model_A          2.758355       0.08322
Fuel Model_B          2.884547       0.504479
Fuel Model_C          2.612847       0.106094
Fuel Model_F          2.466192       0.186664
Fuel Model_G          -0.5031        0.818683
Fuel Model_H          1.782308       0.274225
Fuel Model_I          1.6966         0.498202
Fuel Model_J          8.038211       3.73E-05
Fuel Model_K          0.262648       0.910736
Fuel Model_L          2.369435       0.156805
Fuel Model_R          -1.10268       0.660198
Fuel Model_T          16.25123       2.57E-07
Fuel Model_U          1.221496       0.734262

Residual DF            5515
R²                     0.024966
Adjusted R²            0.020899
Std. Error Estimate    25.20928
RSS                    3504824
The ROC graph has very small area over the curve, and hence shows our model’s effectiveness.
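Because a sum of squared errors grows with the number of cases, the training and validation totals are easier to compare on a per-case basis. The sketch below uses the training and validation figures reported in the appendix for the regression with outliers removed and relies on the identity RMSE = sqrt(SSE / n). The function names are illustrative only.

```python
import math


def implied_case_count(sse, rmse):
    """Recover the number of cases n from SSE and RMSE, since RMSE = sqrt(SSE / n)."""
    return sse / rmse ** 2


def per_case_rmse(sse, n):
    """Root-mean-square error: the typical error size per case."""
    return math.sqrt(sse / n)


# Figures from the appendix (regression with outliers removed).
train_sse, train_rmse = 3747466.754, 26.01077017
valid_sse, valid_rmse = 2300878.365, 24.96410405

n_train = round(implied_case_count(train_sse, train_rmse))
n_valid = round(implied_case_count(valid_sse, valid_rmse))

print("cases:", n_train, n_valid)
print("RMSE :", per_case_rmse(train_sse, n_train), per_case_rmse(valid_sse, n_valid))
```

The validation partition is smaller than the training partition, so its lower total is expected. The per-case RMS errors are the fairer comparison, and they are close to one another.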
CONCLUSIONS
Nearly 80% of the total acres burned from Oregon wildfires have occurred between June and August from 2005 to 2014.
40% of the overall acres burned were from fires started in July.
Lightning is the cause for almost 90% of the total acres burned from Oregon wildfires.
The chance of wildfire increases with higher temperature, lower wind speeds, thunderstorms, fog, and in larger areas with
higher population.
The impact of a wildfire increases with lower humidity, higher wind speeds, fog, and in areas with lower population. Several
fuel models have a statistically significant impact as well.
Predicting the occurrence and impact of wildfire is difficult. Even with the right conditions present, where the models indicate
very high probability, we see cases where no wildfire occurs. Because of this, we have adjusted our classification model to be
more likely to give false positives rather than miss a wildfire when one does actually occur.
[Figure: Lift chart (validation dataset). X-axis: # Cases. Y-axis: Cumulative. Series: Cumulative TotalAcres when sorted using predicted values; Cumulative TotalAcres using average.]
[Figure: Decile-wise lift chart (validation dataset). X-axis: Deciles. Y-axis: Decile mean / Global mean.]
APPENDIX
DATA PREPARATION
Data Source - Wildfires
Wildfires List from Oregon Department of Forestry,
http://www.odf.state.or.us/DIVISIONS/protection/fire_protection/fires/FIRESlist.asp
Acquisition of Data
In order to collect the full Oregon data set, the team had to use a web form to specify a Region as well as a year range from 1960 to 2015.
The web form would only return a maximum of 2500 rows, so we could not simply specify All Areas as the region and the full
time range from 1960 to 2015, as the number of actual rows appeared to exceed 2500.
Instead we selected All Areas and then manually changed the yearly range in increments of 2-5 years depending on how much data
was returned, ensuring each time that fewer than 2500 rows were present. Otherwise we broke the yearly range down into smaller
buckets to ensure that we captured all of the data. We combined the results into a single Excel spreadsheet called FireData.xlsx.
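The range-splitting idea can also be automated. The team adjusted the ranges by hand; the sketch below swaps in simple bisection instead. `query` is a hypothetical stand-in for one web-form request, and the only value taken from the text is the 2500-row cap.

```python
ROW_CAP = 2500  # maximum rows the web form returns, per the text above


def fetch_all(start_year, end_year, query):
    """Collect all rows for start_year..end_year, splitting ranges that hit the cap.

    `query(start, end)` is a hypothetical stand-in for one web-form request;
    it is assumed to return at most ROW_CAP rows.
    """
    rows = query(start_year, end_year)
    if len(rows) < ROW_CAP:
        return rows  # under the cap, so nothing was cut off
    if start_year == end_year:
        raise ValueError(f"{start_year} alone hits the row cap")
    mid = (start_year + end_year) // 2  # halve the range and retry each part
    return fetch_all(start_year, mid, query) + fetch_all(mid + 1, end_year, query)


def fake_query(start, end):
    """Toy data source: 400 rows per year, truncated at the cap like the web form."""
    rows = [(year, i) for year in range(start, end + 1) for i in range(400)]
    return rows[:ROW_CAP]


all_rows = fetch_all(1960, 2015, fake_query)
print(len(all_rows))
```

Treating a result of exactly 2500 rows as possibly truncated mirrors the "fewer than 2500 rows" check described above.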
Data Definition
Data Field Description
Fire Year Year that wildfire is associated to
District ODF district associated to start of wildfire
Unit ODF unit associated to start of wildfire
Fire Number ODF fire identifier assigned to wildfire event
Fire Name Label applied to wildfire event
Legal Code used for location
Latitude Geographic latitude in degrees, minutes, and seconds of wildfire start
Longitude Geographic longitude in degrees, minutes, and seconds of wildfire start
Fuel Model Coded type of vegetation fuel for wildfire:
A=Annual grasses (cheat)
B=Dense Chaparral
C=Open pine, grass under
F=Dense Brush (lighter than B)
G=Conifer, Old growth
H=Conifer, Second growth
I=Slash, heavy
J=Slash, medium
K=Slash, thinning, P.C, Scattrd
L=Grass Perennial
R=Hardwood, summer
T=Sagebrush, medium dense
U=Closed canopy pine
X=Non wildland fuel
County Oregon county where wildfire event started
Report Date Date and time that wildfire event was reported
General Cause Assigned causal code for wildfire event:
Arson
Debris Burning
Equipment Use
Juveniles
Lightning
Miscellaneous
Railroad
Recreationist
Smoking
Under Invest
ODF Acres Quantity of acres within ODF jurisdiction affected by wildfire.
Total Acres Quantity of acres affected by wildfire
Data Transformation
Not applicable yet.
Data Source - Weather
Daily Weather Conditions, http://www.wunderground.com
Acquisition of Data
The Weather Underground website provides the capability to download a comma-delimited file (CSV) listing several historical
weather factors for a given airport and a date range of up to one year. The team wanted to select an airport near a city where the weather
conditions could be used to represent the county. Looking up 29 airports (representing 36 cities, which in turn represented 36
counties) and, for each airport, downloading and appending daily weather data for 2005 through 2014 one year at a time would have
been a manual and time-consuming task. Instead a PowerShell script was developed (GetWeatherData.ps1) that would iterate over the
required airports and the desired years of data to collect the daily weather data into a single file from Weather Underground.
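The iteration over airports and years can be sketched as follows. This is not GetWeatherData.ps1, which is not reproduced in this report. `download` is a hypothetical callable that returns one airport-year of CSV text, and the airport code KPDX is used only as an illustration.

```python
import csv
import io


def collect_weather(airports, years, download):
    """Merge per-airport, per-year CSV downloads into one list of row dicts.

    `download(airport, year)` is a hypothetical callable returning CSV text.
    """
    merged = []
    for airport in airports:
        for year in years:
            text = download(airport, year)
            for row in csv.DictReader(io.StringIO(text)):
                row["Airport"] = airport  # tag each row with its source airport
                merged.append(row)
    return merged


def fake_download(airport, year):
    """Toy stand-in for the real download: one daily row per airport-year."""
    return f"WeatherDate,MaxTemperatureF\n{year}-07-01,85\n"


rows = collect_weather(["KTMK", "KPDX"], range(2005, 2015), fake_download)
print(len(rows), rows[0])
```

Passing the download step in as a parameter keeps the loop testable without any network access.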
Data Definition
Data Field Description
WeatherDate The date of the historical weather conditions
MaxTemperatureF Maximum daily temperature in degrees Fahrenheit
MeanTemperatureF Average daily temperature in degrees Fahrenheit
MinTemperatureF Minimum daily temperature in degrees Fahrenheit
MaxDewPointF Maximum daily dew point in degrees Fahrenheit
MeanDewPointF Average daily dew point in degrees Fahrenheit
MinDewPointF Minimum daily dew point in degrees Fahrenheit
MaxHumidity Maximum daily humidity percentage
MeanHumidity Average daily humidity percentage
MinHumidity Minimum daily humidity percentage
MaxSeaLevelPressureIn Maximum daily barometric pressure in inches of mercury
MeanSeaLevelPressureIn Average daily barometric pressure in inches of mercury
MinSeaLevelPressureIn Minimum daily barometric pressure in inches of mercury
MaxVisibilityMiles Maximum daily visibility in miles
MeanVisibilityMiles Average daily visibility in miles
MinVisibilityMiles Minimum daily visibility in miles
MaxWindSpeedMPH Maximum daily wind speed in miles per hour
MeanWindSpeedMPH Average daily wind speed in miles per hour
MinWindSpeedMPH Minimum daily wind speed in miles per hour
MaxGustSpeedMPH Maximum daily gust speed in miles per hour
PrecipitationIn Accumulation of precipitation for the day in inches.
T= trace of precipitation, <0.01 inches
CloudCover Daily measure of cloud cover in oktas
Events Dash-delimited list of weather events. The possibilities are:
Fog
Hail
Rain
Snow
Thunderstorm
Tornado
WindDirDegrees Wind direction in degrees
Airport Airport code for where weather conditions were observed
Data Characteristics
Several airports selected did not have data from 2005 to 2014. In these cases rows were not available and were not synthesized for
those locations for those particular days.
Data Transformation
The non-numeric values that appeared in PrecipitationIn as ‘T’s were converted to 0.005 to estimate a value less than 0.01 inches of
precipitation.
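A minimal sketch of the trace-value conversion, assuming the PrecipitationIn cells arrive as strings. The function name is illustrative.

```python
TRACE_INCHES = 0.005  # stand-in for 'T' (trace, < 0.01 inches), as described above


def parse_precipitation(value):
    """Convert a PrecipitationIn cell to inches, mapping 'T' to 0.005."""
    text = value.strip()
    if text.upper() == "T":
        return TRACE_INCHES
    return float(text)


print([parse_precipitation(v) for v in ["0.00", "T", "0.25"]])
```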
Columns were created called FogFlag, HailFlag, RainFlag, SnowFlag, ThunderstormFlag, and TornadoFlag to encode these true or
false values from the Events field. If the Events field contained the substring “Fog” then FogFlag would be set to 1, otherwise it would
be set to 0. The same is true for Hail, Rain, Snow, Thunderstorm, and Tornado substrings for the associated flag fields.
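The substring rule for the flag columns can be sketched as follows. The function name is illustrative, and the example Events string is made up.

```python
EVENT_TYPES = ["Fog", "Hail", "Rain", "Snow", "Thunderstorm", "Tornado"]


def event_flags(events):
    """Return {'FogFlag': 0 or 1, ...} from a dash-delimited Events string."""
    return {f"{name}Flag": int(name in events) for name in EVENT_TYPES}


flags = event_flags("Fog-Rain-Thunderstorm")
print(flags)
```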
268 rows contained blank temperature values. Most of these records were also missing other data fields such as wind speed, dew
point, pressure, and more. Any rows that were missing data were removed. We also looked at the list of values in each field and
removed any nonsensical values. For example, CloudCover should range from 0 to 8 to be valid oktas, but one row contained a
value of -902.
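A minimal sketch of this row filter, assuming each row is a dictionary of strings. The list of required fields is an illustrative subset, not the full set the team checked.

```python
REQUIRED_FIELDS = ["MaxTemperatureF", "MaxWindSpeedMPH", "CloudCover"]  # illustrative subset


def is_valid_row(row, required=REQUIRED_FIELDS):
    """Keep a row only if no required field is blank and CloudCover is 0 to 8 oktas."""
    if any(row.get(field) in (None, "") for field in required):
        return False
    return 0 <= float(row["CloudCover"]) <= 8


sample = [
    {"MaxTemperatureF": "85", "MaxWindSpeedMPH": "12", "CloudCover": "3"},
    {"MaxTemperatureF": "", "MaxWindSpeedMPH": "", "CloudCover": "5"},        # blank fields
    {"MaxTemperatureF": "70", "MaxWindSpeedMPH": "8", "CloudCover": "-902"},  # invalid oktas
]
cleaned = [row for row in sample if is_valid_row(row)]
print(len(cleaned))
```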
Data Source – Location Demographics
Oregon Counties
Acquisition of Data
Because there are only thirty-six counties in Oregon a spreadsheet was generated manually to map each Oregon County to a county
seat and to an airport where daily historical weather data could be pulled. This table is used to connect the wildfire data that is
organized by county to the daily weather data that is tied to airport. The table also contains some descriptive information such as
population and area for each county.
The airport codes were determined by looking up the county seat in http://www.wunderground.com and navigating to the historical
weather data feature to find out where that data is pulled from.
Data Definition
Data Field Description
County Oregon County name
City City name of the county seat
Population Population in quantity of people
Area Size of county in square miles
Airport Airport code for weather station near city observing historical weather data
Flattening the Data
In order to combine the data into a single table for the purpose of predictive analytics a Microsoft Access file was created with links to
the three main data sources, FireData.xlsx, WeatherByAirport.xlsx, and OregonCounties.xlsx. Queries were generated with Microsoft
Access and used to produce two new flattened files combining the data:
FiresByWeather.xlsx – This file takes each wildfire event and brings in the weather for the particular day and location. The data will
be useful for predicting the number of acres burned as it contains a row for each wildfire. It does not contain data pertaining to days
where no wildfires occurred.
WeatherWithFires.xlsx – This file starts with each day that we have weather data available for and then joins in the cases where
wildfires were started. So this data contains many rows where weather data was recorded for a particular location and a particular day
where no wildfire data is present. This data can be used to classify the likelihood of wildfire events by using the daily weather data.
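The two flattened files correspond to two different joins. The sketch below is not the Microsoft Access queries themselves, and its field names and sample rows are made up for illustration. It also assumes that fires with no matching weather row are dropped and that the second file marks fire days with a single FireFlag column.

```python
def fires_by_weather(fires, weather, county_airport):
    """One row per fire, carrying the weather for that day and location.

    Assumption: fires with no matching weather row are dropped.
    """
    joined = []
    for fire in fires:
        key = (county_airport[fire["County"]], fire["Date"])
        if key in weather:
            joined.append({**fire, **weather[key]})
    return joined


def weather_with_fires(fires, weather, county_airport):
    """One row per airport-day, with FireFlag marking days on which a fire started."""
    fire_keys = {(county_airport[f["County"]], f["Date"]) for f in fires}
    return [
        {"Airport": airport, "Date": date, **obs, "FireFlag": int((airport, date) in fire_keys)}
        for (airport, date), obs in weather.items()
    ]


# Made-up sample rows for illustration only.
county_airport = {"Tillamook": "KTMK"}
weather = {
    ("KTMK", "2010-07-01"): {"MaxTemperatureF": 85},
    ("KTMK", "2010-07-02"): {"MaxTemperatureF": 70},
}
fires = [{"County": "Tillamook", "Date": "2010-07-01", "TotalAcres": 12.0}]

per_fire = fires_by_weather(fires, weather, county_airport)
per_day = weather_with_fires(fires, weather, county_airport)
print(per_fire)
print(per_day)
```

The first result suits regression on acres burned. The second keeps the no-fire days that a classifier needs as negative cases.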
CORRELATION MATRIX
Lower triangle only; the columns follow the same variable order as the rows.
WeatherYear 1.000
MaxTemperatureF 0.043 1.000
MeanTemperatureF 0.036 0.955 1.000
MinTemperatureF 0.020 0.754 0.913 1.000
MaxDewPointF 0.018 0.659 0.792 0.856 1.000
MeanDewPointF 0.005 0.602 0.762 0.867 0.974 1.000
MinDewpointF -0.010 0.512 0.689 0.829 0.911 0.967 1.000
MaxHumidity -0.030 -0.434 -0.385 -0.266 0.131 0.169 0.193 1.000
MeanHumidity -0.028 -0.634 -0.497 -0.238 0.078 0.151 0.218 0.830 1.000
MinHumidity -0.031 -0.665 -0.489 -0.178 0.032 0.117 0.205 0.641 0.926 1.000
MaxSeaLevelPressureIn 0.085 -0.313 -0.377 -0.410 -0.353 -0.348 -0.320 0.121 0.122 0.102 1.000
MeanSeaLevelPressureIn 0.086 -0.243 -0.301 -0.338 -0.300 -0.287 -0.256 0.092 0.090 0.071 0.962 1.000
MinSeaLevelPressureIn 0.087 -0.178 -0.227 -0.261 -0.238 -0.220 -0.192 0.069 0.065 0.048 0.879 0.966 1.000
MaxVisibilityMiles -0.007 0.149 0.123 0.070 0.057 0.037 0.016 -0.100 -0.176 -0.210 -0.078 -0.075 -0.071 1.000
MeanVisibilityMiles -0.009 0.374 0.306 0.168 0.017 -0.008 -0.030 -0.410 -0.572 -0.574 -0.104 -0.072 -0.049 0.436 1.000
MinVisibilityMiles 0.001 0.405 0.326 0.171 -0.034 -0.047 -0.058 -0.494 -0.631 -0.599 -0.038 0.008 0.035 0.183 0.750 1.000
MaxWindSpeedMPH -0.002 0.075 0.105 0.127 0.038 0.013 -0.012 -0.209 -0.199 -0.145 -0.289 -0.355 -0.383 0.097 0.175 0.102 1.000
MeanWindSpeedMPH 0.004 0.007 0.076 0.161 -0.010 -0.010 -0.008 -0.291 -0.191 -0.080 -0.226 -0.265 -0.282 0.055 0.161 0.150 0.765 1.000
MaxGustSpeedMPH -0.032 -0.002 0.047 0.109 0.025 0.006 -0.010 -0.177 -0.104 -0.048 -0.271 -0.337 -0.366 0.062 0.098 0.029 0.876 0.677 1.000
PrecipitationIn -0.016 -0.170 -0.093 0.024 0.095 0.107 0.109 0.194 0.302 0.331 -0.175 -0.235 -0.262 -0.102 -0.289 -0.304 0.169 0.161 0.206 1.000
CloudCover -0.052 -0.554 -0.372 -0.070 0.038 0.105 0.167 0.463 0.704 0.761 -0.068 -0.111 -0.130 -0.130 -0.427 -0.506 0.016 0.027 0.062 0.349 1.000
WindDirDegrees 0.007 0.211 0.205 0.165 0.118 0.104 0.082 -0.164 -0.195 -0.192 -0.103 -0.089 -0.066 0.046 0.124 0.143 0.136 0.086 0.059 -0.072 -0.117 1.000
MaxGustSpeedMPHAdj -0.019 0.080 0.116 0.145 0.061 0.040 0.017 -0.185 -0.176 -0.121 -0.282 -0.344 -0.371 0.082 0.137 0.080 0.906 0.709 0.998 0.187 0.029 0.124 1.000
RainFlag -0.035 -0.233 -0.120 0.049 0.157 0.166 0.172 0.309 0.398 0.411 -0.182 -0.241 -0.267 -0.043 -0.231 -0.343 0.162 0.071 0.172 0.351 0.518 -0.063 0.173 1.000
ThunderstormFlag 0.014 0.133 0.143 0.134 0.123 0.102 0.077 -0.063 -0.080 -0.078 -0.111 -0.109 -0.096 0.014 0.045 -0.007 0.121 0.024 0.105 0.053 -0.033 0.035 0.116 0.122 1.000
FogFlag 0.017 -0.223 -0.209 -0.158 -0.023 -0.018 -0.009 0.313 0.368 0.323 0.110 0.097 0.080 -0.201 -0.686 -0.555 -0.175 -0.179 -0.156 0.060 0.199 -0.079 -0.141 0.083 -0.041 1.000
SnowFlag -0.027 -0.297 -0.305 -0.270 -0.248 -0.240 -0.230 0.119 0.160 0.164 -0.028 -0.082 -0.109 -0.067 -0.248 -0.253 0.040 0.033 0.045 0.093 0.184 -0.033 0.037 0.073 -0.018 0.176 1.000
HailFlag -0.005 -0.005 -0.004 -0.004 -0.005 -0.004 -0.003 0.005 0.002 -0.002 -0.001 -0.001 0.000 0.001 0.001 -0.007 0.007 0.003 0.004 0.002 0.005 0.003 0.005 0.009 0.032 -0.003 0.007 1.000
TornadoFlag -0.002 -0.001 0.001 0.003 0.003 0.003 0.003 -0.001 0.001 0.003 -0.001 -0.001 0.000 0.001 0.002 -0.003 -0.001 0.000 -0.003 0.001 0.006 0.000 -0.001 0.008 -0.001 -0.002 -0.001 0.000 1.000
Population -0.007 0.024 0.079 0.133 0.190 0.211 0.225 0.181 0.149 0.142 0.019 0.024 0.027 -0.004 -0.039 -0.049 -0.082 -0.081 -0.188 0.054 0.180 -0.066 -0.062 0.128 -0.028 0.050 -0.043 0.001 0.015 1.000
Area 0.007 0.017 -0.090 -0.231 -0.280 -0.324 -0.356 -0.187 -0.268 -0.281 0.010 -0.005 -0.016 0.026 0.110 0.109 0.077 0.003 -0.032 -0.098 -0.174 0.042 0.044 -0.116 0.046 -0.049 0.078 0.001 0.000 -0.229 1.000
FireFlag 0.054 0.172 0.167 0.135 0.114 0.100 0.075 -0.070 -0.108 -0.116 -0.050 -0.034 -0.018 0.023 0.032 0.038 0.000 -0.035 -0.030 -0.041 -0.097 0.016 0.004 -0.028 0.146 -0.002 -0.026 -0.002 -0.001 0.022 0.049 1.000
Total Acres Scrubbed 0.011 0.015 0.015 0.013 0.004 0.002 0.000 -0.018 -0.018 -0.014 -0.005 -0.004 -0.003 0.001 0.007 0.008 0.013 0.004 0.010 -0.004 -0.009 0.005 0.011 -0.002 0.022 0.000 -0.003 0.000 0.000 -0.005 0.007 0.059 1.000
LOGISTIC REGRESSION
Regression Model
Input Variables     Coefficient  Std. Error  Chi2-Statistic  P-Value  Odds  CI Lower  CI Upper
Intercept           -5.38627515  0.222653298  585.2195221  2.7462E-129  0.004579  0.00296  0.007084
MaxTemperatureF     0.039721854  0.002656669  223.55486  1.51707E-50  1.040521  1.035117  1.045953
MeanWindSpeedMPH    -0.05375577  0.012985249  17.13758304  3.47675E-05  0.947664  0.923849  0.972092
PrecipitationIn     -0.25171563  0.29262455  0.739944018  0.389678812  0.777466  0.438126  1.379634
ThunderstormFlag    1.425532448  0.161620716  77.79651824  1.14221E-18  4.160072  3.030603  5.710481
FogFlag             0.375224145  0.140300518  7.152584657  0.007485603  1.455318  1.105436  1.91594
Population          1.10042E-06  2.76332E-07  15.85828215  6.82671E-05  1.000001  1.000001  1.000002
Area                7.24593E-05  1.59627E-05  20.60522102  5.6442E-06  1.000072  1.000041  1.000104
Residual DF         9992
Residual Dev.       4602.507
# Iterations Used   4
Multiple R²         0.088048
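The Odds column is the exponential of each coefficient, and the fitted model turns a day's conditions into a probability that is then compared with the cutoff. In the sketch below, the coefficients and the 0.1 cutoff come from this appendix. The example day is hypothetical and not taken from the data set, and the function names are illustrative.

```python
import math

# Coefficients copied from the logistic regression table above.
COEFFICIENTS = {
    "Intercept": -5.38627515,
    "MaxTemperatureF": 0.039721854,
    "MeanWindSpeedMPH": -0.05375577,
    "PrecipitationIn": -0.25171563,
    "ThunderstormFlag": 1.425532448,
    "FogFlag": 0.375224145,
    "Population": 1.10042e-06,
    "Area": 7.24593e-05,
}
CUTOFF = 0.1  # cutoff probability for success used in the scoring reports


def odds_ratio(name):
    """Multiplicative change in the odds for a one-unit increase in the variable."""
    return math.exp(COEFFICIENTS[name])


def fire_probability(day):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    logit = COEFFICIENTS["Intercept"] + sum(COEFFICIENTS[k] * v for k, v in day.items())
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical day, not taken from the data set.
day = {"MaxTemperatureF": 95, "MeanWindSpeedMPH": 5, "PrecipitationIn": 0.0,
       "ThunderstormFlag": 1, "FogFlag": 0, "Population": 100000, "Area": 3000}
p = fire_probability(day)
print(round(odds_ratio("ThunderstormFlag"), 4), round(p, 3), p >= CUTOFF)
```

A cutoff well below 0.5 classifies more days as fire days, which trades extra false positives for fewer missed fires.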
Training Data Scoring - Summary Report
Cutoff probability value for success: 0.1
Confusion Matrix
                Predicted Class
Actual Class    1       0
1               342     353
0               1737    7568
Error Report
Class     # Cases   # Errors   % Error
1         695       353        50.79136691
0         9305      1737       18.66738313
Overall   10000     2090       20.9
Performance
Success Class          1
Precision              0.164502165
Recall (Sensitivity)   0.492086331
Specificity            0.813326169
F1-Score               0.246575342
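The performance figures in this report follow from the four confusion-matrix counts. The sketch below uses the standard definitions with the training counts. The function name is illustrative.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard metrics from confusion-matrix counts (class 1 = success)."""
    precision = tp / (tp + fp)        # share of predicted 1s that were actual 1s
    recall = tp / (tp + fn)           # share of actual 1s that were predicted 1
    specificity = tn / (tn + fp)      # share of actual 0s that were predicted 0
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


# Training confusion matrix: actual 1 -> (342, 353); actual 0 -> (1737, 7568).
train = classification_metrics(tp=342, fn=353, fp=1737, tn=7568)
print(train)
```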
Validation Data Scoring - Summary Report
Cutoff probability value for success: 0.1
Confusion Matrix
                Predicted Class
Actual Class    1       0
1               1678    1870
0               9562    41480
Error Report
Class     # Cases   # Errors   % Error
1         3548      1870       52.70574972
0         51042     9562       18.73359194
Overall   54590     11432      20.94156439
Performance
Success Class          1
Precision              0.149288256
Recall (Sensitivity)   0.472942503
Specificity            0.812664081
F1-Score               0.226940763
CLASSIFICATION TREE
[Figure: Best pruned classification tree. Split variables shown: MeanTemperatureF (split values 66.5 and 73.5), Area (2583), Population (8.547e+, truncated in the source), and PrecipitationIn (0.08).]
Test Data scoring - Summary Report (Using Best Pruned Tree)
Cutoff probability value for success: 0.1
Confusion Matrix
                Predicted Class
Actual Class    1       0
1               412     1006
0               1069    19349
Additional values shown beside the matrix: 0.70945 (1006/1418) and 0.72181 (1069/1481).
Error Report
Class     # Cases   # Errors   % Error
1         1418      1006       70.94499
0         20418     1069       5.235576
Overall   21836     2075       9.502656
Performance
Success Class          1
Precision              0.27819
Recall (Sensitivity)   0.29055
Specificity            0.947644
F1-Score               0.284236
MULTIPLE LINEAR REGRESSION
Initial model
Input Variables     Coefficient  Std. Error  t-Statistic  P-Value  CI Lower  CI Upper  RSS Reduction
Intercept           1236.561  544.8552495  2.269522234  0.023275  168.4322  2304.69  158090254
MaxTemperatureF     -4.572656  4.012047683  -1.13973123  0.254447  -12.4378  3.292523  41253014.86
MaxHumidity         -3.17814  5.611288029  -0.56638335  0.571156  -14.1785  7.822174  109461933.3
MeanHumidity        -9.098839  6.313910559  -1.44107817  0.149619  -21.4766  3.278889  30408733.58
MaxWindSpeedMPH     3.541634  15.97168029  0.221744611  0.824521  -27.7691  34.85236  10166551.77
PrecipitationIn     1.379731  49.20377299  0.02804116  0.97763  -95.0789  97.83832  11900.62063
MaxGustSpeedMPHAdj  2.325006  10.65970782  0.218111636  0.82735  -18.5722  23.22219  636555.642
RainFlag            -137.1917  113.0599868  -1.21344199  0.225012  -358.833  84.44995  3138592.002
ThunderstormFlag    223.0495  166.0781577  1.343039106  0.179314  -102.529  548.6274  27155645.91
FogFlag             107.1989  149.7348341  0.715924808  0.474068  -186.34  400.7376  4858758.305
SnowFlag            -53.17142  285.3094312  -0.18636404  0.852166  -612.489  506.1464  169706.098
HailFlag            -51.59413  2191.314296  -0.02354483  0.981217  -4347.43  4244.237  25261.82324
Population          -0.00054  0.000421497  -1.2805477  0.200406  -0.00137  0.000287  27480191.06
Fuel Model_A        -23.20317  196.499996  -0.11808232  0.906007  -408.42  362.0135  7860546.175
Fuel Model_B        -93.17159  524.7264323  -0.17756221  0.859073  -1121.84  935.4969  1062989.165
Fuel Model_C        138.5869  199.6299384  0.694218891  0.487574  -252.766  529.9394  4871955.574
Fuel Model_F        1.334528  232.2391453  0.005746354  0.995415  -453.945  456.6139  1121043.416
Fuel Model_G        786.8749  264.7020475  2.972681591  0.002965  267.9556  1305.794  131346533.8
Fuel Model_H        -17.82082  199.1608045  -0.08947956  0.928704  -408.254  372.6121  394035.6074
Fuel Model_I        -28.55546  311.929409  -0.09154464  0.927063  -640.059  582.9479  209555.0402
Fuel Model_J        -24.4791  240.3829083  -0.10183378  0.918892  -495.723  446.7652  519846.5116
Fuel Model_K        -80.02531  293.8633776  -0.27232147  0.785385  -656.112  496.0616  1831746.697
Fuel Model_L        94.73391  205.0273256  0.462055021  0.64406  -307.2  496.6675  5399746.683
Fuel Model_R        -54.4641  305.9224255  -0.17803237  0.858704  -654.191  545.2632  74611.62428
Fuel Model_T        -110.6278  374.9614261  -0.29503786  0.767976  -845.698  624.4429  697467.9889
Fuel Model_U        -71.17254  442.4721067  -0.16085204  0.872216  -938.59  796.2454  247710.4992
Residual DF           5568
R²                    0.007639959
Adjusted R²           0.003184319
Std. Error Estimate   3094.180509
RSS                   53307770441
Training Data Scoring - Summary Report
Total sum of squared errors   RMS Error   Average Error
53307770441                   3086.982    2.01864E-12
Validation Data Scoring - Summary Report
Total sum of squared errors   RMS Error   Average Error
10292488889                   1661.138    -77.8164336
Removed outliers
Input Variables     Coefficient  Std. Error  t-Statistic  P-Value  CI Lower  CI Upper  RSS Reduction
Intercept           17.69732773  4.574001388  3.86911289  0.000110492  8.730481103  26.66417436  99409.75
MaxTemperatureF     -0.033869112  0.033838333  -1.0009096  0.316914445  -0.10020559  0.032467365  1717.131
MaxHumidity         -0.163498766  0.048450746  -3.37453559  0.000744507  -0.258481336  -0.0685162  29929.83
MeanHumidity        -0.027774402  0.053782482  -0.51642099  0.605581121  -0.133209277  0.077660472  399.223
MaxWindSpeedMPH     0.017029282  0.137809494  0.12357118  0.901659317  -0.253131677  0.28719024  6408.125
PrecipitationIn     -0.514039049  1.291650056  -0.39797083  0.690667095  -3.046182563  2.018104464  171.5546
MaxGustSpeedMPHAdj  0.117996193  0.093915521  1.2564078  0.20902139  -0.066115267  0.302107654  1252.731
RainFlag            -0.006771199  0.961029838  -0.00704577  0.994378587  -1.890768694  1.877226297  171.5756
ThunderstormFlag    -0.504207046  1.449487848  -0.34785186  0.727964719  -3.34577488  2.337360788  39.1471
FogFlag             2.451828004  1.287452173  1.904403173  0.056909697  -0.072086004  4.975742012  2831.44
SnowFlag            0.82393785  2.480077405  0.332222635  0.739733784  -4.037991963  5.685867664  239.324
HailFlag            -1.093650238  18.46491974  -0.05922854  0.952772225  -37.29217517  35.1048747  13.69563
Population          -9.42886E-06  3.47509E-06  -2.71326713  0.006683039  -1.62414E-05  -2.6163E-06  6331.904
Fuel Model_A        2.806015584  1.638480411  1.712571945  0.08684755  -0.406052209  6.018083377  0.505967
Fuel Model_B        0.419331057  4.461402773  0.093990854  0.925119838  -8.326777876  9.16543999  219.4555
Fuel Model_C        2.278022003  1.668843985  1.365029939  0.172299342  -0.99357037  5.549614377  703.4535
Fuel Model_F        2.917559919  1.95890901  1.489380009  0.136444532  -0.922674301  6.757794138  5.699657
Fuel Model_G        2.192067912  2.261219665  0.969418383  0.332379011  -2.240814417  6.624950241  169.9185
Fuel Model_H        1.916788103  1.666377499  1.150272434  0.250081573  -1.349968986  5.183545191  1689.793
Fuel Model_I        3.835796354  2.545833715  1.50669556  0.131946005  -1.155041759  8.826634467  8.491603
Fuel Model_J        9.170368758  2.003729171  4.576650823  4.82848E-06  5.242269348  13.09846817  12563.57
Fuel Model_K        -0.021562087  2.398802742  -0.00898869  0.992828487  -4.724161506  4.681037332  1149.861
Fuel Model_L        2.953693022  1.715946481  1.721320014  0.085248871  -0.41023882  6.317624865  147.0082
Fuel Model_R        -0.321406314  2.5067467  -0.12821651  0.897982311  -5.235618462  4.592805835  1528.575
Fuel Model_T        18.8621992  3.292289681  5.729203997  1.06246E-08  12.408013  25.31638539  22411.38
Fuel Model_U        1.587415804  3.742752857  0.424130544  0.67148716  -5.749855871  8.924687478  122.2782
Residual DF           5513
R²                    0.02351
Adjusted R²           0.019082
Std. Error Estimate   26.07203
RSS                   3747467
Training Data Scoring - Summary Report
Total sum of squared errors   RMS Error     Average Error
3747466.754                   26.01077017   -5.38981E-14
Validation Data Scoring - Summary Report
Total sum of squared errors   RMS Error     Average Error
2300878.365                   24.96410405   -0.272124133
Error Metrics:
Average Error   0.011336
MAD             7.235846
MAPE            161.4117
MSE             688.3001
RMSE            26.23547
CART (Prediction):
Test Data scoring - Summary Report (Using Best Pruned Tree)
Total sum of squared errors   RMS Error   Average Error
844872.5                      21.3934     -0.7446691
SOURCES
DATA SOURCES
1.) Daily data on wildfires in the 14 regions of Oregon (allows us to pick data from 1960-2015). It shows the fuel model of the
fire as well as the total acres burned. http://www.odf.state.or.us/DIVISIONS/protection/fire_protection/fires/FIRESlist.asp
2.) Monthly data on average degrees in Fahrenheit as well as monthly data on average precipitation in inches
http://www.usclimatedata.com/climate/tillamook/oregon/united-states/usor0347
3.) Monthly wind speed data for each district (this data set also provides monthly data on humidity, precipitation, and
temperature)
http://www.wunderground.com/history/airport/KTMK/2006/10/11/CustomHistory.html?dayend=11&monthend=10&yearend=2015&req_city=&req_state=&req_statename=&reqdb.zip=&reqdb.magic=&reqdb.wmo=
4.) Fuel Moisture database (it provides bi-monthly data, unfortunately not for all districts, additional research will be necessary)
http://wfas.net/index.php/national-fuel-moisture-database-moisture-drought-103
REFERENCES
a) http://www.wno.org/forestfire
b) https://www.nwf.org/Wildlife/Threats-to-Wildlife/Global-Warming/Global-Warming-is-Causing-Extreme-Weather/Wildfires.aspx
c) B. Arrue, A. Ollero, and J. Matinez de Dios. An Intelligent System for False Alarm Reduction in Infrared Forest-Fire
Detection. IEEE Intelligent Systems, 15(3):64–73, 2000.
d) J. Terradas J. Pinol and F. Lloret. Climate warming, wildfire hazard, and wildfire occurrence in coastal eastern Spain.
Climatic Change, 38:345–357, 1998.
e) http://www.kgw.com/story/news/local/2015/06/15/oregon-and-washington-wildfire-updates/71264920/