1
Validation and Implication of Segmentation on Empirical Bayes for Highway Safety Studies
Reginald R. Souleyrette, Robert P. Haas and T. H. Maze
Iowa State University, SAIC and Iowa State University
ENVIRONMENTAL HEALTH RISK 2007, Fourth International Conference on the Impact of Environmental Factors on Health
Malta, 27-29 June 2007
4
Engineering studies
• Limited resources
• Highest benefit desired
• High crash locations
• Before-and-after studies
• Small sample size: high variance
• Selection bias: regression to the mean (RTM)
5
Objectives
• Validate the state-of-the-art statistical approach known as Empirical Bayes (EB)
• Demonstrate tradeoffs between model quality and data quantity
• Investigate the effect of data aggregation
• … to improve the identification, and therefore the mitigation, of high crash locations
8
Statistical approaches we could take…
• Use long periods
• Use large number of locations
• Use Empirical Bayes (EB)
– Substitutes “similar” locations for longer observation time
– “Weights” site and similar-site data
9
Mr. Smith
• Mr. Smith had no crashes last year
• The average for similar drivers is 0.8 crashes per year
• How many crashes do we expect Mr. Smith to have next year: 0? 0.8?
• Answer: use both pieces of information and weight them to form the expectation
Hauer, E., D.W. Harwood, F.M. Council, M.S. Griffith, “The Empirical Bayes method for estimating safety: A tutorial.” Transportation Research Record 1784, pp. 126-131. National Academies Press, Washington, D.C., 2002. http://members.rogers.com/hauer/Pubs/TRBpaper.pdf
10
Empirical Bayes (EB)
• We have two types of information
• We compute an estimate which is an average of both
• How much to weight the two depends on…
– Quantity
– Quality
• Accepted practice… small scale
What should the weight be???
11
$$w = \frac{1}{1 + \mu Y / \phi}$$
where
w = weight applied to the model estimate
μ = mean # crashes/year from the model
Y = number of years
φ = overdispersion factor
$$\text{EB estimate} = w \cdot (\text{model estimate}) + (1 - w) \cdot (\text{site average})$$
Need: site data
Need: a model for similar sites (negative binomial)
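A minimal Python sketch of the two formulas above, applied to the Mr. Smith example. The function names are hypothetical, μ is taken as crashes per year, and the overdispersion factor φ = 2.0 is an invented value (in practice it comes from the fitted negative binomial model).

```python
def eb_weight(mu: float, years: float, phi: float) -> float:
    """Weight w on the model estimate: w = 1 / (1 + mu * Y / phi)."""
    return 1.0 / (1.0 + mu * years / phi)


def eb_estimate(mu: float, site_avg: float, years: float, phi: float) -> float:
    """EB estimate (crashes/year) = w * model estimate + (1 - w) * site average."""
    w = eb_weight(mu, years, phi)
    return w * mu + (1.0 - w) * site_avg


# Mr. Smith: 0 crashes in 1 year; similar drivers average 0.8 crashes/year.
# phi = 2.0 is a made-up overdispersion factor, used only for illustration.
w = eb_weight(mu=0.8, years=1, phi=2.0)
est = eb_estimate(mu=0.8, site_avg=0.0, years=1, phi=2.0)
print(f"w = {w:.3f}, EB estimate = {est:.3f} crashes/year")
```

With these assumed inputs, w ≈ 0.714, so the EB estimate is about 0.571 crashes per year: between Mr. Smith's own record (0) and the similar-driver average (0.8).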
12
Objective #1
Test the effectiveness of EB by comparing:
1. a single year of data from many locations, combined with different models and the Empirical Bayes formula, vs.
2. several years of crash data at specific locations
[Figure: timeline of data years 2000, 2001, 2002, 2003 and 2004; 2004 is the year being predicted]
14
Description of Data
Roads (Iowa)
– All (19,400 km)
– Freeways (1,400 km)
– Multilane (8,000 km)
– 2-lane (10,000 km)
• Low ADT (1,200 VPD)
• Med ADT (2,400 VPD)
• High ADT (4,400 VPD)
– Segments
• 400 m (short)
• 4 km (med)
• 6.8 km (long)
15
Description of Data
Intersections (California)
– Multiphase (873)
– Single phase (374)
– Thru-stop (3,047)
• 5 years of data
• Large-scale validation
16
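Analysis – Intersections
Three model forms:
a) $\text{Crashes} = \alpha\,(\text{mainline traffic})^{\beta}$
b) $\text{Crashes} = \alpha\,(\text{mainline traffic})^{\beta}\,(\text{cross street traffic})^{\gamma}$
c) $\text{Crashes} = \alpha\,(\text{mainline traffic})^{\beta}\,(\text{cross street lanes})^{\delta}$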
Three types of intersections
– Multiphase signals
– Single phase signals
– Stop sign control
[Table: Intersection model parameters and descriptive statistics]
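The three intersection model forms can be written as plain functions. This is a sketch: the function names are hypothetical and every parameter value below is made up for illustration (the fitted α, β, γ and δ are in the parameter table, not reproduced here). Because each form is multiplicative, taking logs makes it linear, e.g. ln(Crashes) = ln α + β ln(mainline traffic), which is how such models are typically estimated with a log-link negative binomial regression.

```python
def model_a(main_adt: float, alpha: float, beta: float) -> float:
    """a) Crashes = alpha * (mainline traffic)^beta"""
    return alpha * main_adt ** beta


def model_b(main_adt: float, cross_adt: float,
            alpha: float, beta: float, gamma: float) -> float:
    """b) Crashes = alpha * (mainline traffic)^beta * (cross street traffic)^gamma"""
    return alpha * main_adt ** beta * cross_adt ** gamma


def model_c(main_adt: float, cross_lanes: int,
            alpha: float, beta: float, delta: float) -> float:
    """c) Crashes = alpha * (mainline traffic)^beta * (cross street lanes)^delta"""
    return alpha * main_adt ** beta * cross_lanes ** delta


# Hypothetical parameters; each real model would have its own fitted values.
alpha, beta, gamma, delta = 5.0e-4, 0.8, 0.2, 0.3
for adt in (10_000, 20_000, 40_000):
    print(adt,
          round(model_a(adt, alpha, beta), 2),
          round(model_b(adt, 5_000, alpha, beta, gamma), 2),
          round(model_c(adt, 4, alpha, beta, delta), 2))
```

Setting γ or δ to zero collapses models b and c back to model a, which makes the extra cross-street term easy to compare.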
17
[Figure: Multiphase Intersection Models, 2004 California Data. x-axis: Average Daily Traffic (0 to 40,000); y-axis: Average Crashes Per Mile Per Year (0.0 to 7.0); series: Traffic - Main, Traffic - Both, Traffic - Main + Cross Lanes]
[Figure: Thru-Stop Intersection Models, 2004 California Data. Same axes, y-axis 0.0 to 4.0]
Example intersection crash models (only 2 dimensions shown)
18
Intersection Results: Top 10 high crash locations in 2003*
2004 estimates using 2003 data only

| 2003 actual | 2000-2003 4 yr avg. | EB model "a" (similar sites) | EB model "b" (similar sites) | EB model "c" (similar sites) | EB model "d" (dissimilar sites) | 2004 actual |
|---|---|---|---|---|---|---|
| 44 | 35.5 | 36.3 | 35.8 | 36.7 | 38.6 | 28 |
| 33 | 37.3 | 27.4 | 28.2 | 27.8 | 28.9 | 21 |
| 28 | 21.5 | 23.2 | 23.6 | 23.6 | 24.4 | 19 |
| 28 | 19.3 | 23.7 | 24.2 | 24.0 | 25.0 | 25 |
| 26 | 19.0 | 21.1 | 21.7 | 21.5 | 21.8 | 18 |
| 24 | 17.5 | 20.8 | 21.3 | 20.4 | 22.1 | 13 |
| 24 | 14.3 | 19.0 | 18.9 | 18.7 | 19.3 | 23 |
| 24 | 23.0 | 21.4 | 21.6 | 21.7 | 22.8 | 28 |
| 23 | 15.5 | 19.5 | 19.1 | 19.8 | 20.5 | 25 |
| 23 | 19.0 | 19.5 | 20.0 | 19.8 | 20.5 | 15 |
| 41% | 36% | 26% | 27% | 26% | 29% | %RMSE |

Shading (not reproduced): best estimate; best model estimate
* California HSIS, multiphase 4-leg intersections
• Not intuitive
• Highest in 2003
• Trying to predict this
• EB model “a” has the lowest error
• The 4-year average is “better” slightly more often than EB
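The %RMSE row can be checked with a short script. This sketch assumes %RMSE means the RMSE of each column's predictions against the 2004 actual counts, divided by the mean 2004 actual count; under that assumption it reproduces the 41%, 36% and 26% figures for the 2003, 4 yr avg. and EB model "a" columns. The function name `pct_rmse` is hypothetical.

```python
import math


def pct_rmse(predicted, actual):
    """RMSE of predictions, as a percentage of the mean actual value."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    return 100.0 * math.sqrt(mse) / (sum(actual) / len(actual))


# Columns copied from the table above (top 10 sites in 2003).
actual_2004 = [28, 21, 19, 25, 18, 13, 23, 28, 25, 15]
crashes_2003 = [44, 33, 28, 28, 26, 24, 24, 24, 23, 23]
four_yr_avg = [35.5, 37.3, 21.5, 19.3, 19.0, 17.5, 14.3, 23.0, 15.5, 19.0]
eb_model_a = [36.3, 27.4, 23.2, 23.7, 21.1, 20.8, 19.0, 21.4, 19.5, 19.5]

for name, pred in [("2003", crashes_2003),
                   ("4 yr avg.", four_yr_avg),
                   ("EB model a", eb_model_a)]:
    print(f"{name}: {pct_rmse(pred, actual_2004):.0f}%")
```

Normalizing by the mean actual count lets the intersection and road segment tables, whose crash counts differ widely, be compared on the same percentage scale.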
19
Using 4 years of data + EB
2004 estimates using 2000-2003 data

| 2003 actual | 2000-2003 4 yr avg. | EB model "a" (similar sites) | EB model "b" (similar sites) | EB model "c" (similar sites) | EB model "d" (dissimilar sites) | 2004 actual |
|---|---|---|---|---|---|---|
| 44 | 35.5 | 33.7 | 33.6 | 33.9 | 34.3 | 28 |
| 33 | 37.3 | 35.3 | 35.6 | 35.5 | 35.9 | 21 |
| 28 | 21.5 | 20.5 | 20.6 | 20.6 | 20.8 | 19 |
| 28 | 19.3 | 18.5 | 18.6 | 18.6 | 18.8 | 25 |
| 26 | 19.0 | 18.0 | 18.2 | 18.1 | 18.2 | 18 |
| 24 | 17.5 | 16.9 | 17.0 | 16.8 | 17.2 | 13 |
| 24 | 14.3 | 13.49 | 13.46 | 13.42 | 13.50 | 23 |
| 24 | 23.0 | 22.30 | 22.37 | 22.40 | 22.70 | 28 |
| 23 | 15.5 | 14.93 | 14.84 | 14.99 | 15.11 | 25 |
| 23 | 19.0 | 18.22 | 18.34 | 18.29 | 18.47 | 15 |
| 41% | 36% | 34% | 34% | 34% | 35% | %RMSE |

Shading (not reproduced): best estimate; best model estimate
• Now, EB is better more often
• Now, model “d” is never the best estimate, but is still the best model four times?
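Why the EB columns now sit close to the 4-year averages: with four years of site data, Y in the weight formula rises from 1 to 4, so the weight w on the model estimate shrinks and more weight goes to the site's own history. A sketch with invented μ and φ values (not from the study); the function name is hypothetical.

```python
def eb_weight(mu_per_year: float, years: int, phi: float) -> float:
    """Weight on the model estimate: w = 1 / (1 + mu * Y / phi)."""
    return 1.0 / (1.0 + mu_per_year * years / phi)


# Hypothetical values, not taken from the study.
MU = 20.0   # model-predicted crashes/year at a busy multiphase intersection
PHI = 10.0  # overdispersion factor

for years in (1, 4):
    w = eb_weight(MU, years, PHI)
    print(f"Y = {years}: weight on model = {w:.3f}, "
          f"weight on site data = {1 - w:.3f}")
```

Under these assumed values the weight on the model falls from 1/3 with one year of data to 1/9 with four years.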
20
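Intersection Results: Effect on Ranking

| 2003 actual rank | 2004 rank: 4 yr avg. (2000-2003) | 2004 rank: EB model "a" (similar sites) | 2004 rank: EB model "b" (similar sites) | 2004 rank: EB model "c" (similar sites) | 2004 rank: EB model "d" (dissimilar sites) | 2004 actual rank |
|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | 1 | 1 | 1 |
| 8 | 3 | 5 | 6 | 5 | 5 | 1 |
| 4 | 5 | 3 | 3 | 3 | 3 | 3 |
| 9 | 9 | 9 | 9 | 9 | 9 | 3 |
| 7 | 10 | 10 | 10 | 10 | 10 | 6 |
| 2 | 1 | 2 | 2 | 2 | 2 | 9 |
| 6 | 8 | 7 | 7 | 7 | 6 | 15 |
| 3 | 4 | 4 | 4 | 4 | 4 | 19 |
| 5 | 6 | 6 | 5 | 6 | 7 | 22 |
| 10 | 7 | 8 | 8 | 8 | 8 | 43 |

• EB does slightly better than the 4-year average or 2003 alone
• All models are “comparable”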
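The 2004 actual rank column contains ties (two sites at rank 1 and two at rank 3), which looks like standard competition ranking: tied values share the best rank and the next rank is skipped. A minimal sketch, assuming that convention. Note that the slide ranks sites against all sites in the data set (hence ranks such as 43), whereas this example ranks only the ten 2004 counts from the intersection table. The function name `competition_rank` is hypothetical.

```python
def competition_rank(values):
    """Rank in descending order; ties share the best (lowest) rank,
    and the following rank is skipped ("1, 1, 3" style)."""
    return [1 + sum(1 for other in values if other > v) for v in values]


# 2004 actual crash counts for the top 10 sites of 2003.
actual_2004 = [28, 21, 19, 25, 18, 13, 23, 28, 25, 15]
print(competition_rank(actual_2004))
```

For these ten counts the result is [1, 6, 7, 3, 8, 10, 5, 1, 3, 9].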
21
Analysis – Roads
• $\text{crashes} = \alpha\,(\text{length})\,(\text{ADT})^{\beta}$
• 3 types of roads
– Freeway
– Multilane divided
– 2-lane
• 3 segmentations
– 0.4, 3.8, and 11.6 km, on average
• 3 traffic ranges (L, M, H)
• 15 models
[Table: Road segment model parameters and descriptive statistics]
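In the road segment model, length enters linearly (exponent 1), so at a given ADT the expected crash count is proportional to segment length. A sketch with hypothetical α and β values (the fitted ones are in the parameter table); the function name is also hypothetical.

```python
def segment_crashes(length_km: float, adt: float,
                    alpha: float, beta: float) -> float:
    """Road segment model: crashes/year = alpha * length * ADT^beta."""
    return alpha * length_km * adt ** beta


ALPHA, BETA = 2.0e-4, 0.9  # hypothetical parameters, not the study's
for length in (0.4, 3.8, 11.6):  # average segment lengths from the slide
    expected = segment_crashes(length, 20_000, ALPHA, BETA)
    print(f"{length:5.1f} km: {expected:.2f} expected crashes/year")
```

An 11.6 km segment therefore expects 29 times as many crashes as a 0.4 km segment carrying the same traffic.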
22
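Effect of Segmentation on Correction: Freeway-type segments

Shortest segments (average length 0.4 km)

| 2003 actual crashes | 2004 estimate: 2000-2003 4 yr avg. | 2004 estimate: EB model | 2004 actual crashes |
|---|---|---|---|
| 7 | 5.3 | 0.9 | 8 |
| 7 | 5.5 | 0.9 | 3 |
| 7 | 3.5 | 1.6 | 4 |
| 5 | 2.3 | 1.5 | 1 |
| 5 | 2.5 | 1.1 | 2 |
| 4 | 3.5 | 0.5 | 2 |
| 4 | 3.5 | 0.4 | 6 |
| 4 | 2.0 | 0.5 | 1 |
| 4 | 1.0 | 0.8 | 0 |
| 4 | 5.3 | 0.8 | 4 |
| 93% | 54% | 106% | %RMSE |

Medium segments (average length 3.8 km)

| 2003 actual crashes | 2004 estimate: 2000-2003 4 yr avg. | 2004 estimate: EB model | 2004 actual crashes |
|---|---|---|---|
| 22 | 17.0 | 15.2 | 21 |
| 13 | 12.0 | 7.5 | 25 |
| 8 | 4.5 | 5.3 | 4 |
| 6 | 5.0 | 2.1 | 2 |
| 6 | 4.0 | 3.6 | 2 |
| 5 | 4.0 | 1.5 | 0 |
| 4 | 3.3 | 1.6 | 2 |
| 3 | 1.5 | 0.3 | 1 |
| 3 | 2.0 | 1.6 | 2 |
| 3 | 2.0 | 0.8 | 1 |
| 80% | 78% | 98% | %RMSE |

Longest segments (average length 11.6 km)

| 2003 actual crashes | 2004 estimate: 2000-2003 4 yr avg. | 2004 estimate: EB model | 2004 actual crashes |
|---|---|---|---|
| 22 | 17.0 | 16.4 | 21 |
| 13 | 12.0 | 8.1 | 25 |
| 5 | 4.0 | 1.6 | 0 |
| 15 | 16.8 | 12.8 | 22 |
| 26 | 22.0 | 22.6 | 35 |
| 4 | 3.3 | 3.2 | 4 |
| 18 | 13.3 | 13.1 | 15 |
| 8 | 11.8 | 7.8 | 9 |
| 54 | 53.8 | 54.6 | 58 |
| 18 | 12.8 | 17.6 | 16 |
| 28% | 32% | 37% | %RMSE |

Shading (not reproduced): best estimate
Note the higher EB correction for short segments.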
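One way to see the larger correction for short segments, using the EB weight formula from earlier: a short segment has a small model mean μ, so μY/φ is small and w is close to 1, which pulls the estimate strongly toward the model. A sketch with made-up inputs and hypothetical function names; in particular, holding φ the same for both segment lengths is a simplification, since each of the study's models would have its own overdispersion factor.

```python
def eb_weight(mu: float, years: float, phi: float) -> float:
    """Weight on the model estimate: w = 1 / (1 + mu * Y / phi)."""
    return 1.0 / (1.0 + mu * years / phi)


def eb_estimate(mu: float, site_avg: float, years: float, phi: float) -> float:
    """w * model estimate + (1 - w) * site average, all in crashes/year."""
    w = eb_weight(mu, years, phi)
    return w * mu + (1.0 - w) * site_avg


# Hypothetical inputs: model rate of 1.5 crashes/km/year, one year of site
# data, and the same overdispersion factor for both segments.
RATE_PER_KM, PHI, YEARS = 1.5, 2.0, 1
for length_km, site_avg in ((0.4, 5.0), (11.6, 20.0)):
    mu = RATE_PER_KM * length_km
    w = eb_weight(mu, YEARS, PHI)
    est = eb_estimate(mu, site_avg, YEARS, PHI)
    print(f"{length_km:5.1f} km: w = {w:.2f}, site avg {site_avg} -> EB {est:.1f}")
```

With these inputs the 0.4 km segment gets w ≈ 0.77 (site average 5.0 pulled down to about 1.6), while the 11.6 km segment gets w ≈ 0.10 (site average 20.0 barely moves, to about 19.7).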
23
Conclusions
• EB + 1 yr ≈ 4 yrs of data
• A better model did not necessarily improve prediction (at least for the 10 intersections selected)
• Longer segment models are more accurate
• Intersection 4-year averages and models are relatively poor predictors
– But when combined using EB, they predict better