1

Validation and Implication of Segmentation on Empirical Bayes for Highway Safety Studies

Reginald R. Souleyrette, Robert P. Haas and T. H. Maze
Iowa State University, SAIC and Iowa State University

ENVIRONMENTAL HEALTH RISK 2007, Fourth International Conference on The Impact of Environmental Factors on Health, MALTA; 27 - 29 June, 2007

http://www.saic.com/

2

The highway safety problem

Source: World Health Organization

3

Mitigation approaches – 4Es

• Education

• Enforcement

• Emergency Response

• Engineering

4

Engineering studies

• Limited resources
• Highest benefit desired

• High Crash Locations
• Before and After Studies

• Small sample size leads to high variance
• Selection bias leads to regression to the mean (RTM)

5

Objectives

• Validate the state-of-the-art statistical approach, known as Empirical Bayes (EB)

• Demonstrate tradeoffs between model quality and data quantity

• Investigate effect of data aggregation

• … to improve identification and therefore mitigation of high crash locations

6

7

8

Statistical approaches we could take…

• Use long periods

• Use large number of locations

• Use Empirical Bayes (EB)
– Substitutes “similar” locations for longer observation time
– “Weights” site and similar-site data

9

Mr. Smith

• Mr. Smith had no crashes last year
• The average of similar drivers is 0.8 crashes per year
• What do we expect is the number of crashes Mr. Smith will have next year … 0?, 0.8? …
• Answer … use both pieces of information and weight the expectation

Hauer, E., D.W. Harwood, F.M. Council, M.S. Griffith, “The Empirical Bayes method for estimating safety: A tutorial.” Transportation Research Record 1784, pp. 126-131. National Academies Press, Washington, D.C., 2002. http://members.rogers.com/hauer/Pubs/TRBpaper.pdf

10

Empirical Bayes (EB)

• We have two types of information

• We compute an estimate which is an average of both

• How much to weight the two depends on…
– Quantity
– Quality

• Accepted practice… small scale

What should the weight be???

11

$$ w = \frac{1}{1 + (\mu \cdot Y)/\varphi} $$

where
– w = weight applied to model estimate
– μ = mean # crashes/year from model
– Y = number of years
– φ = overdispersion factor

EB estimate = w∙(model estimate) + (1-w)∙(site average)

Need:
– site data
– model for similar sites (neg. binomial)
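As a minimal sketch of the weight and EB estimate on this slide, the Python below applies them to the Mr. Smith example (0 crashes observed, 0.8 crashes per year for similar drivers, one year of data). The overdispersion value `PHI_ASSUMED` is hypothetical; the slides do not give one, and the function names are illustrative only.

```python
def eb_weight(mu: float, years: float, phi: float) -> float:
    """Weight applied to the model estimate: w = 1 / (1 + (mu * Y) / phi)."""
    return 1.0 / (1.0 + (mu * years) / phi)


def eb_estimate(model_estimate: float, site_average: float, w: float) -> float:
    """EB estimate = w * (model estimate) + (1 - w) * (site average)."""
    return w * model_estimate + (1.0 - w) * site_average


# Mr. Smith: similar drivers average 0.8 crashes/year, he had 0 last year.
PHI_ASSUMED = 1.6  # hypothetical overdispersion factor, not from the slides

w = eb_weight(mu=0.8, years=1, phi=PHI_ASSUMED)
smith_expected = eb_estimate(model_estimate=0.8, site_average=0.0, w=w)

print(f"weight on model estimate: {w:.3f}")
print(f"EB estimate for next year: {smith_expected:.3f}")
```

With this assumed φ, the estimate falls between 0 and 0.8, which is the point of the example: neither piece of information is used alone. A larger φ or fewer years pushes the weight toward the model estimate.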

12

Objective #1

Test effectiveness of EB by comparing:
1. a single year of data from many locations, with different models and the Empirical Bayes formula, vs.
2. several years of crash data at specific locations

[Figure: timeline of years 2000, 2001, 2002, 2003, 2004, with 2004 shown separately]

13

Objective #2

Explore the relationship between segmentation and accuracy of estimates

14

Description of Data

Roads (Iowa)
– All (19,400 km)
– Freeways (1400 km)
– Multilane (8000 km)
– 2-lane (10,000 km)
   • Low ADT (1200 VPD)
   • Med ADT (2400 VPD)
   • High ADT (4400 VPD)
– Segments
   • 400 m (short)
   • 4 km (med)
   • 6.8 km (long)

15

Description of Data

Intersections (California)
– Multiphase (873)
– Single Phase (374)
– Thru-stop (3047)

• 5 years of data
• large-scale validation

16

Analysis – Intersections

Three model forms:

a) $\text{Crashes} = \alpha\,(\text{mainline traffic})^{\beta}$

b) $\text{Crashes} = \alpha\,(\text{mainline traffic})^{\beta}\,(\text{cross street traffic})^{\gamma}$

c) $\text{Crashes} = \alpha\,(\text{mainline traffic})^{\beta}\,(\text{cross street lanes})^{\delta}$

Three types of intersections
– Multiphase signals
– Single phase signals
– Stop sign control

[Table: Intersection model parameters and descriptive statistics]
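The three model forms can be written as small functions; the sketch below does so in Python. The parameter values used in the example call are hypothetical placeholders, since the fitted parameters from the slide's table are not reproduced in this text, and the function names are illustrative.

```python
def model_a(main_adt: float, alpha: float, beta: float) -> float:
    """Form a): crashes = alpha * (mainline traffic)^beta."""
    return alpha * main_adt ** beta


def model_b(main_adt: float, cross_adt: float,
            alpha: float, beta: float, gamma: float) -> float:
    """Form b): adds a cross street traffic term with exponent gamma."""
    return alpha * main_adt ** beta * cross_adt ** gamma


def model_c(main_adt: float, cross_lanes: float,
            alpha: float, beta: float, delta: float) -> float:
    """Form c): adds a cross street lanes term with exponent delta."""
    return alpha * main_adt ** beta * cross_lanes ** delta


# Hypothetical parameters, chosen only to exercise the functions.
ALPHA, BETA, GAMMA, DELTA = 0.001, 0.5, 0.3, 0.4

pred_a = model_a(10_000, ALPHA, BETA)
pred_b = model_b(10_000, 2_000, ALPHA, BETA, GAMMA)
pred_c = model_c(10_000, 2, ALPHA, BETA, DELTA)
print(pred_a, pred_b, pred_c)
```

Forms b) and c) reduce to form a) when the extra exponent is zero, which is one way to read them as successive refinements of the same mainline-traffic model.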

17

Example intersection crash models (only 2 dimensions shown)

[Figure: Multiphase Intersection Models, 2004 California Data. x-axis: Average Daily Traffic (0 to 40,000); y-axis: Average Crashes Per Mile Per Year (0.0 to 7.0); series: Traffic - Main, Traffic - Both, Traffic - Main + Cross Lanes]

[Figure: Thru-Stop Intersection Models, 2004 California Data. x-axis: Average Daily Traffic (0 to 40,000); y-axis: Average Crashes Per Mile Per Year (0.0 to 4.0)]

18

Intersection Results
Top 10 high crash locations in 2003*

2004 estimates using 03 data only

2003     2000-2003   EB model "a"    EB model "b"    EB model "c"    EB model "d"       2004
actual   4 yr avg.   similar sites   similar sites   similar sites   dissimilar sites   actual
44       35.5        36.3            35.8            36.7            38.6               28
33       37.3        27.4            28.2            27.8            28.9               21
28       21.5        23.2            23.6            23.6            24.4               19
28       19.3        23.7            24.2            24.0            25.0               25
26       19.0        21.1            21.7            21.5            21.8               18
24       17.5        20.8            21.3            20.4            22.1               13
24       14.3        19.0            18.9            18.7            19.3               23
24       23.0        21.4            21.6            21.7            22.8               28
23       15.5        19.5            19.1            19.8            20.5               25
23       19.0        19.5            20.0            19.8            20.5               15

41%      36%         26%             27%             26%             29%                %RMSE

* California HSIS Multiphase 4 leg

Legend: best estimate; best model estimate

• Highest in 2003
• Trying to predict this (2004 actual)
• Not intuitive
• EB model “a” lowest error
• 4 year average “better” slightly more often than EB
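The slide does not define %RMSE. A minimal sketch, assuming it is the root-mean-square error of each estimate column against the 2004 actual counts, divided by the mean 2004 actual count, is given below. Under that assumption the values in the table's first three columns round to the 41%, 36% and 26% shown. The function name `pct_rmse` is illustrative.

```python
from math import sqrt

# Values copied from the table (top 10 high crash locations in 2003).
actual_2004 = [28, 21, 19, 25, 18, 13, 23, 28, 25, 15]
estimates = {
    "2003 actual": [44, 33, 28, 28, 26, 24, 24, 24, 23, 23],
    "4 yr avg.": [35.5, 37.3, 21.5, 19.3, 19.0, 17.5, 14.3, 23.0, 15.5, 19.0],
    'EB model "a"': [36.3, 27.4, 23.2, 23.7, 21.1, 20.8, 19.0, 21.4, 19.5, 19.5],
}


def pct_rmse(estimate, actual):
    """RMSE as a percentage of the mean actual count (assumed definition)."""
    mse = sum((e - a) ** 2 for e, a in zip(estimate, actual)) / len(actual)
    return 100.0 * sqrt(mse) / (sum(actual) / len(actual))


results = {name: round(pct_rmse(vals, actual_2004))
           for name, vals in estimates.items()}
print(results)
```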

19

Using 4 years of data + EB
2004 estimates using 00-03 data

2003     2000-2003   EB model "a"    EB model "b"    EB model "c"    EB model "d"       2004
actual   4 yr avg.   similar sites   similar sites   similar sites   dissimilar sites   actual
44       35.5        33.7            33.6            33.9            34.3               28
33       37.3        35.3            35.6            35.5            35.9               21
28       21.5        20.5            20.6            20.6            20.8               19
28       19.3        18.5            18.6            18.6            18.8               25
26       19.0        18.0            18.2            18.1            18.2               18
24       17.5        16.9            17.0            16.8            17.2               13
24       14.3        13.49           13.46           13.42           13.50              23
24       23.0        22.30           22.37           22.40           22.70              28
23       15.5        14.93           14.84           14.99           15.11              25
23       19.0        18.22           18.34           18.29           18.47              15

41%      36%         34%             34%             34%             35%                %RMSE

Legend: best estimate; best model estimate

• Now, EB better more often
• Now, model “d” never best estimate, but still best model four times?
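One way to check the "better more often" remarks is to count, site by site, which estimate lands closer to the 2004 actual count. The sketch below compares only the 4-year average against EB model "a"; that pairing is an assumption, since the slides may have compared across all columns. The EB values based on one year of data come from the "2004 estimates using 03 data only" table, and those based on four years from the "2004 estimates using 00-03 data" table.

```python
actual_2004 = [28, 21, 19, 25, 18, 13, 23, 28, 25, 15]
avg_4yr = [35.5, 37.3, 21.5, 19.3, 19.0, 17.5, 14.3, 23.0, 15.5, 19.0]
# EB model "a" using 2003 data only
eb_a_1yr = [36.3, 27.4, 23.2, 23.7, 21.1, 20.8, 19.0, 21.4, 19.5, 19.5]
# EB model "a" using 2000-2003 data
eb_a_4yr = [33.7, 35.3, 20.5, 18.5, 18.0, 16.9, 13.49, 22.30, 14.93, 18.22]


def count_wins(eb, avg, actual):
    """Return (sites where EB is closer, sites where the average is closer)."""
    eb_wins = sum(abs(e - a) < abs(v - a) for e, v, a in zip(eb, avg, actual))
    avg_wins = sum(abs(v - a) < abs(e - a) for e, v, a in zip(eb, avg, actual))
    return eb_wins, avg_wins


wins_1yr = count_wins(eb_a_1yr, avg_4yr, actual_2004)
wins_4yr = count_wins(eb_a_4yr, avg_4yr, actual_2004)
print("EB (1 yr) vs 4 yr avg:", wins_1yr)
print("EB (4 yr) vs 4 yr avg:", wins_4yr)
```

For these ten sites the count favors the 4-year average when EB uses a single year and favors EB when it uses all four years, which matches the direction of both remarks.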

20

Intersection Results
Effect on Ranking

2004 RANKS (estimate columns)

2003     2000-2003   EB model "a"    EB model "b"    EB model "c"    EB model "d"       2004
actual   4 yr avg.   similar sites   similar sites   similar sites   dissimilar sites   actual
RANK                                                                                    RANK
1        2           1               1               1               1                  1
8        3           5               6               5               5                  1
4        5           3               3               3               3                  3
9        9           9               9               9               9                  3
7        10          10              10              10              10                 6
2        1           2               2               2               2                  9
6        8           7               7               7               6                  15
3        4           4               4               4               4                  19
5        6           6               5               6               7                  22
10       7           8               8               8               8                  43

• EB does slightly better than 4 year average, or 2003 alone
• All models “comparable”

21

Analysis – Roads

• $\text{crashes} = \alpha\,(\text{length})\,(\text{ADT})^{\beta}$

• 3 types of roads
– Freeway
– Multilane divided
– 2-lane

• 3 segmentations
– 0.4, 3.8, and 11.6 km, on average

• 3 traffic ranges (L, M, H)
• 15 models

[Table: Road segment model parameters and descriptive statistics]

22

Effect of Segmentation on Correction
Freeway-type segments

In each table, the two middle columns are the 2004 estimates.

Shortest segments (average length 0.4 km)

2003 actual   2000-2003   EB model   2004 actual
crashes       4 yr avg.   estimate   crashes
7             5.3         0.9        8
7             5.5         0.9        3
7             3.5         1.6        4
5             2.3         1.5        1
5             2.5         1.1        2
4             3.5         0.5        2
4             3.5         0.4        6
4             2.0         0.5        1
4             1.0         0.8        0
4             5.3         0.8        4

93%           54%         106%       %RMSE

Medium segments (average length 3.8 km)

2003 actual   2000-2003   EB model   2004 actual
crashes       4 yr avg.   estimate   crashes
22            17.0        15.2       21
13            12.0        7.5        25
8             4.5         5.3        4
6             5.0         2.1        2
6             4.0         3.6        2
5             4.0         1.5        0
4             3.3         1.6        2
3             1.5         0.3        1
3             2.0         1.6        2
3             2.0         0.8        1

80%           78%         98%        %RMSE

Longest segments (average length 11.6 km)

2003 actual   2000-2003   EB model   2004 actual
crashes       4 yr avg.   estimate   crashes
22            17.0        16.4       21
13            12.0        8.1        25
5             4.0         1.6        0
15            16.8        12.8       22
26            22.0        22.6       35
4             3.3         3.2        4
18            13.3        13.1       15
8             11.8        7.8        9
54            53.8        54.6       58
18            12.8        17.6       16

28%           32%         37%        %RMSE

Legend: best estimate

• Note higher EB correction for short segments
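One possible reading of the larger correction on short segments follows from the weight formula w = 1 / (1 + (μ∙Y)/φ). In the road model, predicted crashes μ scale with segment length, so a short segment has a small μ and the model estimate gets more weight. The sketch below illustrates this under stated assumptions: `ALPHA`, `BETA`, `PHI` and `ADT` are hypothetical values, and φ is held constant across segment lengths, which is a simplification the slides do not confirm.

```python
def road_model(length_km: float, adt: float, alpha: float, beta: float) -> float:
    """Road segment model form: crashes = alpha * length * ADT^beta."""
    return alpha * length_km * adt ** beta


def eb_weight(mu: float, years: float, phi: float) -> float:
    """Weight applied to the model estimate: w = 1 / (1 + (mu * Y) / phi)."""
    return 1.0 / (1.0 + (mu * years) / phi)


# Hypothetical values for illustration only (not fitted parameters).
ALPHA, BETA, PHI, ADT = 0.0005, 0.8, 2.0, 20_000

lengths_km = [0.4, 3.8, 11.6]  # average segment lengths from the slides
weights = []
for length in lengths_km:
    mu = road_model(length, ADT, ALPHA, BETA)
    w = eb_weight(mu, years=1, phi=PHI)
    weights.append(w)
    print(f"{length:5.1f} km: mu = {mu:6.2f} crashes/yr, weight on model = {w:.2f}")
```

With these assumed values the weight on the model estimate falls as segments get longer, so the site's own count dominates on long segments and the model dominates on short ones.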

23

Conclusions

• EB+1yr ≈ 4yrs of data
• Better model did not necessarily improve prediction (at least for the 10 intersections selected)

• Longer segment models are more accurate

• Intersection 4-year averages and models are relatively poor predictors
– But when combined using EB, better

24

Thank you
