Identification and Ranking of Black Spots: Sensitivity Analysis

Karolien Geurts, Geert Wets*, Tom Brijs, Koen Vanhoof

Limburg University, Data Analysis & Modelling Group, Faculty of Applied Economics, University Campus, 3590 Diepenbeek, Belgium
Fax: +32(0)11 26 87 00
Tel: +32(0)11 26 {87 57; 86 49; 86 53; 86 08}
Email: {karolien.geurts; geert.wets; tom.brijs; koen.vanhoof}@luc.ac.be

Total number of words: 7473
Submission date: July 10th, 2003

* Corresponding author

ABSTRACT

In Flanders, approximately 1014 accident locations are currently considered ‘dangerous’. These ‘dangerous’ accident sites, or ‘black spots’, are selected on the basis of their historic accident data for the period 1997-1999. More specifically, a combination of weighting values, respectively 1 for each light injury, 3 for each serious injury and 5 for each deadly injury (1_3_5), is used to rank and select the most dangerous accident locations. In this paper, a sensitivity analysis is performed to investigate how the identification and ranking of black spots are affected by 3 different weighting value combinations, each representing a different attitude towards the traffic safety problem: avoiding all accidents (1_1_1), all deadly accidents (1_1_10) and all accidents with serious or deadly injuries (1_10_10). Furthermore, the effects of using the expected number of accidents, estimated from a hierarchical Bayesian model, instead of the historic count data to rank and select the accident sites are evaluated. Results show that both a different attitude towards the traffic safety problem, with its corresponding choice of injury weighting values, and the use of estimates instead of count values have important consequences for the selection and ranking of black spots. This affects not only the number of accident locations that receive a different ranking order, but also the type of accident locations that are selected as ‘dangerous’ and, accordingly, the resulting future traffic safety decisions.

INTRODUCTION

In 2000, 49065 traffic accidents with casualties occurred on Belgian public roads. In these accidents 1470 people were killed, 9847 people were seriously injured and 58114 people were lightly injured. In that same year, the probability of having a deadly accident (relative to the number of vehicle-kilometers traveled) was almost 28% higher in Belgium than the European average. Based on these figures, Belgium has a poor traffic safety record in comparison with most other European countries (1). Furthermore, the steady increase in traffic intensity not only places a heavy burden on society in terms of the number of casualties; the insecurity on the roads also has an important effect on the economic costs associated with traffic accidents (2). In Belgium, this macroeconomic loss due to the lack of traffic safety on the roads is estimated at 3.72 billion Euros per year (3). Accordingly, traffic safety is currently one of the highest priorities of the Belgian government.

In this paper, we will focus on the identification of geographical locations with highly concentrated traffic accidents to provide valuable input for government actions towards traffic safety. Methods developed for identifying such accident concentrations often apply to black spots, which are pinpoint concentrations of road accidents that often migrate over time (see e.g. (4), (5), (6), (7), (8), and (9)). More recently, the identification of black zones has also been reconsidered in the literature (see (10) for a review); black zones arise from the awareness of the spatial interaction existing between contiguous accident locations. In general, there is no universally accepted definition of a black spot or a black zone. According to Hauer (8), some researchers rank locations by accident rate (accidents per vehicle-kilometers or per entering vehicles), some use accident frequency (accidents per km-year or accidents per year) and some use a combination of the two. Another dimension of diversity in practice is that rank may be determined by the magnitude (of either rate or frequency) or, as is more common, by the amount by which the rate or frequency exceeds what is normal for such sites. According to the Bureau of Transport and Regional Economics of Australia (11), locations are in general classified as black spots after an assessment of the level of risk and the likelihood of a crash occurring at each location. At certain sites, the level of risk will be higher than the general level of risk in surrounding areas. Crashes will tend to be concentrated at these relatively high-risk locations. Locations that have an abnormally high number of crashes are described as crash concentrated, high hazard, hazardous, hot spot or black spot sites.

In Flanders (the Flemish speaking community of Belgium), approximately 1014 accident locations are currently considered ‘dangerous’. These ‘dangerous’ accident sites, or so-called black spots, are selected on the basis of their historic accident data for the period 1997-1999. Based on these data, each site where 3 or more accidents have occurred in the last 3 years is selected. Then, a location is considered to be dangerous when its priority value (P), calculated using the following formula, equals 15 or more (2):

$$P = X + 3Y + 5Z$$

where
X = total number of light injuries
Y = total number of serious injuries
Z = total number of deadly injuries
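The two selection criteria described above can be sketched in Python as follows. This is a minimal illustration only: the `Site` record, the function names and the three example sites are hypothetical and are not taken from the Flemish data set.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Aggregated 1997-1999 totals for one location (hypothetical record layout)."""
    site_id: str
    accidents: int  # number of accidents at the location
    light: int      # X: total number of light injuries
    serious: int    # Y: total number of serious injuries
    deadly: int     # Z: total number of deadly injuries


def priority_value(site: Site, weights=(1, 3, 5)) -> int:
    """Weighted casualty total; with the default 1_3_5 weights this is P = X + 3Y + 5Z."""
    w_light, w_serious, w_deadly = weights
    return w_light * site.light + w_serious * site.serious + w_deadly * site.deadly


def is_black_spot(site: Site, min_accidents: int = 3, min_priority: int = 15) -> bool:
    """Apply both criteria: at least 3 accidents and a priority value of 15 or more."""
    return site.accidents >= min_accidents and priority_value(site) >= min_priority


# Hypothetical example sites
sites = [
    Site("A", accidents=4, light=6, serious=2, deadly=1),  # P = 6 + 6 + 5 = 17
    Site("B", accidents=2, light=1, serious=3, deadly=2),  # P = 20, but only 2 accidents
    Site("C", accidents=5, light=8, serious=1, deadly=0),  # P = 8 + 3 + 0 = 11
]
selected = [s.site_id for s in sites if is_black_spot(s)]
print(selected)
```

Only site A passes both tests here: site B has a high priority value but too few accidents, and site C has enough accidents but a priority value below 15.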

To improve traffic safety at these locations, the Flemish government will invest 100 million EURO each year, starting in 2003 and for a period of 5 years, to redesign the infrastructure of the 800 black spots with the highest priority value.

However, the choice of the parameters used in the priority value formula will greatly influence the ranking and selection of the most dangerous accident locations. Furthermore, accident counts are used to target locations with a number of accidents exceeding a chosen threshold as hot spots. This method is very sensitive to random variation in accident counts and to the regression to the mean problem (see e.g. (12), (13)). Therefore, in this research we will perform a sensitivity analysis to investigate the strengths and weaknesses of the currently used method to identify and rank black spots. More specifically, we will evaluate the effect of changing the 1_3_5 weighting values on the ranking and selection of the most dangerous accident locations. Not only will we evaluate the changes in the ranking orders for the 1014 accident locations that are currently considered most dangerous, we will also investigate the effects on the ranking order of, respectively, all accident locations in Flanders for the years 1997-1999 and all accident locations where at least 3 accidents occurred between 1997 and 1999. Furthermore, we will evaluate the effects of using the expected number of accidents, estimated from a hierarchical Bayesian model, instead of the historic count data to rank and select the accident sites.

The remainder of this paper is organized as follows. First, a formal introduction to the hierarchical Bayesian model and to the statistical techniques used in this research is provided. This will be followed by a description of the dataset. Next, the results of the empirical study are presented. The paper will be completed with a summary of the conclusions and directions for future research.

TECHNIQUES

Hierarchical Bayesian Model

A number of statistical models have been used to estimate accident rates and/or accident frequencies at a specific location over a given interval of time (see (14), (8), (15) and (16) for a review). The underlying assumption is that road accidents can be treated as random events with an underlying mean accident rate for each accident location. To account for this probabilistic nature of accident occurrence, compelling arguments can be found to support the assumption that accident counts follow the Poisson probability law (17). For example, Saccomanno and Buyco (18) and Blower et al. (19) have used a Poisson loglinear model to explain variations in accident rates. However, this model could be inappropriate for road accident counts, since it fails to account for extra-Poisson variation (the variance could exceed the mean) in the observed accident counts. To solve this problem of extra-Poisson variation, several authors such as Persaud (20), Miaou (21), Shankar et al. (22), Maher and Summersgill (23), Hauer (24), and Abdel-Aty and Radwan (25) have used negative binomial regression models. Hauer and Persaud (14) introduced the Poisson-gamma generalized linear model, which is now widely accepted (see e.g. (26) and (24)) and which allows the Poisson mean to vary between locations.
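To make the idea of a Poisson mean that varies between locations concrete, the following Python sketch shrinks observed counts towards the overall mean under a Poisson-gamma assumption. It is a deliberately simplified, univariate stand-in: the gamma prior is fitted by the method of moments (an assumption made here for illustration), there are no covariates, and the counts are hypothetical. It is not the regression model of the cited authors.

```python
from statistics import mean, pvariance


def poisson_gamma_shrinkage(counts):
    """Posterior mean accident rate per site under a Poisson-gamma model.

    The gamma prior (shape, rate) is fitted by the method of moments:
    E[x] = shape / rate and Var[x] = shape / rate + shape / rate**2.
    """
    m = mean(counts)
    v = pvariance(counts)
    if v <= m:
        # No extra-Poisson variation detected: every site gets the overall mean.
        return [float(m)] * len(counts)
    rate = m / (v - m)
    shape = m * rate
    # Gamma prior + Poisson likelihood gives posterior mean (shape + x) / (rate + 1).
    return [(shape + x) / (rate + 1) for x in counts]


# Hypothetical accident counts for six sites over one observation period
counts = [0, 1, 1, 2, 3, 11]
estimates = poisson_gamma_shrinkage(counts)
for observed, estimated in zip(counts, estimates):
    print(observed, round(estimated, 2))
```

Sites with unusually high counts are pulled down and sites with unusually low counts are pulled up, which is the mechanism that dampens random variation and regression to the mean when estimates replace raw counts.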

More recently, Empirical Bayes methods have been used in road safety to identify black spot locations, on the argument that adjusting historical data by statistical estimates yields improved predictability (see e.g. (13), (21), (28) and (28)). Furthermore, the use of ranking procedures based on a hierarchical Bayesian approach has been proposed in the literature. These methods can handle the uncertainty and the great variability of accident data and produce a probabilistic ranking of the accident locations. Although the use of hierarchical Bayesian models in traffic safety is less widespread (see (17) for a review), the approach has been applied to ranking problems in various other application domains, like educational institutions or hospitals (see, e.g. (29)), as well as in traffic safety (30). Recently, Tunaru (31) proposed a hierarchical Bayesian approach for ranking accident sites based on a bi-variate Poisson-lognormal distribution. This approach was extended by Brijs et al. (17) by considering a more realistic model for the accident behavior, taking into account the number of accidents, the number of fatalities, and the number of lightly and severely injured casualties for a given time period for each site. This is done by using a 3-variate Poisson distribution which allows for covariance between the variables. The parameters of the model are estimated via Bayesian estimation facilitated by MCMC methods. In order to combine all data into a single number that will be used for ranking the sites, a cost function can be used that measures the ‘cost’ of an accident according to the number of fatalities and of heavily and lightly injured casualties. A more detailed description of the technique can be found in Brijs et al. (17). This technique allows several cost functions, based on different weighting value combinations, to be used, which makes it possible to investigate the effect of altering the 1_3_5 weighting values on the ranking and selection of the most dangerous accident locations.
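The role of the cost function can be illustrated with a short Python sketch. It assumes that posterior draws of the expected numbers of light, serious and deadly casualties per site are already available from an MCMC run; the draws below are made-up numbers, and the function name and data layout are hypothetical. The model fitting itself (the 3-variate Poisson model of Brijs et al. (17)) is not reproduced here.

```python
def expected_cost_ranking(draws, weights=(1, 3, 5)):
    """Rank sites by the posterior mean of a weighted casualty cost.

    draws: {site_id: [(light, serious, deadly), ...]}, one tuple per posterior draw.
    Returns (site ids sorted from highest to lowest mean cost, mean cost per site).
    """
    w_light, w_serious, w_deadly = weights
    mean_cost = {}
    for site_id, samples in draws.items():
        costs = [w_light * l + w_serious * s + w_deadly * d for (l, s, d) in samples]
        mean_cost[site_id] = sum(costs) / len(costs)
    # Sort by descending cost; the site id breaks ties deterministically.
    ranking = sorted(mean_cost, key=lambda sid: (-mean_cost[sid], sid))
    return ranking, mean_cost


# Hypothetical posterior draws (three per site, for brevity)
draws = {
    "A": [(4.0, 1.0, 0.2), (5.0, 1.5, 0.1), (4.5, 1.2, 0.3)],
    "B": [(1.0, 0.5, 1.0), (1.5, 0.4, 0.8), (0.8, 0.6, 1.2)],
    "C": [(6.0, 0.2, 0.0), (7.0, 0.1, 0.1), (6.5, 0.3, 0.0)],
}
for name, weights in {"1_3_5": (1, 3, 5), "1_1_10": (1, 1, 10), "1_1_1": (1, 1, 1)}.items():
    ranking, _ = expected_cost_ranking(draws, weights)
    print(name, ranking)
```

Because the cost is computed per draw, the same draws could also be used to derive a distribution over ranks rather than a single ordering, which is what makes the ranking probabilistic.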

Percentage Deviation Value

To quantify the effects of changing the ranking and selection criteria of black spots, we will use the percentage deviation value. This measure makes it possible to compare the rankings of two datasets containing different elements. For example, when considering the 800 most dangerous accident locations, changing the weighting values of the ranking criterion may cause new accident locations to enter the top 800 black spots while other locations are removed from this data set. Therefore, we will use the percentage deviation value to calculate the changes in the ranking order of the accident locations. As described in definition 1, the percentage deviation is calculated by dividing the number of accident locations that do not appear in both data sets by the total number of elements in one dataset.

Definition 1: Percentage deviation (pr)

$$pr = 1 - \frac{G}{T}$$

with
G = number of common elements in both datasets
T = total number of elements in each dataset

When the percentage deviation value is small, the two ranked datasets will contain a great number of common elements. In a traffic safety context, this could for example mean that the current priority list of 800 black spots is rather stable, regardless of the used weighting values or the consideration of the number of passengers in the analysis. However, note that the percentage deviation only gives information about the number of accident locations that do not appear in both ranked datasets and does not take into account internal shifts in the ranking position of these common accident locations.
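Definition 1 translates directly into a few lines of Python. The sketch below is illustrative: the function name and the two example selections are hypothetical, and the equal-size check reflects the definition's assumption that both datasets contain T elements.

```python
def percentage_deviation(selection_a, selection_b):
    """pr = 1 - G / T for two selections that each contain T elements."""
    set_a, set_b = set(selection_a), set(selection_b)
    if not set_a or len(set_a) != len(set_b):
        raise ValueError("both selections must be non-empty and of equal size")
    common = len(set_a & set_b)      # G: elements present in both selections
    return 1 - common / len(set_a)   # T: size of each selection


# Hypothetical top-5 lists produced by two different ranking criteria
top_a = ["s1", "s2", "s3", "s4", "s5"]
top_b = ["s3", "s1", "s6", "s2", "s7"]
pr = percentage_deviation(top_a, top_b)  # G = 3, T = 5

# The same five sites in reverse order: the measure ignores internal shifts
pr_reordered = percentage_deviation(top_a, list(reversed(top_a)))
print(pr, pr_reordered)
```

The second call returns zero even though every site has changed position, which mirrors the limitation noted above: only membership of the subset counts, not the order within it.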

DATA

To allow for a sensitivity analysis on the currently used black spot criterion, this study is based on the same data used to select and rank the 1014 currently considered most dangerous accident locations. These data originate from a large data set of traffic accidents obtained from the National Institute of Statistics (NIS) for the region of Flanders (Belgium) for the period 1997-1999. More specifically, the data are obtained from the Belgian “Analysis Form for Traffic Accidents” that should be filled out by a police officer for each road accident that occurs on a public road (i.e. motorways, national and provincial roads linking towns) and that involves casualties, since the location of these accidents is accurately known by means of a hectometer stone marker. These traffic accident data contain a rich source of information on the different circumstances in which the accidents have occurred: course of the accident, traffic, environmental conditions, road conditions, human conditions and geographical conditions. As explained in the previous section, the accident data needed to perform this sensitivity analysis will be limited to the number of accidents per accident location. Furthermore, these data will only contain the number of fatalities and the number of light and serious casualties per accident location.

In total, 50961 traffic accidents with casualties are reported in this period. To account for crossroads and accordingly prevent double counts of accidents on multiple streets, the corresponding accident locations are corrected and each accident is allocated one unique location identifier. This results in 23184 unique accident locations included in the data set. On the one hand, we will focus on the 1014 most dangerous accident locations to explore the sensitivity of their ranking orders when using adapted weighting values and estimates instead of count values. On the other hand, we will concentrate on all of the 23184 accident locations and on all accident locations where at least 3 accidents occurred between 1997 and 1999 to evaluate the stability of the current list of black spots when using different selection criteria.
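The reduction from individual accident records to per-location totals can be sketched as follows. The record layout and the location identifiers are hypothetical and do not reflect the actual NIS form; the sketch assumes each accident already carries its single corrected location identifier.

```python
from collections import defaultdict

# One tuple per accident: (location_id, light, serious, deadly); hypothetical records
records = [
    ("N1-km12.3", 1, 0, 0),
    ("N1-km12.3", 2, 1, 0),
    ("N1-km12.3", 0, 0, 1),
    ("N2-km04.1", 1, 0, 0),
]


def aggregate_by_location(records):
    """Sum accidents and casualties per unique location identifier."""
    totals = defaultdict(lambda: {"accidents": 0, "light": 0, "serious": 0, "deadly": 0})
    for location_id, light, serious, deadly in records:
        site = totals[location_id]
        site["accidents"] += 1
        site["light"] += light
        site["serious"] += serious
        site["deadly"] += deadly
    return dict(totals)


totals = aggregate_by_location(records)
# Subset used later in the analysis: locations with at least 3 accidents
at_least_three = {loc: t for loc, t in totals.items() if t["accidents"] >= 3}
print(at_least_three)
```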

RESULTS

Evaluating The Ranking Order Of The 1014 Currently Most Dangerous Accident Locations

Using Different Weighting Values

As explained in the introduction of this paper, the 1014 most dangerous accident locations are currently ranked and prioritized using respectively the values 1, 3, 5 as the different weighting values for a lightly (LI), seriously (SI) or deadly injured (DI) casualty of an accident.

Table 1 gives an overview of the effects of changing the weighting values on the ranking order of these 1014 accident locations. More specifically, table 1 shows the percentage deviation values for different subsets of the 1014 currently most dangerous accident locations using different combinations of weighting values.

< INSERT TABLE 1 HERE >

In particular, table 1 presents the combinations of weighting values that represent a specific attitude of policy makers towards traffic safety:

• 1_1_1: This combination of weighting values assumes that every casualty of a traffic accident is equally important. Therefore, all accidents are equally important and should be avoided, regardless of the severity of the injury.

• 1_1_10: Using these weighting values, attention will be focused on accidents with deadly injured casualties. Accidents with lightly or seriously injured casualties receive relatively little attention.

• 1_10_10: This combination of weighting values discriminates between accidents with light injuries on the one hand and accidents with serious or deadly injuries on the other hand. It is assumed that a seriously injured person could just as easily have been killed in the accident and the other way around. Lightly injured persons, however, are assumed to be characteristic of less serious accidents and will be taken into account less when identifying black spots.

• 1_3_5: This combination of weighting values uses a more moderate approach to stress the importance of deadly accidents. As the injury types are more serious, the accident is considered to be more important. Note that the current black spot selection criterion is based on this proportion of weighting values.

Results of table 1 show that changing the 1_3_5 weighting value combination causes the different location subsets to deviate from the original 1_3_5 location subsets by 12.5% to 23.5%. This means that, in the most extreme case, 23.5% of the accident locations belonging to the 15% most dangerous of the 1014 dangerous accident locations under the 1_3_5 weighting values do not appear in the top 15% when the weighting values are changed to 1_1_1. On the other hand, when changing the weighting values to the combination 1_1_10, only 12.5% of the resulting most dangerous accident locations will differ from the 800 currently selected black spots. These results illustrate that a different attitude towards the traffic safety problem and the choice of the corresponding injury weighting values, in combination with the size of the subset of accident locations that is considered, have important consequences for the selection and ranking of black spots and for traffic safety actions in general.

Furthermore, results of table 1 show that the percentage deviation values of the different weighting value combinations differ greatly amongst each other. The more the weighting values deviate from each other, the greater the variability of the black spots will be. For example, when comparing the top 15% (of the 1014 most dangerous accident locations) calculated using the weighting value combination 1_10_10 and the weighting value combination 1_1_1, table 1 shows that 43.8% of the accident locations in this subset will differ from each other. When comparing this result with the previous results, it can be seen that this percentage deviation value is more extreme than the values related to the 1_3_5 weighting combination (respectively 23.5% and 21.5% for the top 15%). This can be explained by the more ‘general’ character of the 1_3_5 weighting values, which do not differ as greatly from the other weighting value combinations. However, note that although both weighting value combinations have a practically equal impact on the number of accident locations that will differ in the top 15%, the actual accident locations that change in this subset will not be the same.
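The kind of comparison reported in table 1 can be reproduced on a toy scale with the Python sketch below. The six sites and their casualty totals are invented for illustration and chosen so that no ties occur; the helper names are hypothetical. The sketch ranks the sites under each weighting value combination, keeps the top 3, and computes the percentage deviation against the 1_3_5 selection and between 1_1_1 and 1_10_10.

```python
WEIGHTS = {
    "1_1_1": (1, 1, 1),
    "1_1_10": (1, 1, 10),
    "1_10_10": (1, 10, 10),
    "1_3_5": (1, 3, 5),
}

# site_id: (light, serious, deadly) casualty totals; hypothetical values
casualties = {
    "A": (20, 0, 0),
    "B": (2, 5, 0),
    "C": (1, 0, 3),
    "D": (9, 1, 0),
    "E": (3, 2, 1),
    "F": (4, 0, 1),
}


def top_k(casualties, weights, k):
    """Return the k sites with the highest weighted casualty score."""
    def score(site_id):
        return sum(w * n for w, n in zip(weights, casualties[site_id]))
    ranked = sorted(casualties, key=lambda sid: (-score(sid), sid))
    return ranked[:k]


def percentage_deviation(selection_a, selection_b):
    """pr = 1 - G / T, with G common elements and T elements per selection."""
    common = len(set(selection_a) & set(selection_b))
    return 1 - common / len(selection_a)


k = 3
selections = {name: top_k(casualties, w, k) for name, w in WEIGHTS.items()}
vs_baseline = {
    name: percentage_deviation(selections["1_3_5"], sel)
    for name, sel in selections.items()
}
between_extremes = percentage_deviation(selections["1_1_1"], selections["1_10_10"])
print(selections)
print(vs_baseline, between_extremes)
```

In this toy data each alternative combination replaces one of the three 1_3_5 sites, whereas 1_1_1 and 1_10_10 share only one site with each other. The numbers themselves carry no empirical meaning, but the pattern is the one described above: combinations that are far apart disagree more with each other than either does with 1_3_5.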

Finally, the percentage deviation values are on average smaller when more accident locations are involved in the analysis. A possible explanation could be that the accident locations with a higher ranking value, according to the 1_3_5 weighting values, are more sensitive to changes in the weighting values than the accident locations with smaller ranking values. However, we should take into account that the larger the top X% of accident locations considered for analysis, the more accident locations can obtain a different ranking order without falling out of the top X% most dangerous accident locations. As explained in one of the previous sections, the percentage deviation value gives no information about these internal changes in ranking orders.

Using Bayesian Estimation

Table 2 shows the results of the comparative analysis between the location rankings based on count values and the location rankings based on Bayesian estimates.

<INSERT TABLE 2 HERE>

More specifically, table 2 gives, for different subsets of the data set, the number of accident locations that appear both in the ranking based on the count values and in the ranking based on the Bayesian estimates. Furthermore, these same results are translated into percentage deviation values to give an indication of the differences in ranking orders that the Bayesian method results in.

These results show that using Bayesian estimation techniques instead of the historic count data, while using the same weighting value combination, can lead to the selection of 3.9% to 10.6% different accident locations. These results are shown on the diagonals of the table. When selecting the 800 most dangerous accident locations using the 1_3_5 weighting values, this leads to the selection of 8.5% or 68 different accident locations. Translated into costs, this means that theoretically 8.5 million EURO of the 100 million EURO investment budget for redesign is allocated partly due to the random variation in accident counts.
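The figures in this paragraph can be checked with a short calculation. The translation into costs assumes that the budget is spread evenly over the 800 selected locations; this assumption is implied by the 8.5 million EURO figure rather than stated explicitly.

```latex
\[
0.085 \times 800 = 68 \ \text{locations},
\qquad
\frac{68}{800} \times 100 \ \text{million EURO} = 8.5 \ \text{million EURO}.
\]
```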

Additionally, the results of table 2 make it possible to estimate the effect of a change in the parameter weights combined with the use of Bayesian estimation values instead of count data. For example, table 1 shows that when selecting the 15% most dangerous accident locations, a change in the parameter values from 1_1_1 to 1_10_10 can lead to a selection of 43.8% different accident locations. When also using Bayesian estimation, table 2 shows that this can even result in the selection of 45.1% different accident locations when comparing the rankings of the 1_1_1 Bayesian estimations with the ranking of the 1_10_10 count values, and even 50.3% different accident locations when comparing the rankings of the 1_1_1 count data with the ranking of the 1_10_10 Bayesian estimations. Note that these percentage deviation values are higher than the percentage deviation values on the diagonals of the table, which can be explained by the additional effect of changing the weighting value combinations.

Finally, note that, consistent with the results of table 1, the percentage deviation values will be smaller when more accident locations are selected.

Evaluating The Ranking Order Of All Accident Locations

Using Different Weighting Values

In this section we will look at the consequences of changing the injury weighting values on all 23184 accident locations of Flanders. This allows us to explore the sensitivity of the ranking orders of all accident locations, including the locations that are currently not considered very dangerous.

Table 3 shows the percentage deviation values for the three combinations of weighting values representing different attitudes towards the traffic safety problem (see previous section) for different subsets of all accident locations.

< INSERT TABLE 3 HERE >

Results of this table show that when selecting the 800 most dangerous accident locations using the different combinations of weighting values, the ‘new’ black spots will contain between 21.6% and 24.6% different accident locations compared to the 800 most dangerous accident locations selected by the 1_3_5 values. These results show that the concept of black spots is rather relative and depends strongly on the chosen traffic safety policy. Note that these figures differ from the results in table 1 for the top 800. This can be explained by the fact that the top 800, which in both tables is selected by the 1_3_5 conditions and which is used to calculate the percentage deviation values with the other weighting values, differs between the two tables. In this last analysis, we do not first select the locations with a minimum of 3 accidents in the last 3 years, nor does the priority value need to be 15 or more.

Furthermore, when considering subsets of different sizes of all the accident locations and comparing the results with those of the 1_3_5 weighting values, the percentage deviation values vary from 0.5% to 24.6%. Similar to the results of the previous section, these percentage deviation values are on average smaller when more accident locations are involved in the analysis. For example, when using the 1_10_10 values to determine the subset of the 70% most dangerous accident locations, only 0.5% of these accident locations will differ from the results of the 1_3_5 weighting combination. As explained in the previous section, this can result from the inclusion of more accident locations, which allows more internal shifts without falling out of the subset. However, when comparing these results with the percentage deviation value for the 1_1_1 weighting values when identifying the top 70% (9.8%), we can conclude that, although the size of the subset increases, some significant changes can still appear depending on the weighting values.

Next, comparing the selections produced by the different weighting value combinations amongst each other leads to some interesting results, in accordance with the results of table 1. For example, when comparing the top 800 of all accident locations calculated using the weighting value combination 1_10_10 and the weighting value combination 1_1_1, table 3 shows that 44.6% of the accident locations in this subset differ from each other. Comparing this with the results related to the 1_3_5 weighting values, it can again be seen that the percentage deviation value is more extreme when the different weighting value combinations are compared amongst each other. Additionally, although results from table 3 show that the impact of both weighting value combinations on the number of accident locations that differ in the top 800 does not vary strongly (respectively 21.6% and 24.5% for the 1_1_1 and 1_10_10 combination), the actual accident locations that change in this subset will not be the same.

Using Bayesian Estimation

In table 4 the results of the comparative analysis between the location rankings based on count values and the location rankings based on Bayesian estimates for all 23184 accident locations are presented.

<INSERT TABLE 4 HERE>

Similar to the results presented in table 2, the values on the diagonals of this table represent the effect of using Bayesian estimation values instead of historic count data while using the same weighting value combination to select different subsets of the 23184 accident locations.

These results show that this ranking procedure can lead to the selection of approximately 0.0% different accident locations for the 1_3_5 and 1_10_10 weighting values when selecting the top 40% most dangerous accident locations, and up to 18.2% different accident locations for the 1_1_10 weighting value combination when selecting the 800 most dangerous accident locations. Again, these results show that selecting the 800 most dangerous accident locations based on historic accident data could cause 18% or 18 million EURO of the investment budget for redesigning these accident locations to be ‘randomly’ allocated.

Furthermore, results of table 4 show the effects of a change in the parameter weights combined with the use of Bayesian estimation instead of count data. The maximum percentage deviation value of this table shows that this ranking procedure can lead to the selection of 46.0% different accident locations when comparing the 1_1_1 Bayesian ranking with the 1_10_10 counts ranking for the 800 most dangerous accident locations. This result is slightly higher than the percentage deviation value of table 3 for the same subset and the same weighting value combinations. As mentioned for the results of table 2, the percentage deviation values will be smaller when more accident locations are considered in the subset.

Evaluating The Ranking Order Of All Accident Locations With At Least 3 Accidents Between 1997 And 1999

Using Different Weighting Values

In this last section we will investigate the effects of changing the injury weighting values on all accident locations of Flanders where at least 3 accidents occurred between 1997 and 1999. Note that this is currently one of the conditions (besides the priority value) for an accident location to be considered as dangerous (see introduction). Therefore, analyzing the impact of the injury weighting values on the ranking orders of these 5326 accident locations allows a more realistic exploration of the sensitivity of the ranking and selecting of dangerous accident locations when using different weighting values.

Table 5 shows the percentage deviation values for different combinations of weighting values and for different subsets of the accident locations with at least 3 accidents, all related to the ranking order of the currently used 1_3_5 weighting values.

< INSERT TABLE 5 HERE >

These results show that, depending on the chosen injury weighting values, the selected subsets of the most dangerous accident locations can deviate from the 1_3_5 location subsets by 6.9% to 22.1%. More specifically, when selecting the 800 most dangerous accident locations using the 1_10_10 weighting values, 22.1% of these accident locations will differ from the 1_3_5 selection. Note that for the top 800, these results strongly coincide with the results of table 3, although in this analysis the extra criterion of a minimum of 3 accidents per location was used. Furthermore, analogously to the results of table 1 and table 2, the percentage deviation values are on average smaller when more accident locations are involved in the analysis.

Furthermore, the selections produced by these different weighting values differ greatly amongst each other. For example, when selecting the 800 most dangerous accident locations, for both the 1_1_1 and the 1_10_10 weighting value combinations approximately 20% of the selected locations will differ from the locations selected with the 1_3_5 injury weighting values. However, amongst each other, 40% of the 800 selected accident locations will differ. Once more, these percentage deviation values will decline when more accident locations are involved in the analysis.

Using Bayesian Estimation

Table 6 presents the results of a comparative analysis between ranking based on historic count data on the one hand and Bayesian estimation on the other hand. This analysis is performed for the accident locations where at least 3 accidents occurred in the last 3 years.

<INSERT TABLE 6 HERE>

The results on the diagonals of table 6 show that using Bayesian estimation instead of historic count data to rank and select the accident locations can lead to a maximum percentage deviation value of 5.7% for the 40% most dangerous accident locations using the 1_1_1 weighting values. Note that compared to the results of table 2 and table 4 this maximum value is relatively low. However, this still means that almost 6% of the accident locations that are considered to belong to the top 40% most dangerous accident locations are not considered dangerous when using Bayesian estimation and vice versa.

Furthermore, results of table 6 show that a change in the parameter weights combined with the use of Bayesian estimation can lead to even higher percentage deviation values. For example, table 5 shows that when selecting the 40% most dangerous accident locations, a change in the parameter values from 1_1_1 to 1_10_10 can lead to a selection of 26.1% different accident locations. When also using Bayesian estimation, table 6 shows that this can result in the selection of 27.1% different accident locations when comparing the rankings of the 1_1_1 Bayesian estimations with the ranking of the 1_10_10 count values, and even 30.2% different accident locations when comparing the rankings of the 1_1_1 count data with the ranking of the 1_10_10 Bayesian estimations. Analogously to the previous results, the percentage deviation values will decrease when more accident locations are selected in the subset.

CONCLUSIONS AND FURTHER RESEARCH

In this paper a sensitivity analysis is performed to investigate the strengths and weaknesses of the currently used method to identify and rank black spots. The analysis shows that a change in the traffic safety policy, as reflected in the injury weighting values used to identify and rank the most dangerous accident locations, has an important impact on the number of accident locations that change when selecting and ranking black spots. It also has an important effect on the type of accident locations that are selected (e.g. locations with high traffic volumes resulting in many small accidents) and accordingly on the resulting future traffic safety decisions. Government should therefore first decide carefully which priorities to stress in the traffic safety policy. Next, the corresponding weighting value combination can be chosen to rank and select the most dangerous accident locations. Furthermore, the use of Bayesian estimation values instead of historic count data to rank the accident locations can overcome the problem of random variation in accident counts and also has an important effect on the selection of the most dangerous accident locations. Accordingly, the use of Bayesian estimation values could prevent the government from investing a considerable amount of money in less dangerous accident locations while other dangerous locations are not even considered for redesign.

In future research, we will investigate not only the effect of changing the weighting values but also the influence of the number of passengers on the ranking of the accident locations. Finally, future research should try to take into account the problem of underregistration.

ACKNOWLEDGEMENT

Work on this subject has been supported by a grant from the Flemish Government to the Flemish Research Center for Traffic Safety. The authors would also like to thank Prof. Dr. D. Karlis for his encouragement and helpful suggestions in implementing the statistical model.

LIST OF TABLES AND FIGURES

TABLE 1 Percentage deviation values for different subsets of the 1014 currently most dangerous accident locations using different combinations of weighting values
TABLE 2 Comparative analysis between counts and Bayes estimation for the 1014 currently most dangerous accident locations
TABLE 3 Percentage deviation values for different subsets of all accident locations using different combinations of weighting values
TABLE 4 Comparative analysis between counts and Bayesian estimation for all accident locations
TABLE 5 Percentage deviation values for different subsets of all accident locations with min. 3 accidents using different combinations of weighting values
TABLE 6 Comparative analysis between counts and Bayes estimation for top 15% (800) of all accident locations with min. 3 accidents

TABLE 1 Percentage deviation values for different subsets of the 1014 currently most dangerous accident locations using different combinations of weighting values

| Combination (LI_SI_DI) | X (top X) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|---|
| 1_1_1 | 15% | 0% | --- | --- | --- |
| 1_1_1 | 40% | 0% | --- | --- | --- |
| 1_1_1 | 70% | 0% | --- | --- | --- |
| 1_1_1 | 800 | 0% | --- | --- | --- |
| 1_1_10 | 15% | 30.0% | 0% | --- | --- |
| 1_1_10 | 40% | 26.1% | 0% | --- | --- |
| 1_1_10 | 70% | 16.7% | 0% | --- | --- |
| 1_1_10 | 800 | 11.8% | 0% | --- | --- |
| 1_3_5 | 15% | 23.5% | 21.5% | 0% | --- |
| 1_3_5 | 40% | 18.4% | 21.9% | 0% | --- |
| 1_3_5 | 70% | 16.4% | 15.2% | 0% | --- |
| 1_3_5 | 800 | 13.1% | 12.5% | 0% | --- |
| 1_10_10 | 15% | 43.8% | 40.5% | 22.8% | 0% |
| 1_10_10 | 40% | 39.4% | 39.6% | 22.1% | 0% |
| 1_10_10 | 70% | 33.1% | 26.2% | 17.3% | 0% |
| 1_10_10 | 800 | 23.4% | 23.7% | 14.1% | 0% |

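TABLE 2 Comparative analysis between counts and Bayes estimation for the 1014 currently most dangerous accident locations

Number of matches for top 15%

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 142 | 108 | 111 | 76 |
| 1_1_10 | 105 | 144 | 116 | 84 |
| 1_3_5 | 115 | 118 | 144 | 108 |
| 1_10_10 | 84 | 90 | 122 | 141 |

Percentage deviation values for top 15%

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 7.3% | 29.4% | 27.4% | 50.3% |
| 1_1_10 | 31.4% | 5.9% | 24.2% | 45.1% |
| 1_3_5 | 24.8% | 22.9% | 5.9% | 29.4% |
| 1_10_10 | 45.1% | 41.2% | 20.3% | 7.8% |

Number of matches for top 40%

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 382 | 293 | 301 | 211 |
| 1_1_10 | 296 | 382 | 305 | 225 |
| 1_3_5 | 328 | 309 | 363 | 280 |
| 1_10_10 | 244 | 245 | 335 | 363 |

Percentage deviation values for top 40%

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 5.9% | 27.8% | 25.9% | 48.0% |
| 1_1_10 | 27.1% | 5.9% | 24.9% | 44.6% |
| 1_3_5 | 19.2% | 23.9% | 10.6% | 31.0% |
| 1_10_10 | 39.9% | 39.6% | 17.5% | 10.6% |

Number of matches for top 70%

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 671 | 582 | 541 | 463 |
| 1_1_10 | 591 | 674 | 574 | 510 |
| 1_3_5 | 598 | 602 | 637 | 571 |
| 1_10_10 | 488 | 529 | 619 | 681 |

Percentage deviation values for top 70%

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 5.5% | 18.0% | 23.8% | 34.8% |
| 1_1_10 | 16.8% | 5.1% | 19.1% | 28.2% |
| 1_3_5 | 15.8% | 15.2% | 10.3% | 19.6% |
| 1_10_10 | 31.3% | 25.5% | 12.8% | 4.1% |

Number of matches for top 800

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 759 | 696 | 647 | 594 |
| 1_1_10 | 698 | 757 | 676 | 610 |
| 1_3_5 | 694 | 709 | 732 | 675 |
| 1_10_10 | 615 | 628 | 714 | 769 |

Percentage deviation values for top 800

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 5.12% | 13.0% | 23.6% | 25.7% |
| 1_1_10 | 12.7% | 5.4% | 15.5% | 23.7% |
| 1_3_5 | 13.2% | 11.4% | 8.5% | 15.6% |
| 1_10_10 | 23.1% | 21.5% | 10.7% | 3.9% |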

TABLE 3 Percentage deviation values for different subsets of all accident locations using different combinations of weighting values

| Combination (LI_SI_DI) | X (top X) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|---|
| 1_1_1 | 800 | 0% | --- | --- | --- |
| 1_1_1 | 15% | 0% | --- | --- | --- |
| 1_1_1 | 40% | 0% | --- | --- | --- |
| 1_1_1 | 70% | 0% | --- | --- | --- |
| 1_1_10 | 800 | 34.0% | 0% | --- | --- |
| 1_1_10 | 15% | 25.6% | 0% | --- | --- |
| 1_1_10 | 40% | 5.9% | 0% | --- | --- |
| 1_1_10 | 70% | 3.1% | 0% | --- | --- |
| 1_3_5 | 800 | 21.6% | 24.6% | 0% | --- |
| 1_3_5 | 15% | 22.8% | 22.3% | 0% | --- |
| 1_3_5 | 40% | 19.9% | 15.8% | 0% | --- |
| 1_3_5 | 70% | 9.8% | 8.7% | 0% | --- |
| 1_10_10 | 800 | 44.6% | 44.3% | 24.5% | 0% |
| 1_10_10 | 15% | 38.5% | 36.4% | 17.1% | 0% |
| 1_10_10 | 40% | 39.4% | 35.4% | 19.5% | 0% |
| 1_10_10 | 70% | 10.1% | 8.9% | 0.5% | 0% |

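TABLE 4 Comparative analysis between counts and Bayesian estimation for all accident locations

Number of matches for top 800

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 763 | 674 | 676 | 509 |
| 1_1_10 | 522 | 654 | 584 | 500 |
| 1_3_5 | 618 | 653 | 735 | 661 |
| 1_10_10 | 432 | 470 | 552 | 731 |

Percentage deviation values for top 800

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 4.6% | 15.7% | 15.5% | 36.3% |
| 1_1_10 | 34.7% | 18.2% | 27.0% | 37.5% |
| 1_3_5 | 22.7% | 18.3% | 8.12% | 17.3% |
| 1_10_10 | 46.0% | 41.2% | 31.0% | 8.6% |

Number of matches for top 15%

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 3302 | 2885 | 2874 | 2191 |
| 1_1_10 | 2575 | 3163 | 2745 | 2239 |
| 1_3_5 | 2660 | 2823 | 3239 | 2909 |
| 1_10_10 | 2117 | 2292 | 2688 | 3361 |

Percentage deviation values for top 15%

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 5.1% | 17.1% | 17.4% | 37.0% |
| 1_1_10 | 25.9% | 9.1% | 21.1% | 35.6% |
| 1_3_5 | 23.5% | 18.8% | 6.8% | 16.4% |
| 1_10_10 | 39.1% | 34.1% | 22.7% | 3.4% |

Number of matches for top 40%

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 9116 | 8726 | 7462 | 5622 |
| 1_1_10 | 8722 | 8929 | 7829 | 5995 |
| 1_3_5 | 7426 | 7807 | 9233 | 7468 |
| 1_10_10 | 5619 | 6001 | 7432 | 9177 |

Percentage deviation values for top 40%

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 1.7% | 5.9% | 19.5% | 39.4% |
| 1_1_10 | 5.9% | 3.7% | 15.6% | 35.3% |
| 1_3_5 | 19.9% | 15.8% | ~0.0% | 19.5% |
| 1_10_10 | 39.4% | 35.3% | 19.9% | ~0.0% |

Number of matches for top 70%

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 14573 | 14534 | 14578 | 14566 |
| 1_1_10 | 14574 | 14717 | 14749 | 14738 |
| 1_3_5 | 14556 | 14729 | 16139 | 16139 |
| 1_10_10 | 14557 | 14723 | 16139 | 16139 |

Percentage deviation values for top 70%

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 10.2% | 10.4% | 10.1% | 10.2% |
| 1_1_10 | 10.2% | 9.3% | 9.1% | 9.2% |
| 1_3_5 | 10.3% | 9.2% | 0.1% | 0.1% |
| 1_10_10 | 10.3% | 9.3% | 0.1% | 0.1% |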

TABLE 5 Percentage deviation values for different subsets of all accident locations with min. 3 accidents using different combinations of weighting values

| Combination (LI_SI_DI) | X (top X) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|---|
| 1_1_1 | 800 (15%)° | 0% | --- | --- | --- |
| 1_1_1 | 40% | 0% | --- | --- | --- |
| 1_1_1 | 70% | 0% | --- | --- | --- |
| 1_1_10 | 800 (15%) | 27.5% | 0% | --- | --- |
| 1_1_10 | 40% | 9.0% | 0% | --- | --- |
| 1_1_10 | 70% | 1.8% | 0% | --- | --- |
| 1_3_5 | 800 (15%) | 19.2% | 20.9% | 0% | --- |
| 1_3_5 | 40% | 16.4% | 12.8% | 0% | --- |
| 1_3_5 | 70% | 10.9% | 10.2% | 0% | --- |
| 1_10_10 | 800 (15%) | 40.0% | 34.6% | 22.1% | 0% |
| 1_10_10 | 40% | 26.1% | 24.5% | 12.2% | 0% |
| 1_10_10 | 70% | 17.9% | 17.1% | 6.9% | 0% |

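TABLE 6 Comparative analysis between counts and Bayes estimation for all accident locations with min. 3 accidents

Number of matches for top 15% (800)

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 767 | 623 | 623 | 450 |
| 1_1_10 | 584 | 759 | 624 | 491 |
| 1_3_5 | 653 | 654 | 764 | 593 |
| 1_10_10 | 486 | 528 | 638 | 758 |

Percentage deviation values for top 15% (800)

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 4.1% | 22.1% | 22.1% | 43.7% |
| 1_1_10 | 27.0% | 5.1% | 22.0% | 38.6% |
| 1_3_5 | 18.4% | 18.2% | 4.5% | 25.9% |
| 1_10_10 | 39.2% | 34.0% | 20.2% | 5.2% |

Number of matches for top 40%

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 2009 | 1886 | 1720 | 1487 |
| 1_1_10 | 1924 | 2032 | 1809 | 1557 |
| 1_3_5 | 1768 | 1825 | 2015 | 1809 |
| 1_10_10 | 1553 | 1584 | 1871 | 2025 |

Percentage deviation values for top 40%

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 5.7% | 11.5% | 19.3% | 30.2% |
| 1_1_10 | 9.7% | 4.6% | 15.1% | 26.9% |
| 1_3_5 | 17.0% | 14.3% | 5.4% | 15.1% |
| 1_10_10 | 27.1% | 25.7% | 12.2% | 4.9% |

Number of matches for top 70%

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 3716 | 3661 | 3261 | 3063 |
| 1_1_10 | 3658 | 3672 | 3286 | 3091 |
| 1_3_5 | 3315 | 3350 | 3620 | 3455 |
| 1_10_10 | 3058 | 3093 | 3526 | 3635 |

Percentage deviation values for top 70%

| Counts (rows) / Bayes (columns) | 1_1_1 | 1_1_10 | 1_3_5 | 1_10_10 |
|---|---|---|---|---|
| 1_1_1 | 0.3% | 1.8% | 12.5% | 17.8% |
| 1_1_10 | 1.9% | 1.5% | 11.9% | 17.1% |
| 1_3_5 | 11.1% | 10.2% | 2.9% | 7.9% |
| 1_10_10 | 17.9% | 17.0% | 5.7% | 2.5% |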