A benchmark for automated roster generation algorithms

International Journal of Industrial Ergonomics 21 (1998) 243-247

Geoff Harris a,*, Philip Bohle b

a School of Information Systems and Management Science, Griffith University, Nathan, Queensland 4111, Australia
b Graduate School of Management, The University of Queensland, Queensland 4072, Australia

Received 15 November 1995; accepted 12 November 1996

Abstract

This paper describes a benchmark that enables objective comparison between implementations of algorithms for automated shift roster generation. The benchmark consists of three computational tests that provide measures of correctness, efficiency and efficacy. The tests are designed to make it difficult to fine-tune an implementation specifically to perform well on the benchmark. A recently developed implementation (Bohle and Harris, 1996) is used to provide run-time performance metrics for a variety of PC hardware configurations.

Relevance to industry

Software that effectively automates shift roster design has the potential to greatly reduce the financial and health costs incurred by inefficient, manual roster design. The benchmark described here provides organisations with an objective basis for evaluating the effectiveness of different rostering packages, including in-house and commercial applications.

© 1998 Elsevier Science B.V.

Keywords: Shift rosters; Software benchmarks

1. Introduction

Computerised shift roster generation has recently attracted considerable scientific attention (see Bechtold and Jacobs, 1990; Franz and Miller, 1993; Thompson, 1990). This interest has arisen for at least two reasons. Firstly, at a practical level, rostering is a frequently recurring problem with substantial health, safety and financial implications for many organisations. Secondly, at a scientific level, rostering is a particularly intractable non-deterministic polynomial (NP) complete problem for which there are no general mathematical solutions (see Bartholdi, 1981). Identification of effective strategies for solving specific rostering problems is therefore an issue of considerable practical and scientific interest.

* Corresponding author.

The major difficulty encountered in the development of rostering solutions lies in the size of the search space to be traversed (Bartholdi, 1981). Consequently, algorithms that attempt to identify all possible solutions to a specific rostering problem, and locate the "optimal" solution, are only likely to succeed in relatively trivial cases. This complexity has led to a variety of distinctly different approaches, ranging from "pure" linear programming to "pure" logic-based artificial intelligence techniques, that have been applied with varying success to specific practical problems (Bohle and Harris, 1996). Despite the large number of algorithms that have been devised and implemented, there are currently no objective criteria for comparing them.

0169-8141/98/$19.00 Copyright © 1998 Elsevier Science B.V. All rights reserved
PII S0169-8141(97)00043-7

To establish the relative efficiencies of different algorithms it is important to develop a benchmark consisting of a broadly-based set of tests. The benchmark should not only facilitate comparison of basic algorithms and implementations, but also enable evaluation of heuristics used as supplementary decision mechanisms to improve the efficiency of the basic algorithms. The benchmark proposed in this paper is intended to achieve these aims and provide a starting point for systematic comparison and improvement of shift rostering software.

2. Objectives of the present benchmark

Most computer-based benchmarks measure hardware performance (see Berry, 1992). Nonetheless, some recent work has used research-based metrics to evaluate computer algorithm implementations (Berghel and Rankin, 1990; Spring et al., 1992). This paper follows this trend by developing a set of benchmark tests to objectively compare algorithms derived from dissimilar approaches. The present benchmark comprises three computational tests that are intended to assess:
• the correctness of the implementation of a general solution algorithm,
• the run-time efficiency of the algorithm in traversing various search spaces,
• the effectiveness of an algorithm in trimming the shift roster search space.

The first two tests, which involve counts of the number of solutions produced for simple rostering problems, are primarily designed to assist researchers during algorithm and implementation development. The third test is much more complex and is intended for shift roster programs approaching final development. The correctness test determines that an algorithm is capable of generating all possible shift rosters for given design requirements. The effectiveness and run-time efficiency tests rely upon the CPU run time required to completely traverse the search spaces.

The goal of these tests is to provide a stable environment in which researchers and users can objectively test various implementations, algorithms, and heuristics applied to rostering problems. As stated above, the principal criteria for comparison are correctness and run-time efficiency.
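As an illustration of how CPU run time might be recorded for these tests, the following is a minimal Python sketch. The helper names `time_cpu` and `count_solutions` are hypothetical and chosen for this example only; they are not part of the benchmark or of the implementation in Bohle and Harris (1996).

```python
import time


def time_cpu(solver, *args):
    """Run solver(*args) and return (result, CPU seconds consumed).

    time.process_time() reports CPU time of the current process, which is
    closer to the "CPU run time" used by the benchmark than wall-clock time.
    """
    start = time.process_time()
    result = solver(*args)
    return result, time.process_time() - start


def count_solutions(solutions):
    """Exhaust an iterable of solutions and return how many were produced."""
    return sum(1 for _ in solutions)


# Stand-in workload: a real run would pass the roster generator under test.
count, seconds = time_cpu(count_solutions, iter(range(1000)))
print(f"{count} solutions in {seconds:.2f} CPU seconds")
```

Counting solutions inside the timed call means the measurement covers a complete traversal of the search space, which is what the benchmark asks for.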

Test 1 assesses whether all possible solutions to a very simple roster are generated. As the total number of correct solutions can be analytically determined, this test provides an exact demonstration of the correctness of the implementation of an algorithm. It is most valuable to researchers when testing new algorithms, "fine-tuning" final programs or incorporating additional heuristics into existing algorithms.

Test 2 specifies a rostering problem very similar to the one used in Test 1, but in this case it can be analytically proven that no valid solutions exist. It provides a basis against which implementations can be timed to assess how long it takes them to determine that no solutions exist. Being analytic, this test also enables a further demonstration of the correctness of an implementation.

Test 3 is a slightly modified, genuine rostering problem, selected because it displays the following features:
• the roster is highly constrained,
• there are very few valid solutions,
• the accuracy of the solutions produced can be readily manually verified.

The test is a practical problem which allows comparisons based upon the implementation's ability to successfully prune the search space without discarding the very few solutions that exist. It also tests the capacity of the algorithm to manage a large number of practical constraints.

3. The proposed benchmark tests

3.1. Test 1

3.1.1. Rationale

This test is designed to ensure that the total number of correct solutions to the rostering problem is known. This property is especially useful during the implementation of an algorithm. If an implementation cannot generate the correct number of solutions to this test, then either

1. the algorithm has not been correctly implemented, or
2. the algorithm is flawed.

Logically, there is a very low probability that a flawed algorithm could also be incorrectly implemented in such a way that the two errors interact and the correct number of solutions is reported. Although exceedingly unlikely, this possibility illustrates the need for a benchmark to consist of more than one test.

Table 1
Worker requirements and availabilities for Test 1

                   Monday  Tuesday  Wednesday  Wednesday  Thursday  Friday
                                    (shift 1)  (shift 2)
Workers required   2       2        3          3          2         1
Workers available  5       5        9          9          4         4

3.1.2. Test roster requirements

The roster to be generated in this test must satisfy the following criteria:
• It is a five-day (Monday to Friday) roster. There is a single shift on each day, except for a split shift on Wednesday. The two parts of the split shift do not overlap in time, and a worker may be assigned to both of them on that day.
• There are 5 workers available to work any of the Monday, Tuesday or Wednesday shifts and a further 4 workers able to do any of the Wednesday, Thursday or Friday shifts.

These requirements are summarised in Table 1.

3.1.3. Analysis and results

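As noted above, it is possible to analytically derive the number of valid solutions to this roster. As each shift is independent of all other shifts, and the order in which workers are chosen is inconsequential, the number of solutions can be determined by repeated application of the combinatorial formula ${}^{n}C_{r}$, where $n$ is the number of workers available for a shift and $r$ is the number required (Table 1). The total number of possible solutions to this test roster is given by:

\[
{}^{5}C_{2} \times {}^{5}C_{2} \times {}^{9}C_{3} \times {}^{9}C_{3} \times {}^{4}C_{2} \times {}^{4}C_{1}
= 10 \times 10 \times 84 \times 84 \times 6 \times 4
= 16\,934\,400 \text{ possible solutions.}
\]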

To successfully complete this test, an implementation must generate and report all 16 934 400 possible solutions. It is unnecessary to manually verify all solutions; a simple check on the first few generated would normally suffice.
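The count for Test 1 can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the worker labels (`A1` to `A5`, `B1` to `B4`), the shift names and the function names are hypothetical, and the enumeration simply takes the Cartesian product of per-shift combinations, which is valid only because the shifts are independent.

```python
from itertools import combinations, product
from math import comb, prod

# Hypothetical worker labels: group A may work Mon-Wed, group B Wed-Fri.
GROUP_A = [f"A{i}" for i in range(1, 6)]
GROUP_B = [f"B{i}" for i in range(1, 5)]

# (shift name, workers required, pool of available workers), as in Table 1.
SHIFTS = [
    ("Mon", 2, GROUP_A),
    ("Tue", 2, GROUP_A),
    ("Wed-1", 3, GROUP_A + GROUP_B),
    ("Wed-2", 3, GROUP_A + GROUP_B),
    ("Thu", 2, GROUP_B),
    ("Fri", 1, GROUP_B),
]


def expected_solution_count(shifts):
    """Product of nCr over all shifts (analytic count of valid rosters)."""
    return prod(comb(len(pool), required) for _, required, pool in shifts)


def generate_rosters(shifts):
    """Lazily yield each roster as a tuple of per-shift worker combinations."""
    per_shift = [combinations(pool, required) for _, required, pool in shifts]
    return product(*per_shift)


print(expected_solution_count(SHIFTS))  # analytic total for Test 1
print(next(generate_rosters(SHIFTS)))   # first roster, for a manual spot check
```

An implementation under test would be expected to report the same total as `expected_solution_count(SHIFTS)`; exhausting `generate_rosters(SHIFTS)` gives an independent, if slow, cross-check.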

3.2. Test 2

3.2.1. Rationale

Unlike Test 1, where there were many possible solutions, this test has been designed to ensure that no solution exists. That is, this test represents an example of "worst" case behaviour (Berghel and Rankin, 1990) of the implementation of an algorithm. This test has two objectives: (1) to correctly determine that no solutions exist, and (2) to time how long the implementation takes to reach this conclusion. As in the first test, it can be analytically determined that no solutions exist. If an implementation does find a solution, either the algorithm or the implementation is flawed.

3.2.2. Test roster requirements

The roster requirements used in Test 1 are again to be satisfied with one modification. A "supervisor" is to be rostered on with each shift. There are to be only 2 supervisors available for rostering. Furthermore, no one supervisor is to work 2 consecutive days or shifts.

3.2.3. Analysis and results

It is straightforward to prove that no solutions exist to this particular test problem. Given that no one supervisor may work 2 consecutive shifts, it will be necessary to roster both supervisors on a Wednesday, one on each part of the split shift. However, if both supervisors are used on a Wednesday, it is impossible for either of them to work on Tuesday or Thursday, because no supervisor can work 2 consecutive days. The Tuesday and Thursday shifts therefore cannot be supervised.

The measure of performance is the time taken by the implementation to determine empirically that no solutions exist.
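The supervisor constraint alone is small enough to check exhaustively. The sketch below is an illustration, not the authors' method: it assumes one supervisor per shift, reads "consecutive shifts" as adjacent shifts in the weekly sequence and "consecutive days" as calendar days one apart, and uses hypothetical names throughout.

```python
from itertools import combinations, product

# Day index of each of the six shifts: Mon, Tue, Wed (split shift), Thu, Fri.
SHIFT_DAYS = [0, 1, 2, 2, 3, 4]
SUPERVISORS = ("S1", "S2")


def is_valid(assignment):
    """True if no supervisor works two consecutive shifts or two consecutive days."""
    # Consecutive shifts: adjacent positions must have different supervisors.
    for i in range(len(assignment) - 1):
        if assignment[i] == assignment[i + 1]:
            return False
    # Consecutive days: the same supervisor may not appear on days one apart.
    for i, j in combinations(range(len(assignment)), 2):
        same_person = assignment[i] == assignment[j]
        if same_person and abs(SHIFT_DAYS[i] - SHIFT_DAYS[j]) == 1:
            return False
    return True


valid = [a for a in product(SUPERVISORS, repeat=len(SHIFT_DAYS)) if is_valid(a)]
print(len(valid))  # number of feasible supervisor assignments
```

Only 2^6 = 64 supervisor assignments exist, so this check is immediate; the benchmark timing concerns the full roster search, in which the implementation must discover the same infeasibility for itself.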

3.3. Test 3

3.3.1. Rationale

This test is based on a difficult "real world" problem. The authors obtained the problem from a mining company that had tried, unsuccessfully, for two months to manually construct a roster for their maintenance crews. Manual solutions are very difficult to devise due to the highly constrained nature of the problem.

Although there is no analytic method available to determine the number of solutions that exist, our own implementation indicates that there are only two (Bohle and Harris, 1996). Given the success of this implementation on Tests 1 and 2, we tentatively conclude that there are only two independent¹ solutions until this conclusion is disproved. The measure of performance on this test is the time an implementation takes to generate the two solutions and terminate successfully.

3.3.2. Test roster requirements

The problem requires solutions to a 2-week rotating roster with overlapping shifts. The solutions must meet the following conditions:
• each of four crews must work an average of 48 h per week,
• there must be a surplus crew on both day shift and night shift each Wednesday,
• there must be 24 h coverage, seven days per week,
• all shifts must be 12 h in duration,
• crews must not work more than four consecutive day shifts without a day off,
• crews must not work more than three consecutive night shifts without a day off,
• at least one crew must have four consecutive days off per fortnight,
• each crew must have one weekend off per fortnight,
• each block of night shifts must be preceded by at least 24 h off,
• each block of night shifts must be followed by at least 24 h off.

3.3.3. Analysis and results

This problem is clearly very difficult to solve manually. In fact, the extremely large number of possible combinations of crews and shifts (in the order of $({}^{4}C_{1})^{32} \approx 1.8 \times 10^{19}$) presents a major challenge to computerised roster generation. Effective implementations must discard the majority of these combinations in order to find the very few valid solutions that exist. This process can only be successfully completed if an algorithm can fully utilise the constraints in generating solutions. Consequently, this test represents a very exacting test of an algorithm's effectiveness in generating solutions to highly constrained rosters.

A solution to this particular test is provided in Bohle and Harris (1996). However, due to the highly constrained nature of the problem, it is easy to manually verify the solutions generated.

¹ Additional solutions can, however, be produced by permuting the allocation of crews to shift progressions.
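Two of the Test 3 conditions, the limits on consecutive day and night shifts, lend themselves to a simple mechanical check of the kind an implementation might use to prune partial rosters. The sketch below is illustrative only: the encoding of a crew's progression as a string of `D` (day shift), `N` (night shift) and `-` (day off), the function names, and the decision to ignore wrap-around at the end of the rotation are all assumptions made for this example.

```python
from itertools import groupby

MAX_CONSECUTIVE_DAY_SHIFTS = 4
MAX_CONSECUTIVE_NIGHT_SHIFTS = 3

# Size of the unpruned search space quoted in the text: (4C1)^32.
SEARCH_SPACE = 4 ** 32


def longest_run(pattern, symbol):
    """Length of the longest unbroken run of `symbol` in `pattern`."""
    runs = (len(list(group)) for key, group in groupby(pattern) if key == symbol)
    return max(runs, default=0)


def satisfies_run_limits(pattern):
    """Check the consecutive day-shift and night-shift limits for one crew."""
    return (
        longest_run(pattern, "D") <= MAX_CONSECUTIVE_DAY_SHIFTS
        and longest_run(pattern, "N") <= MAX_CONSECUTIVE_NIGHT_SHIFTS
    )


print(satisfies_run_limits("DDDD-NNN--DD-N"))  # within both limits
print(satisfies_run_limits("DDDDD---NNN---"))  # five day shifts in a row
print(f"{SEARCH_SPACE:.1e}")                   # order of magnitude of the search space
```

Rejecting a partial progression as soon as a run limit is exceeded removes every roster that extends it, which is the kind of constraint-driven pruning that Test 3 is intended to reward.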

3.4. Overall run-time metrics

Table 2 presents the run times achieved by a recent implementation (Bohle and Harris, 1996) on each of the three benchmark tests. To facilitate comparison, the run times were measured on a variety of PC hardware platforms. However, due to physical differences in PC hardware components, it would be unwise to treat these values as optimised for the hardware configurations listed. Rather, the values presented should be considered indicative of the performance of the implementation used. A 20% reduction in run times should, however, be interpreted as demonstrating a superior implementation or algorithm.

Table 2
Times taken (in seconds) for the benchmark tests on various hardware platforms using the implementation described in Bohle and Harris (1996)

Hardware           Test 1    Test 2   Test 3
6 MHz 386          1237.69   10.82    0.27
33 MHz 386         743.09    6.59     0.11
33 MHz 486 SX      433.75    3.79     0.05
33 MHz 486         288.09    2.53     0.11
DX2/40 MHz 486     358.55    3.13     0.05
DX2/66 MHz 486     218.61    1.86     0.05
DX4/100 MHz 486    127.27    1.04     0.05
66 MHz Pentium     124.62    1.21     0.00
75 MHz Pentium     110.29    1.04     0.00
90 MHz Pentium     92.72     0.88     0.00
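The 20% criterion can be applied to a new measurement as follows. This is a sketch, assuming the candidate was timed on the same hardware class as the Table 2 baseline; the function name `is_superior` and the choice to treat a 0.00 s baseline as unusable (it is below the resolution of the reported timings) are assumptions of this example, not part of the benchmark definition.

```python
# Baseline times in seconds (Test 1, Test 2, Test 3) copied from Table 2.
TABLE_2_SECONDS = {
    "33 MHz 486": (288.09, 2.53, 0.11),
    "DX2/66 MHz 486": (218.61, 1.86, 0.05),
    "90 MHz Pentium": (92.72, 0.88, 0.00),
}


def is_superior(baseline_s, candidate_s, required_reduction=0.20):
    """True if the candidate is at least `required_reduction` faster than baseline."""
    if baseline_s <= 0.0:
        # A zero baseline is below timer resolution, so no comparison is possible.
        return False
    return candidate_s <= baseline_s * (1.0 - required_reduction)


baseline_test_1 = TABLE_2_SECONDS["90 MHz Pentium"][0]
print(is_superior(baseline_test_1, 74.0))  # more than 20% faster than 92.72 s
print(is_superior(baseline_test_1, 80.0))  # faster, but by less than 20%
```

The threshold guards against reading small hardware-dependent differences as algorithmic improvements, in line with the caution expressed about the Table 2 values.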

4. Conclusions

This is the first benchmark proposed for rostering software. As algorithms become more efficient at generating shift schedules it will undoubtedly be necessary for it to be expanded to include more complex tests. A new test might, for example, not only require satisfaction of a series of core roster design criteria, such as a maximum number of consecutive night shifts and a minimum period of time off after each block of night shifts, but also optimisation of the roster in terms of a less critical set of broader ergonomic criteria. Researchers and other users should be encouraged to propose appropriate extensions to the existing set of tests. The present benchmark should, nevertheless, provide a rigorous initial test of the algorithms and implementations available now and in the near future.

This paper describes a set of benchmark tests that can be used to evaluate the correctness, efficiency and effectiveness of software designed for automated shift roster generation. The tests facilitate the comparison of basic algorithms and implementations and the evaluation of heuristics introduced as supplementary decision aids. Consequently, the benchmark should prove to be of considerable practical value for researchers developing new algorithms and for organisations that need a standard for comparing the software products available to them.

Researchers and software developers should find the benchmark tests useful for both algorithm development and full-scale testing of completed implementations. Algorithms and implementations can be tested for correctness, efficiency of search tree traversals, and run-time performance across a variety of hardware platforms. The analytic results determine the correctness of algorithms and implementations whilst the metrics in Table 2 provide a basis for comparing efficiency and run-time performance against a recent implementation.

References

Bartholdi, J.J., 1981. A guaranteed-accuracy round-off algorithm for cyclic scheduling and set covering. Operations Research 29, 501-510.

Bechtold, S.J., Jacobs, L.W., 1990. Implicit modelling of flexible break assignments in optimal shift scheduling. Management Science 36, 1339-1351.

Berghel, H., Rankin, R., 1990. A proposed standard for measuring crossword compilation efficiency. Computer Journal 32, 276-280.

Berry, R., 1992. Computer benchmark evaluation and design of experiments: a case study. IEEE Transactions on Computers 41, 1279-1289.

Bohle, P., Harris, G., 1996. Shift roster generation using the slot table formalism. GSM Business Papers Series. The University of Queensland, St Lucia.

Franz, L.S., Miller, J.L., 1993. Scheduling medical residents to rotations: solving the large-scale multiperiod staff assignment problem. Operations Research 41, 269-279.

Spring, L.J., Berghel, H., Harris, G.H., Forster, J.J.H., 1992. A proposed benchmark for testing implementations of crossword puzzle algorithms. In: Proceedings of the 1992 ACM/SIGAPP Symposium on Applied Computing. ACM Press, New York, pp. 99-101.

Thompson, G.M., 1990. Shift scheduling in services when employees have limited availability: an L.P. approach. Journal of Operations Management 9, 352-370.
