Upload
others
View
7
Download
0
Embed Size (px)
Citation preview
Patrolling Games on General Graphs with1
Time-Dependent Node Values*2
Abdolmajid Yolmeh1 and Melike Baykal-Gursoy*�13
1Industrial and Systems Engineering Department, Rutgers University, 964
Frelinghuysen Rd, Piscataway, NJ 088545
Abstract6
Scheduling and deployment of patrols is an important operational decision in safe-7
guarding an area against adversarial invasion or illicit activity. Patrolling game models8
are useful when dealing with strategic adversaries. Most models of patrolling games on9
graphs assume that all nodes have the same value or that their values are fixed through-10
out time. In addition, the time needed to initiate a successful attack on a node, called11
the attack time, may not depend on the node. However, in reality, the nodes may have12
different values and these values may change over time. In this paper, we consider a13
general patrolling game model on a general graph with time-dependent node values and14
node-specific attack times. The model allows for multiple attackers and patrollers con-15
trolled by a single adversary and a single defender, respectively. This assumption leads to16
exponentially large number of possible pure strategies for both the defender and the ad-17
versary. In order to tackle this problem, we propose a solution approach that depends on18
both column and row generation. We then run numerical experiments to investigate the19
efficiency of the proposed approach and to gain insight about the value of the patrolling20
game. Finally, we apply our method to a real case of an urban rail network in a major21
US city. Our results show the efficiency of the proposed solution approach. They also22
demonstrate a diminishing returns effect when increasing the number of patrollers.23
*This material is based upon work supported by the National Science Foundation under Grant No.1436288�Email address: [email protected]; Corresponding author
1
Keywords: patrolling games, time-dependent node values, multiple patrollers, multiple1
attackers, zero-sum games2
1 INTRODUCTION3
One of the most important issues in homeland security is protecting critical infrastructures4
against terrorist attacks (Moteff, 2005). Among these infrastructures, transportation systems,5
serving 32 million passengers daily in the United States, are critical for supporting the national6
security and economic well-being. Public surface transportation systems such as trains, metros,7
subways and buses offer terrorists easy access to crowds of people. This makes them especially8
attractive to terrorists seeking high body counts. Such open systems are considered to be9
soft targets by the terrorists. Bombings in Brussels and Istanbul along with many other cases10
indicate that terrorists tend to target such large crowds to cause mass human casualties in11
addition to panic and chaos. Therefore, it is important to protect such infrastructures.12
Analyzing the risk associated with attack to each infrastructure component, mitigation13
planning and designing efficient response policies could substantially reduce the threat to these14
infrastructures. One component of such planning is designing efficient patrols to secure vul-15
nerable areas. One of the challenges involved in designing patrol schedules to safeguard open16
mass transit systems and other soft targets is time-dependent node values. Because the adver-17
sary’s primary objective is to inflict human casualties, the node values depend on the number18
of people residing in those nodes. These numbers change over time and the terrorists tend to19
time their attacks according to these changes (Jenkins and Trella, 2012). Another challenge is20
to develop efficient methods to design patrols for a general network. In this paper, we try to21
address these challenges in a patrolling game setting.22
Patrolling problems arise in many situations in real life. Police officers patrol cities; security23
officers patrol terminals at airports and transportation centers; security guards patrol museums24
and shopping malls. Patrolling problems involve decisions on how to route a patroller through25
many locations in order to safeguard the area from adversarial intruders or illicit activity.26
With recent advancements in technology, patrolling decisions arise even more frequently with27
applications in routing unmanned aerial vehicles and robots.28
The patrolling problems have been studied since 1970s. Several studies have focused on29
2
allocating patrols to different areas to optimize performance measures such as patrol delays,1
average waiting time and total response time (Chaiken and Dormont, 1978; Chelst, 1978; Lar-2
son, 1972; Olson and Wright, 1975). These studies assume that crime frequency in different3
regions remain fixed and known to the patroller. However, this is not a realistic assumption4
due to the strategic behaviour of the adversaries. In other words, the adversaries can change5
their strategy in response the patroller’s strategy. Therefore, game theoretic analysis of such6
problems yields more realistic results.7
Basilico et al. (2012) introduce a two-player multi-stage security game with an underlying8
infinite horizon setting in which there are potentially infinitely many decision nodes. In this9
model, the attacker has the complete knowledge of the strategy to which the patroller com-10
mitted to. The attacker can also observe the location and movements of the patroller at any11
time and chooses his best attack strategy based on this information. They study Markovian12
strategies of different orders for this problem and show that, even though first order Markovian13
strategies may not always be optimal, they have comparable quality with respect to higher14
order Markovian strategies. Basilico et al. (2017) consider a similar patrolling game model15
with the patroller employing spatially uncertain alarm signals. They prove that this problem is16
NP-hard for a general graph, they also show that for special graphs, like paths or cycle graphs,17
the optimal strategy can be found in polynomial time. Infinite horizon nature of the games18
studied in Basilico et al. (2012) and Basilico et al. (2017) leads to the application of stationary19
Markovian strategies by the patroller. This means that the timing of attacks becomes irrelevant20
in such games. However, this may not be valid in realistic situations, for example, when node21
values change over time.22
Alpern et al. (2011) introduce a finite horizon patrolling game played on a graph, Q, with n23
nodes. The game has two players, a defender patrolling a set of nodes on Q, and an adversary24
targeting a node to attack. The adversary needs m consecutive periods, uninterrupted by the25
defender, to successfully damage the node. He aims to maximize the probability of a successful26
attack, while the defender tries to minimize this probability. Hence, the proposed model is a27
zero-sum game and the solution to this game is called a saddle point (Fudenberg and Tirole,28
1991). Papadaki et al. (2016) study the same patrolling problem on a line graph. They solve29
this patrolling game for any values of m and n, to find a saddle point.30
3
Both Alpern et al. (2011) and Papadaki et al. (2016) assume that all nodes have the same1
value, they also assume that the attack time, m, is fixed and does not depend on the node2
under attack. However, as we have discussed earlier in this section, these assumptions may3
not be valid in reality. Especially in public transportation systems and other soft targets when4
node values represent the number of people present, called occupancy level, different nodes may5
have different values and these values may change over time. Morevover, some nodes may be6
harder to attack than others, therefore, it may take more time to launch a successful attack.7
Lin et al. (2013) attempt to address this gap by studying patrolling models with different8
node values and attack time distributions. They consider both random and strategic attackers.9
A random attacker uses a fixed and known probability distribution to launch attacks on nodes;10
while a strategic attacker plays a zero-sum game with the patroller. The authors develop11
linear programming models to find the optimal solutions for both players. They also propose12
index-based heuristics to solve the problems of larger size. Lin et al. (2014) extend the model13
of Lin et al. (2013) by allowing imperfect detection. In other words, there is a possibility of14
observing a false negative when the patroller inspects a node. They introduce efficient index15
based heuristics to obtain near optimal policies in a reasonable amount of time. Although Lin16
et al. (2013, 2014) resolve some of the shortcomings of the previous models, their models still17
do not consider time-dependent node values and the importance of the timing of attacks.18
There are a number of studies accommodating multiple patrollers in their models. Jain19
et al. (2010) study Stackelberg security games with arbitrary schedules and multiple patrollers.20
They develop a branch and price algorithm to efficiently solve this game. Their algorithm21
involves a column generation step that exploits a novel network flow representation avoiding22
the combinatorial explosion of schedule assignments. Korzhyk et al. (2011) investigate the23
case with multiple defenders and multiple attackers where the attacker can attack multiple24
targets. They propose a polynomial time algorithm to find the Nash equilibrium for this game.25
Hochbaum et al. (2014) consider a patrolling problem with multiple patrollers (vehicles) on26
a network with edges targeted by a strategic adversary. They present a novel decomposition27
approach that requires the solution of a multivehicle rural Chinese postman problem (Guan,28
1962). Lou et al. (2017) model a security game with multiple decentralized defenders in charge of29
defending disjoint subsets of, possibly interdependent, targets. They analyze the existence of a30
4
Nash equilibrium for this game under various conditions. Lagos et al. (2017) study a Stackelberg1
security problem with multiple patrols and multiple targets. They propose a branch and price2
approach to efficiently solve this problem. McGrath and Lin (2017) investigate a patrol problem3
with multiple patrollers and dispersed heterogeneous attack locations. Their model accounts for4
the travel time between nodes, and includes node-specific features such as the inspection time,5
the time required for the adversary to carry out an attack, and the cost of a successful attack.6
They show that, for the case of a single patroller, the optimal solution can be obtained by7
solving a linear program. For the case with multiple patrollers, they propose heuristic solutions8
based on shortest paths and set partitions.9
Majority of the papers in the literature of patrolling games assume that a single adversary10
chooses a target to attack, the target values are fixed over time, and some even assume that11
all targets are indistinguishable, i.e., they all have the same value. However, this is not the12
case in many realistic situations. For example, at a transportation facility, the number of13
people, occupancy level, at each location may be considered as the value of that location.14
Moreover, occupancy levels may change over time, it is expected that during the rush hours15
the occupancy levels would be higher than normal hours. In this paper, we study a patrolling16
game model with time-dependent node values, node-specific attack times, multiple patrollers17
and multiple attackers. We propose a solution approach to efficiently solve the game under18
general graphs. The computational results show the efficiency of the proposed approach. The19
rest of this paper is organized as follows. In section 2, the problem under consideration is20
described. The proposed solution approach is explained in section 3. Section 4 presents the21
computational results. Finally, the main conclusion of the paper and suggestions for future22
research are presented in section 5.23
2 PROBLEM DESCRIPTION24
In this section, we describe the problem under consideration. Our work extends the model25
of Alpern et al. (2011) by considering different and time-dependent node values, node-specific26
attack times, multiple patrollers and multiple attackers. Here is a list of parameters of the27
model:28
� N : Number of nodes.29
5
� N = {1, 2, . . . , N} : Set of nodes.1
� i, j ∈ N : Node indices.2
� d ∈ D : Index of patrollers.3
� a ∈ A : Index of attackers.4
� T : Number of patrolling time periods.5
� T = {0, 1, . . . , T − 1} : Set of time periods in the time horizon.6
� t, τ ∈ T : Time period indices.7
� mi : Attack time, consecutive number of periods needed to attack node i. Let m =8
(m1,m2, . . . ,mN).9
� E : Set of edges, where (i, j) ∈ E if there is an edge between nodes i and j.10
� cit : Value of node i at time t. Let c = [cit] be a N ×T matrix containing all of cit values,11
with element in row i and column t being cit.12
The patrolling game G = G(Q, T,m, c) introduced in this paper is a zero-sum game between13
a defender (she) and an adversary (he). The defender controls a set of patrollers D. The14
adversary controls a set of attackers A. The game is played on a connected graph Q = (N , E)15
with the set of nodes N and the set of edges E over the time horizon T .16
A pure strategy for the adversary is to select a pair (ia, Ia) for each attacker, a ∈ A, where17
ia ∈ N is the target node and Ia is the attack interval defining the beginning, τa, and the end of18
the attack, which is a set of mj consecutive time periods, i.e., Ia = {τa, τa+1, . . . , τa+mj−1} ∈19
T , where j = ia. We can also represent an attack strategy as (ia, τa). Note that, for each attacker20
a, the start of attack interval, τa, should be early enough for the attack interval to finish before21
the end of the time horizon T , i.e., τ ≤ T −mj, where j = ia. We assume that the adversary22
cannot assign an attack pair to more than one attacker.23
We define a patrol as a walk P : T → Q on graph Q during the time horizon T . A pure24
strategy for the defender is to select a patrol P d for each patroller d ∈ D. If ia ∈ P d(Ia) for some25
d ∈ D, i.e., patroller d interrupts the attacker a, the attack will be unsuccessful. Otherwise, if26
ia /∈ P d(Ia) ∀d ∈ D the attacker a successfully damages node ia, the adversary gains a payoff27
6
of cj,τa+mj−1, where j = ia, and the defender loses a payoff of cj,τa+mj−1. The defender aims1
to minimize the expected total damage incurred from all attackers and the adversary wants to2
maximize it.3
The players play a zero-sum matrix game with the defender playing as the row player and
the set of all possible defense strategies constituting the rows of the matrix. The adversary plays
as the column player, with the set of all possible attack strategies constituting the columns of
the game matrix. We use K to denote the set of all possible defense strategies and k to index
them. Let xk be the probability of using defense strategy k in the defender’s mixed strategy.
Hence x = (x1, x2, . . . , x|K|) represents a mixed strategy of the defender, where |K| denotes the
cardinality of K, xk ≥ 0 ∀k ∈ K and∑k=|K|
k=1 xk = 1. Similarly, we use L to denote the set
of all possible attack strategies and index them by l. Let yl denote the probability of using
attack strategy l. Hence, a mixed strategy of the adversary is denoted as y = (y1, y2, . . . , y|L|),
yl ≥ 0 ∀l ∈ L, and∑l=|L|
l=1 yl = 1. The saddle point (Nash equilibrium) of the game is a point
(x∗,y∗) at which the following inequalities hold:
v(x∗,y) ≤ v(x∗,y∗) ≤ v(x,y∗),
where v(x,y) is the expected damage if the defender and the adversary use mixed strategies x4
and y, respectively.5
Although our model is a generalization of the model proposed by Alpern et al. (2011), some6
of their results are still valid for our model. We will use the following lemma that has been7
proved in Alpern et al. (2011) directly since the proof does not depend on the node values.8
Lemma 1. Suppose Q is connected, T ≥ 3 and mi ≥ 2,∀i. Then patrols that stay on any node9
for three consecutive periods are dominated.10
The game can be solved by generating all of the possible strategies for both players, however11
this may not be efficient for games of larger size. In the next section, we develop a solution12
approach to obtain a saddle-point equilibrium for this game.13
3 SOLUTION PROCEDURE14
In this section, a solution algorithm based on column and row generation (Riedel et al., 2012;15
7
Muter et al., 2013) is developed to obtain a saddle point for the patrolling game described in1
the previous section. The main challenge that may arise when developing a column and row2
generation algorithm is that the structure of the sub-problems may be destroyed due to the3
addition of new rows (Barnhart et al., 1998). However, in our case, since the new rows only4
affect the objective function coefficients, this difficulty does not arise. The solution method5
can also be described as a modification of the algorithm proposed by Godinho and Dias (2010,6
2013).7
Because this is a zero-sum game, a linear program (LP) can be developed to obtain a saddle-8
point equilibrium of this game. To formulate the LP for this game, we use the following binary9
parameters:10
� wkiτ =
1 if defense strategy k interrupts attack pair (i, τ),
0 Otherwise.
11
� zliτ =
1 if attack strategy l involves attack pair (i, τ),
0 Otherwise.
12
Using this notation, the following LP formulation can be developed to obtain a saddle-point
equilibrium for this game:
Minimize u
subject to u ≥∑k∈K
∑i,τ
ci,τ+mi−1zliτ (1− wkiτ )xk, ∀l ∈ L,
∑k∈K
xk = 1,
xk ≥ 0, ∀k ∈ K.
This problem is called the linear programming master (LPM). In this formulation, xk is a
decision variable representing the probability of using defense strategy k ∈ K in the defender’s
mixed strategy. In general, the sets K and L may be exponentially large; however, the number
of used strategies is expected to be much smaller. The proposed solution algorithm uses this
idea to start with a small subsets K′ ⊂ K and L′ ⊂ L of defense and attack strategies and
generates them as needed. In other words, we generate the defense strategies (columns) and
8
attack strategies (rows) on the fly. The starting subsets K′ and L′ could be any set of strategies.
Using the restricted set of strategies K′ and L′, we obtain the following LP:
Minimize u (1)
subject to u ≥∑k∈K′
∑i,τ
ci,τ+mi−1zliτ (1− wkiτ )xk, ∀l ∈ L′ (2)
∑k∈K′
xk = 1, (3)
xk ≥ 0, ∀k ∈ K′. (4)
This problem is called Restricted LPM (RLPM). The dual of RLPM is:
Maximize v (5)
subject to v ≤∑l∈L′
∑i,τ
ci,τ+mi−1zliτ (1− wkiτ )yl, ∀k ∈ K′, (6)
∑l∈L′
yl = 1, (7)
yl ≥ 0, ∀l ∈ L′. (8)
In this formulation, yl is the dual variable corresponding to constraint 2 in RLPM. This variable1
represents the probability of using attack strategy l in the adversary’s mixed strategy. Moreover,2
v is the dual variable corresponding to constraint 3 which represents the minimum expected3
damage. Next step is to find new strategies in K\K′ and L\L′ that could improve the current4
optimal solution for the corresponding players. Given the optimal dual solution yl of RLPM,5
the reduced cost of defense strategy k ∈ K\K′ is given by∑
l∈L′∑
i,τ ci,τ+mi−1zliτ (1−wkiτ )yl−v.6
Based on the concept of duality in linear programming, optimality of RLPM is equivalent7
to the feasibility of its dual. Therefore, defense strategies that violate the constraint v ≤8 ∑l∈L′
∑i,τ ci,τ+mi−1z
liτ (1 − wkiτ )yl can improve the current optimal solution. Thus, one should9
look for a defense strategy k with wkiτ such that: v >∑
l∈L′∑
i,τ ci,τ+mi−1zliτ (1 − wkiτ )yl. Note10
that yl’s are fixed, and the problem is to find a defense strategy k with wkiτ such that v >11 ∑l∈L′
∑i,τ ci,τ+mi−1z
liτ (1− wkiτ )yl. In other words, one looks for a new defense strategy k that12
leads to a smaller expected total damage,∑
l∈L′∑
i,τ ci,τ+mi−1zliτ (1 − wkiτ )yl, than the current13
expected total damage, v. To obtain an improving attack strategy for the adversary, consider14
9
RLPM. Given the optimal solution xk of RLPM, the current total expected damage is u. The1
total expected damage incurred by using attack strategy l is∑
k∈K′∑
i,τ ci,τ+mi−1zliτ (1−wkiτ )xk.2
Therefore, for fixed values of xk, one should find a new attack strategy l with zliτ such that3 ∑k∈K′
∑i,τ ci,τ+mi−1z
liτ (1 − wkiτ )xk > u. Subsection 3.1 develops mathematical programs to4
solve the defender’s sub-problem. Subsection 3.2 discusses the adversary’s sub-problem and5
subsection 3.3 describes the overall solution algorithm.6
3.1 Mathematical Formulations for the Defender’s Sub-problem7
In this section, we present two mathematical formulations to solve the defender’s sub-problem:8
A hop-type formulation and a flow-type formulation. We will compare the performance of these9
formulations numerically in section 4. Here is a list of binary variables used to formulate the10
defender’s sub-problem:11
� vdit =
1 if patroller d visits node i at time t,
0 Otherwise.
12
� wiτ =
1 if a patroller visits node i at time interval [τ, τ +mi − 1],
0 Otherwise.
13
Using this notation, the following hop-type formulation can be developed for the defender’s
sub-problem:
Maximize∑i,τ
ci,τ+mi−1wiτ∑l∈L′
ylzliτ (9)
subject to wiτ ≤∑d∈D
τ+mi−1∑t=τ
vdit, ∀i, τ, (10)
N∑i=1
vdit = 1, ∀t, d, (11)
vdit + vdj,t+1 ≤ 1, ∀i, j, t, d|i 6= j, (i, j) /∈ E , (12)
wiτ ∈ {0, 1}, ∀i, τ, (13)
vdit ∈ {0, 1}, ∀i, t, d. (14)
In this formulation, equation 9 represents the objective function which is minimizing the ex-14
10
pected damage. Note that, the expected damage is equal to∑
l∈L′∑
i,τ ci,τ+mi−1zliτ (1−wiτ )yl =1 ∑
l∈L′∑
i,τ ci,τ+mi−1zliτyl −
∑l∈L′
∑i,τ ci,τ+mi−1z
liτwiτyl where the first term is constant; hence,2
minimizing the expected damage is equivalent to maximizing∑
l∈L′∑
i,τ ci,τ+mi−1zliτwiτyl =3 ∑
i,τ ci,τ+mi−1wiτ∑
l∈L′ ylzliτ . The term
∑l∈L′ ylz
liτ in the objective function can be interpreted4
as the probability of using attack pair (i, τ) by the adversary. Equation 10 ensures that, if no5
patroller interrupts attack pair (i, τ), then wiτ is equal to zero. Equation 11 indicates that, for6
each patroller d, at each time t the patroller can be at exactly 1 node. Equation 12 ensures7
that, the patroller can not move from node i to node j if there is no edge between these nodes.8
Constraints 13 and 14 are the integrality constraint for variables wiτ and vdit.9
Note that, lemma 1 can be used to incorporate a new constraint to this formulation. Specifi-10
cally, the constraint vdi,t+vdi,t+1+v
di,t+2 ≤ 2, ∀i, t, d can be added to the formulation to eliminate11
the patrols that stay in the same node for three consecutive time periods. In the numerical12
experiments section, we will study the effect of this constraint on the overall performance of13
the algorithm.14
A flow-type mathematical formulation can also be developed to solve the defender’s sub-
problem. The sub-problem is formulated as follows:
Maximize∑i,τ
ci,τ+mi−1wiτ∑l∈L′
ylzliτ (15)
subject to wiτ ≤τ+mi−1∑t=τ
∑j
f tij, ∀i, τ, (16)
∑i,j
f 0ij = |D|, (17)
∑i
f tij =∑l
f t+1jl , ∀j, t, (18)
wiτ ,∈ {0, 1}, ∀i, τ, (19)
f tij,∈ Z+, ∀i, j, t. (20)
In this formulation, f tij is an integer variable that represents the flow of patrollers from node i to15
node j at time t. Equation 15 presents the objective function, which is identical to the objective16
function in equation 9. Constraint 16 ensures that, if no patroller interrupts (i, τ) attack pair,17
then wiτ is equal to zero. Constraint 17 indicates that, the initial flow of patrollers should be18
11
equal to the number of available patrollers, i.e. |D|. Constraint 18 is the flow conservation1
constraint. It ensures that, for each node j, the total incoming flow at time t is equal to2
the outgoing total flow at time t+ 1. Constraints 19 and 20 are the integrality constraints for3
variables wiτ and f tij, respectively. The flows obtained from this formulation can be transformed4
into patrols using the well-known flow decomposition algorithm (Orlin et al., 1993).5
Theorem 1. The defender’s sub-problem is NP-hard.6
Proof. See Appendix A7
Remark 1. Theorem 1 is valid even if ciτ = 1,∀i, τ and mis are equal to each other. In other8
words, even if we solve the model proposed by Alpern et al. (2011) using our proposed solution9
method, under a general graph, the defender’s sub-problem will still remain NP-hard.10
Even though Theorem 1 indicates that the defender’s sub-problems are hard to solve, the11
computational results show that, for problems with up to 30 nodes, the solution algorithm is12
able to find the Nash equilibrium by directly solving the sub-problem formulations.13
3.2 Mathematical Formulation for the Adversary’s Sub-problem14
The adversary’s sub-problem is formulated as follows:
Maximize∑i,τ
ci,τ+mi−1ziτ∑k∈K′
(1− wkiτ )xk (21)
subject to∑i,τ
ziτ ≤ |A|, (22)
ziτ ∈ {0, 1}, ∀i, τ. (23)
In this formulation, ziτ is a binary variable equal to 1 if an attacker is assigned attack pair (i, τ).15
Equation 21 presents the objective function, which is maximizing the expected damage. In this16
expression, the term∑
k∈K′(1−wkiτ )xk can be interpreted as the probability of not interrupting17
attack pair (i, τ). Constraint 22 indicates that, at most |A| attack pairs can be chosen to assign18
to attacker. Finally, constraint 23 is the integrality constraint for variable ziτ .19
The attacker’s sub-problem is a special case of 0-1 knapsack problem with unit item weights.20
Note that this problem can be solved in polynomial time by sorting the attack pairs (i, τ) in a21
non-increasing order of ci,τ+mi−1∑
k∈K′(1− wkiτ )xk and choosing the first |A| attack pairs.22
12
3.3 Overall Solution Procedure1
Algorithm 1 provides the pseudo-code for the overall solution procedure. The algorithm starts2
by randomly generating a set of initial strategies. Then, using this set of strategies, the RLPM3
is solved to obtain a solution x and a vector of dual values y. Dual values y are then used in4
the defender’s sub-problem to generate a new defense strategy. If a new defense strategy with5
a smaller expected damage is obtained, it is added to K′. Then the adversary’s sub-problem6
is solved to generate a new attack strategy. If a new attack strategy with a greater expected7
damage is obtained, it is added to L′. If, during the last two steps, either K′ or L′ has been8
updated, then the process is repeated; otherwise the procedure terminates. Because the number9
of possible strategies for both players is finite, the algorithm terminates after a finite number10
of iterations. Moreover, when the algorithm terminates, no player can improve the expected11
damage in their own favor by changing their strategies. Therefore, by definition, the algorithm12
returns a saddle-point upon termination.13
Algorithm 1: Pseudo-code for the overall solution algorithm
1 Initialize sets K′ and L′.2 Solve RLPM. Let x = [xk], y = [yl] and u be the optimal primal solution, dual solution
and objective function value, respectively.3 Solve the defender’s sub-problem using y as dual values and let w* = [w∗iτ ] denote the
optimal solution.4 if v >
∑l∈L′
∑i,τ ci,τ+mi−1z
liτ (1− w∗iτ )yl then
5 Add the new defense strategy w* to K′.6 end
7 Solve the attacker’s sub-problem using x as primal values and let z* = [z∗iτ ] be theoptimal solution.
8 if∑
k∈K′∑
i,τ ci,τ+mi−1z∗iτ (1− wkiτ )xk > v then
9 Add the new attack strategy z* to L′.10 end11 if K′ or L′ has been updated then12 Go to Line 2.13 else14 Return v as the value of the game.15 Terminate the procedure.
16 end
13
4 NUMERICAL EXPERIMENTS1
In this section, we perform computational experiments to investigate the efficiency of the pro-2
posed solution approaches and gain insight on some of its properties. The algorithms are coded3
in C++ and CPLEX 12.6 solver is used to solve the LPs and the defender’s sub-problems. A4
computer with 2.4 GH processor and 4 GB of RAM is used to run the numerical experiments.5
Our base set of test instances consists of randomly generated instances with underlying graph6
types including paths, cycles, grids and general planar graphs. To generate general planar7
graphs, the expected edge density (measured as |E||N |(|N |−1) , where we do not consider self-loop8
edges in calculating the edge density) of 15% is used, and the number of nodes, N , ranges from9
10 to 30. In generating general graphs, we first start with a random tree and then add random10
edges (such that the graph remains planar) until the edge density reaches 15%.11
In our first experiment, we compare the performances of different sub-problem formulations12
for the defender. Specifically, we consider three cases: the hop-type formulation of 9 to 1313
without the use of lemma 1 (HT), the hop-type formulation of 9 to 13 with the use of lemma14
1 (HTL) and the flow-type formulation of 15 to 20 (FT). We consider 45 instances for each15
problem size, with various values of T ∈ {5, 6, . . . , 9}, |D| ∈ {1, 2, 3} and |A| ∈ {1, 2, 3}. Tables16
1 and 2 show the average computation times observed for these three approaches for paths,17
cycles, general planar graphs and grids. In these tables, the best results are highlighted in bold.18
As seen in the tables, the flow-type formulation performs significantly better than the other19
formulations. Moreover, for most of the instances, the use of lemma 1 is not helpful and it20
leads to higher CPU times than when the lemma is not used. In the remaining experiments21
in this section, we will always use the FT formulation to solve the defender’s sub-problem;22
as it is the most efficient formulation out of the three possible ones. Figure 1 shows the23
convergence trajectory of the solution algorithm for the general planar graphs under various24
values of N . In this figure, the vertical axis denotes the relative percent deviation (RPD) from25
the expected damage in equilibrium. One can see that, expected damage values stabilize way26
before the algorithm terminates, implying that, after the expected damage values stabilize, we27
can terminate the algorithm without undermining the solution quality drastically.28
14
Table 1: Obtained CPU times (in seconds) for paths and cycles
Path Cycle
N HT HTL FT HT HTL FT
10 32.83 29.94 11.71 38.61 41.44 29.4415 120.03 119.11 25.91 301.74 332.30 35.9120 252.39 408.31 30.12 402.73 436.94 36.1125 435.88 464.59 46.97 412.95 427.83 65.9230 496.84 529.23 61.98 542.12 572.56 65.13
Table 2: Obtained CPU times (in seconds) for general graphs and grids
General Grid
N HT HTL FT HT HTL FT
10 18.64 18.84 8.22 22.37 23.83 10.3015 207.52 221.02 37.49 153.42 162.17 34.9820 404.96 431.14 100.41 396.91 463.22 41.7925 539.08 574.49 124.97 543.61 573.61 68.4130 700.48 742.31 341.36 690.21 714.65 93.46
-60
-40
-20
0
20
40
60
0 10 20 30 40 50 60RP
D
Iteration
N=10
N=15
N=20
N=25
N=30
Figure 1: Convergence of the solution algorithm
Next, we study the effect of the number of patrollers and attackers on the expected damage1
in equilibrium. An experiment is designed with general planar graphs, |D| ∈ {1, 2, . . . , 10} and2
|A| ∈ {1, 2, . . . , 5}. Figure 2 exhibits the effect of number of defenders, |D|, on the equilibrium3
expected damage for various number of attackers, |A|. Figure 2 demonstrates that, as the4
number of defenders increases, the expected damage decreases. Moreover, a diminishing returns5
effect is visible in the reduction in expected damage for each unit increment in |D|.6
15
0
10
20
30
40
50
60
70
1 2 3 4 5 6 7 8 9 10
Exp
ecte
d D
amag
e
|D|
|A|=1
|A|=2
|A|=3
|A|=4
|A|=5
Figure 2: The effect of number of defenders on the expected damage in equilibrium
Next, we study the size of patrol portfolio for the case of |D| = |A| = 1. As mentioned1
earlier, the total number of defense strategies may be exponentially large; however, the number2
of defense strategies used in the saddle-point equilibrium is expected to be much smaller. In3
fact, when |D| = |A| = 1, the number of defense strategies used in the equilibrium with a4
positive probability is at most equal to the number of constraints in the LPM, which is limited5
from above by N × T + 1. Figure 3 displays the size of patrol portfolio as a percentage of6
N × T + 1 for different values of N and T . As seen in this figure, the actual size of the patrol7
portfolio is significantly smaller than N × T + 1. It is always less than 50 percent of N × T + 18
and can be as low as 10 percent. Moreover, as T increases, the percentage decreases. However,9
not much can be said about the effect of N on the patrol portfolio size as a percentage of10
N × T + 1.11
0
5
10
15
20
25
30
35
40
5 6 7 8 9
Per
cent
T
N=10
N=15
N=20
N=25
N=30
Figure 3: Effect of N and T on the size of patrol paortfolio
In our next experiment, we demonstrate that one of the results obtained in Alpern et al.12
16
(2011) is not valid for our more general model. Specifically, Alpern et al. (2011) prove that1
if all nodes have the same fixed value, then attacks on penultimate nodes are dominated. A2
penultimate node is defined as a non-leaf node adjacent to a leaf node. We show that, if the3
node values are different, then attacks on penultimate nodes may not be dominated. To this4
end, we use an instance of a game played on a line graph with T = 5, N = 8, |D| = |A| = 1 and5
mi = 3,∀i ∈ N . Node values are assumed to be fixed over the time horizon. Figure 4 shows6
the graph of this game with corresponding node values represented above each node. As seen7
in this figure, node 2 is a penultimate node with value c; other nodes all have a value of 1 unit.8
1
1
2
c
3
1
4
1
5
1
6
1
7
1
8
1
Figure 4: Patrolling game example on a line graph
We solve this game for different values of c under two cases: unconstrained case where the9
attacker is free to attack any node, and constrained case where the attacker cannot attack node10
2. Figure 5 shows the results for various values of c. As seen in this figure, the unconstrained11
attacker can cause more damage than the constrained attacker. Moreover, as the value of c12
increases, the difference between constrained and unconstrained case increases and the attacker13
has more incentive to attack node 2. In other words, by being able to attack node 2, the14
attacker can increase the damage. This means that attacking node 2 dominates not attacking15
node 2. Therefore, attacks on penultimate nodes may not be dominated.16
0.6
0.65
0.7
0.75
0.8
0.85
0.9
0.95
1
0 10 20 30 40 50
Exp
ecte
d D
amag
e
c
Unconstrained
Constrained
Figure 5: Comparison of constrained and unconstrained cases
In our final set of experiments, we study a real case of an urban rail network with 51 nodes17
17
used by Yolmeh and Baykal-Gursoy (2018). In this case, the nodes in the network represent the1
stations and the edges represent the connections among these stations. The rail network for this2
case consists of two main lines that are connected with a free interchange point between them.3
Node values represent the time-dependent occupancy levels in each station. We consider a 12-4
hour work-shift, starting from 5:00 AM and ending at 5:00 PM, for this case. For more details5
about this case, please refer to Yolmeh and Baykal-Gursoy (2018). We used our proposed6
solution approach to solve this problem for |A| = 3 and |D| = 10. For this instance of the7
problem, the algorithm terminates after 88 minutes. Figure 6 shows the obtained expected8
damage in the first 100 iterations of the solution algorithm for this case. As seen in this figure,9
after around 90 iterations, the expected damage value stabilizes and does not change drastically10
after this point. This observation is in line with our previous experiments.11
0
1000
2000
3000
4000
5000
6000
7000
0 20 40 60 80 100
Exp
ecte
d D
amag
e
Iteration
Figure 6: Convergence of the solution algorithm for the case study
Next, we study the distribution of patroller visits across different stations. Figure 7 shows12
the expected number of visit in a 1-month period for 5 most important stations for the case13
with 10 patrollers and 3 attackers. As seen in this figure, for most stations, the visits are almost14
equally distributed throughout the time horizon with a noticeable valley in the beginning hours15
and two slight peaks: one starts around 7:00 AM and ends around 9:00 AM, another one starts16
around 2:00 PM and ends around 4:00 PM.17
18
0
5
10
15
20
25
30
5:00
AM
6:00
AM
7:00
AM
8:00
AM
9:00
AM
10:00
AM
11:00
AM
12:00
PM
1:00
PM
2:00
PM
3:00
PM
4:00
PM
Exp
ecte
d N
um
ber
of
Vis
its
Time of Day
Station 41
Station 38
Station 6
Station 17
Station 24
Figure 7: Expected number of visits for 5 most visited stations
Next, we study the distribution of expected damage across most vulnerable stations. Figure1
8 shows the distribution of expected damage over the time horizon for 10 most vulnerable2
stations, for the case with 10 patrollers and 3 attackers. As seen in this figure, for most3
stations, the expected damage concentrates on two time intervals: one starts around 7:00 AM4
and ends around 9:00 AM, another one starts around 1:00 PM and ends around 4:00 PM.5
0
2
4
6
8
10
12
14
16
18
20
5:00
AM
6:00
AM
7:00
AM
8:00
AM
9:00
AM
10:00
AM
11:00
AM
12:00
PM
1:00
PM
2:00
PM
3:00
PM
4:00
PM
Exp
ecte
d D
amag
e
Time of Day
Station 34 Station 47 Station 17 Station 22 Station 6
Station 38 Station 27 Station 44 Station 13 Station 10
Figure 8: Distribution of expected damage for 10 stations with highest expected damage values
Next, we study the effect of the number of patrollers and attackers on the expected damage.6
Figure 9 shows the expected damage in equilibrium for different values of number of patrollers,7
|D|, and number of attackers, |A|. As seen in this figure, as the number of patrollers increases,8
the expected damage decreases. There is also a visible diminishing returns effect. Meaning9
that, as the number of patrollers increases, the reduction in expected damage by adding one10
more patroller, decreases.11
19
0
1000
2000
3000
4000
5000
6000
1 2 3 4 5 6 7 8 9 10
Exp
ecte
d D
amag
e
|D|
|A|=1
|A|=2
|A|=3
|A|=4
|A|=5
Figure 9: The effect of number of patrollers
5 CONCLUSIONS AND FUTURE RESEARCH1
In this paper, we propose a patrolling game model on general graphs with time dependent node2
values and node dependent attack times. We also consider multiple patrollers and attackers in3
our model. We then develop an efficient algorithm to obtain a Nash equilibrium for this game.4
Our computational results show the efficiency of the proposed solution approach. Our results5
also show a diminishing returns effect when increasing the number of patrollers.6
This paper addresses a gap in the literature by considering a patrolling game model with7
time dependent node values, node based attack times, multiple patrols and multiple attackers.8
It also develops a solution algorithm to solve the problem under general graphs. However, there9
are still some limitations that need to be addressed in future research. Patrolling games under10
incomplete information, where the defender does not know about attacker’s valuation of the11
targets or the number of attackers is a possible avenue for future research in this area. Moreover,12
extending the model to accommodate uncertainty of parameters, such as attack times and node13
values, is another avenue for future research in this area.14
References15
Steve Alpern, Alec Morton, and Katerina Papadaki. Patrolling games. Operations Research,16
59(5):1246–1257, 2011.17
Cynthia Barnhart, Ellis L Johnson, George L Nemhauser, Martin WP Savelsbergh, and18
20
Pamela H Vance. Branch-and-price: Column generation for solving huge integer programs.1
Operations research, 46(3):316–329, 1998.2
Nicola Basilico, Nicola Gatti, and Francesco Amigoni. Patrolling security games: Definition3
and algorithms for solving large instances with single patroller and single intruder. Artificial4
Intelligence, 184:78–123, 2012.5
Nicola Basilico, Giuseppe De Nittis, and Nicola Gatti. Adversarial patrolling with spatially6
uncertain alarm signals. Artificial Intelligence, 246:220–257, 2017.7
Jan M Chaiken and Peter Dormont. A patrol car allocation model: Capabilities and algorithms.8
Management Science, 24(12):1291–1300, 1978.9
Kenneth Chelst. An algorithm for deploying a crime directed (tactical) patrol force. Manage-10
ment Science, 24(12):1314–1327, 1978.11
Drew Fudenberg and Jean Tirole. Game theory, 1991. Cambridge, Massachusetts, 393(12):80,12
1991.13
Michael R Gary and David S Johnson. Computers and intractability: A guide to the theory of14
np-completeness, 1979.15
Pedro Godinho and Joana Dias. A two-player competitive discrete location model with simul-16
taneous decisions. European Journal of Operational Research, 207(3):1419–1432, 2010.17
Pedro Godinho and Joana Dias. Two-player simultaneous location game: Preferential rights18
and overbidding. European Journal of Operational Research, 229(3):663–672, 2013.19
Meigu Guan. Graphic programming using odd and even points. Chinese Math., 1:237–277,20
1962.21
Dorit S Hochbaum, Cheng Lyu, and Fernando Ordonez. Security routing games with multive-22
hicle Chinese postman problem. Networks, 64(3):181–191, 2014.23
Manish Jain, Erim Kardes, Christopher Kiekintveld, Fernando Ordonez, and Milind Tambe.24
Security games with arbitrary schedules: A branch and price approach. In AAAI, 2010.25
21
Brian Michael Jenkins and Joseph Trella. Carnage interrupted: An analysis of fifteen terrorist1
plots against public surface transportation. Technical report, 2012.2
Dmytro Korzhyk, Vincent Conitzer, and Ronald Parr. Security games with multiple attacker3
resources. In IJCAI Proceedings-International Joint Conference on Artificial Intelligence,4
volume 22, page 273, 2011.5
Felipe Lagos, Fernando Ordonez, and Martine Labbe. A branch and price algorithm for a6
Stackelberg security game. Computers & Industrial Engineering, 111:216–227, 2017.7
Richard C Larson. Urban police patrol analysis, volume 28. MIT Press Cambridge, MA, 1972.8
Kyle Y Lin, Michael P Atkinson, Timothy H Chung, and Kevin D Glazebrook. A graph patrol9
problem with random attack times. Operations Research, 61(3):694–710, 2013.10
Kyle Y Lin, Michael P Atkinson, and Kevin D Glazebrook. Optimal patrol to uncover threats11
in time when detection is imperfect. Naval Research Logistics (NRL), 61(8):557–576, 2014.12
J. Lou, A. M. Smith, and Y. Vorobeychik. Multidefender security games. IEEE Intelligent13
Systems, 32(1):50–60, Jan 2017. ISSN 1541-1672. doi: 10.1109/MIS.2017.11.14
Richard G McGrath and Kyle Y Lin. Robust patrol strategies against attacks at dispersed15
heterogeneous locations. International Journal of Operational Research, 30(3):340–359, 2017.16
John Moteff. Risk management and critical infrastructure protection: Assessing, integrating,17
and managing threats, vulnerabilities and consequences. DTIC Document, 2005.18
Ibrahim Muter, S Ilker Birbil, and Kerem Bulbul. Simultaneous column-and-row generation for19
large-scale linear programs with column-dependent-rows. Mathematical Programming, 14220
(1-2):47–82, 2013.21
David G Olson and Gordon P Wright. Models for allocating police preventive patrol effort.22
Journal of the Operational Research Society, 26(4):703–715, 1975.23
JB Orlin, RK Ahuja, and TL Magnanti. Network flows, theory, algorithms and applications.24
Prentice Hall, 1993.25
22
Katerina Papadaki, Steve Alpern, Thomas Lidbetter, and Alec Morton. Patrolling a border.1
Operations Research, 64(6):1256–1269, 2016.2
Sebastian Riedel, David Smith, and Andrew McCallum. Parse, price and cut: delayed column3
and row generation for graph based parsers. In Proceedings of the 2012 Joint Conference4
on Empirical Methods in Natural Language Processing and Computational Natural Language5
Learning, pages 732–743. Association for Computational Linguistics, 2012.6
Abdolmajid Yolmeh and Melike Baykal-Gursoy. Urban rail patrolling: a game theoretic ap-7
proach. Journal of Transportation Security, 11(1-2):23–40, 2018.8
A Proof of Theorem 19
Proof. Our proof is similar to the NP-hardness proof of the DET-STRAT problem (Theorem10
4.4) in Basilico et al. (2012). We first prove the result for the case with |D| = |A| = 1. Once11
the NP-hardness of this case is established, NP-hardness of the case with multiple patrollers12
and attackers follows due to the added complexity. To prove NP-hardness, we prove that the13
corresponding decision problem is NP-complete. Here is the corresponding decision problem:14
Given a patrolling game G = G(Q, T,m, c) and dual values [yiτ ] (note that we assumed there15
is only one attacker and yiτ is the probability of using attack pair (i, τ) by this single attacker)16
is there a patroll with [wiτ ] such that:17
N∑i=1
T−mi∑τ=0
yiτci,τ+mi−1wiτ ≥ L?
For future references, we call this problem DP. It is easy to see that DP is in NP. A non-18
deterministic machine could guess a patrol with wiτ and check if∑N
i=1
∑Tτ=0 yiτci,τ+mi−1wiτ ≥ L19
is true, in polynomial time. Its NP-completeness can be shown by reducing the Hamiltonian20
path (HP) problem to DP. HP is a well-known NP-complete problem (Gary and Johnson, 1979).21
It is the problem of determining if a Hamiltonian path, i.e., a path that visits each vertex22
exactly once, exists on a given graph. Following Basilico et al. (2012) let us consider a generic23
HP problem given by a graph Gh = (V h, Ah) where V h is the set of vertices and Ah is the set24
of edges. From this graph, an instance of the DP problem with G = G(Q, T,m, c), L and [yiτ ]25
23
can be constructed in polynomial time by setting Q = Gh, T = |V h|, mi = |V h|, yiτ = 1, ∀i, τ ,1
ciτ = 1,∀i, τ and L = |V h|. It is easy to see that a solution to DP with G = G(Q, T,m, c),2
L and [yiτ ], if exists, is a Hamiltonian path. To achieve L = |V h|, every node should be fully3
covered by the patrol i.e. all attack pairs (i, τ) should be caught by the patrol. Note that, for4
every node i, the only admissible attack start time is τ = 0. Therefore, there are |V h| attack5
pairs and to achieve L = |V h|, all of these attack pairs should be interrupted. To interrupt each6
attack pair (i, τ), it is enough to visit node i during time interval [0, T − 1]. Because the length7
of time horizon is equal to the number of nodes, the patrol should visit each node exactly once,8
because otherwise at least one node will not be covered. Therefore, computing a solution for9
DP with G = G(Q, T,m, c), L and [yiτ ] provides a solution for the HP problem with Gh. Thus10
the HP problem can be reduced to DP. This proves that DP is NP-complete.11
24