
A Double Oracle Algorithm for Zero-Sum Security Games on Graphs

Manish Jain, Dmytro Korzhyk, Ondřej Vaněk+, Vincent Conitzer, Michal Pěchouček+, Milind Tambe

Computer Science Department, University of Southern California, Los Angeles, CA 90089, {manish.jain,tambe}@usc.edu
Department of Computer Science, Duke University, Durham, NC 27708, {dima,conitzer}@cs.duke.edu
+ Department of Cybernetics, Czech Technical University, Prague, Czech Republic, {vanek,pechoucek}@agents.felk.cvut.cz

ABSTRACT
In response to the Mumbai attacks of 2008, the Mumbai police have started to schedule a limited number of inspection checkpoints on the road network throughout the city. Algorithms for similar security-related scheduling problems have been proposed in recent literature, but security scheduling in networked domains when targets have varying importance remains an open problem at large. In this paper, we cast the network security problem as an attacker-defender zero-sum game. The strategy spaces for both players are exponentially large, so this requires the development of novel, scalable techniques. We first show that existing algorithms for approximate solutions can be arbitrarily bad in general settings. We present RUGGED (Randomization in Urban Graphs by Generating strategies for Enemy and Defender), the first scalable optimal solution technique for such network security games. Our technique is based on a double oracle approach and thus does not require the enumeration of the entire strategy space for either of the players. It scales up to realistic problem sizes, as is shown by our evaluation of maps of southern Mumbai obtained from GIS data.

Categories and Subject Descriptors
I.2.11 [Artificial Intelligence]: Distributed Artificial Intelligence

General Terms
Algorithms, Security, Performance

Keywords
Game theory, Double oracle, Zero-sum games, Minimax equilibrium

Cite as: A Double Oracle Algorithm for Zero-Sum Security Games on Graphs, Manish Jain, Dmytro Korzhyk, Ondřej Vaněk, Vincent Conitzer, Michal Pěchouček, Milind Tambe, Proc. of 10th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2011), Tumer, Yolum, Sonenberg and Stone (eds.), May 2-6, 2011, Taipei, Taiwan, pp. 327-334. Copyright © 2011, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.

1. INTRODUCTION
Securing urban city networks, transportation networks, computer networks and other critical infrastructure is a large and growing area of concern. The key challenge faced in these domains is to effectively schedule a limited number of resources to protect against an intelligent and adaptive attacker. For example, a police force has limited personnel to patrol, operate checkpoints, or conduct searches. The adversarial aspect poses significant challenges for the resource allocation problem. An intelligent attacker may observe the strategy of the defender, and then plan more effective attacks. Predictable scheduling of defender resources can be exploited by the attacker. Randomization has thus been used to keep attackers at bay by increasing the uncertainty they face. Game theory offers a principled way of achieving effective randomization. It models the varying preferences of both the defender and the attacker, and allows us to solve for optimal strategies. Recent work has also used and deployed game-theoretic techniques in real-world attacker-defender scenarios, for example, ARMOR [13] and IRIS [10]. In this paper, we model an urban network security problem as a game with two players: the defender and the attacker. The pure strategies of the defender correspond to allocations of resources to edges in the network, for example, an allocation of police checkpoints to roads in the city. The pure strategies of the attacker correspond to paths from any source node to any target node, for example, a path from a landing spot on the coast to the airport.
The strategy space of the defender grows exponentially with the number of available resources, whereas the strategy space of the attacker grows exponentially with the size of the network. For example, in a fully connected graph with 20 nodes and 190 edges, the number of defender actions for only 5 resources is $\binom{190}{5} \approx 2$ billion, while the number of possible attacker paths without any cycles is approximately $6.6 \times 10^{18}$. Real-world networks are significantly larger, e.g., a simplified graph representing the road network in southern Mumbai has more than 250 nodes (intersections) and 600 edges (streets), and the security forces can deploy tens of resources. We model the scenario as a zero-sum game, where the attacker gets a positive payoff in case of a successful attack and 0 otherwise, and the payoff to the defender is the negative of the attacker's payoff. Our goal is to find a minimax strategy for the defender, that is, a strategy that minimizes the maximum expected utility that the attacker can obtain.¹ The extremely large size of the game inhibits the direct application of standard methods for finding minimax strategies. Thus, we propose a double-oracle based approach that does not require the ex-ante enumeration of all pure strategies for either of the players. We propose algorithms for both the defender and the attacker oracle problems, which are used iteratively to provide pure-strategy best responses for both players. While we present NP-hardness proofs for the oracle problems for both players, the entire approach remains scalable in practice, as is shown in our experiments. We also provide experimental results on real-city networks, specifically on graphs obtained from the GIS data of southern Mumbai. The graph representation of southern Mumbai has 250 nodes and 600 edges. The placement of sources and targets in the experiments was inspired by the 2008 Mumbai attacks, where the targets were important economic and political centers and the sources were placed along the coastline. Our experimental results show that this problem remains extremely difficult to solve. While we show the previous approximation method not to be ready for deployment, our own technique will need to be enhanced further for real deployment in the city of Mumbai. We believe the problem remains within reach, and is clearly an exciting and important area for continued research.

¹ Because in this work we assume the game to be zero-sum, a minimax strategy is equivalent to a Stackelberg strategy (where the defender finds the optimal mixed strategy to commit to); moreover, via von Neumann's minimax theorem [12] (or linear programming duality), minimax strategies also correspond exactly to Nash equilibrium strategies. For a discussion of the relationships among these concepts in security games, which include zero-sum games, see Yin et al. [18].

2. RELATED WORK
Game theory has been applied to a wide range of problems where one player (the evader) tries to minimize the probability of detection by and/or encounter with the other player (the patroller); the patroller wants to thwart the evader's plans by detecting and/or capturing him. The formalization of this problem led to a family of games, often called pursuit-evasion games [1]. As there are many potential applications of this general idea, more specialized game types have been introduced, e.g., hider-seeker games [7, 9] and infiltration games [2] with mobile patrollers and mobile evaders; search games [8] with mobile patrollers and immobile evaders; and ambush games [14] with the mobility capabilities reversed.
In the game model proposed in this paper, the evader is mobile whereas the patroller is not, just like in ambush games. However, in contrast with ambush games, we consider targets (termed destinations in ambush games) of varying importance. Our game model is most similar to that of interdiction games [17], where the evading player (the attacker) moves on an arbitrary graph from one of the origins to one of the destinations (a.k.a. targets), and the interdicting player (the defender) inspects one or more edges in the graph in order to detect the attacker and prevent him from reaching the target. As opposed to interdiction games, we do not consider the detection probability on edges, but we allow different values to be assigned to the targets, which is crucial for real-world applications. Recent work has also considered scheduling multiple-defender resources using cooperative game theory, as in path disruption games [3], where the attacker tries to reach a single known target. In contrast with the static asset protection problem [6], we attribute different importance to individual targets, and unlike its dynamic variant [6], we consider only static target positions. Recent work in security games and robotic patrolling [4, 10] has focused on concrete applications. However, it has not considered the scale-up for both defender and attacker strategies. For example, in ASPEN, the attacker's pure strategy space is polynomially large, since the attacker is not following any path and just chooses exactly one target to attack. Our game model was introduced by Tsai et al. [15]; however, their approximate solution technique can be suboptimal. We discuss the shortcomings of their approach in Section 4, and provide an optimal solution algorithm for the general case. Techniques used by RUGGED are based on a double oracle approach, as proposed by McMahan et al.
[11] (corresponding exactly to the notion of constraint and column generation in linear programming). This technique is intended to solve large-scale games, and is especially useful in settings where efficient algorithms for the best-response oracle problems are available. Double oracle algorithms have subsequently been applied to various pursuit-evasion games [9, 16]. While the best-response oracle problems are NP-hard in our setting (as we show in Sections 5.3 and 5.4), we give algorithms for these problems that allow the approach to still scale to realistic instances.

3. PROBLEM DESCRIPTION
A network security domain, as introduced by Tsai et al. [15], is modeled using a graph $G = (N, E)$. The attacker starts at one of the source nodes $s \in S \subseteq N$ and travels along a path of his choosing to any one of the targets $t \in T \subseteq N$. The attacker's pure strategies are thus all the possible $s$-$t$ paths from any source $s \in S$ to any target $t \in T$. The defender tries to catch the attacker before he reaches any of the targets by placing $k$ available (homogeneous) resources on edges in the graph. The defender's pure strategies are thus all the possible allocations of $k$ resources to edges, so there are $\binom{|E|}{k}$ in total. Assuming the defender plays allocation $X_i \subseteq E$ and the attacker chooses path $A_j \subseteq E$, the attacker succeeds if and only if $X_i \cap A_j = \emptyset$. Additionally, a payoff $T(t)$ is associated with each target $t$, such that the attacker gets $T(t)$ for a successful attack on $t$ and 0 otherwise. The defender receives $-T(t)$ in case of a successful attack on $t$ and 0 otherwise. The network security domain is modeled as a complete-information zero-sum game, where the set $S$ of sources, the set $T$ of targets, the payoffs $T$ for all the targets, and the number of defender resources $k$ are known to both players a priori. The objective is to find the mixed strategy $x$ of the defender corresponding to a Nash equilibrium (equivalently, a minimax strategy) of this network security game. The notation used in the paper is described in Table 1.
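The model above can be made concrete in a few lines. The following is a minimal Python sketch under stated assumptions, not the authors' code: the edge labels (`e1`, `e2`, `e3` for three parallel source-to-$t_1$ edges and `f` for a $t_1$-$t_2$ edge) and all helper names are hypothetical, and the toy instance simply borrows the shape of the Figure 1 graph used in Section 4. It encodes allocations and paths as edge sets, applies the success condition $X_i \cap A_j = \emptyset$, and counts defender pure strategies as $\binom{|E|}{k}$, including the $\binom{190}{5}$ figure quoted in the introduction.

```python
from itertools import combinations
from math import comb

def attack_succeeds(allocation, path):
    # The attacker succeeds iff allocation X_i and path A_j share no edge.
    return not (set(allocation) & set(path))

def payoffs(allocation, path, target, T):
    # Zero-sum payoffs: the attacker gets T(t) on success and 0 otherwise;
    # the defender receives the negation of the attacker's payoff.
    u = T[target] if attack_succeeds(allocation, path) else 0
    return u, -u

def defender_allocations(edges, k):
    # Every way of placing k homogeneous resources on distinct edges.
    return [frozenset(c) for c in combinations(edges, k)]

# Hypothetical toy instance shaped like Figure 1: three parallel s-t1 edges
# (e1, e2, e3) and one t1-t2 edge (f); T(t1) = 1, T(t2) = 2, k = 2.
edges = ["e1", "e2", "e3", "f"]
T = {"t1": 1, "t2": 2}
allocs = defender_allocations(edges, 2)

print(len(allocs), comb(len(edges), 2))               # 6 6
print(payoffs({"e1", "e2"}, ["e3", "f"], "t2", T))   # (2, -2): path avoids both resources
print(payoffs({"e1", "f"}, ["e1", "f"], "t2", T))    # (0, 0): path is blocked
print(comb(190, 5))                                   # 1956800538, roughly 2 billion
```

Explicit enumeration like this is only viable for tiny instances; the last line shows why the approach in this paper avoids enumerating either player's strategy space.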
Table 1: Notation
$G(N, E)$ : Urban network graph
$T$ : Target payoffs
$k$ : Defender resources
$X$ : Set of defender allocations, $X = \{X_1, X_2, \dots, X_n\}$
$X_i$ : $i$-th defender allocation, $X_i = \{X_{ie}\}\ \forall e$, $X_{ie} \in \{0, 1\}$
$A$ : Set of attacker paths, $A = \{A_1, A_2, \dots, A_m\}$
$A_j$ : $j$-th attacker path, $A_j = \{A_{je}\}\ \forall e$, $A_{je} \in \{0, 1\}$
$x$ : Defender's mixed strategy over $X$
$a$ : Adversary's mixed strategy over $A$
$U_d(x, A_j)$ : Defender's expected utility playing against $A_j$
$\Lambda$ : Defender's pure-strategy best response
$\Gamma$ : Attacker's pure-strategy best response

4. RANGER COUNTEREXAMPLE
RANGER was introduced by Tsai et al. [15] and was designed to obtain approximate solutions for the defender in the network security game. Its main component is a polynomial-sized linear program that, rather than solving for a distribution over allocations, solves for the marginal probabilities with which the defender covers
each edge. It does this by approximating the capture probability as the sum of the marginals along the attacker's path. It further presents some sampling techniques to obtain a distribution over defender allocations from these marginals. What was known before was that the RANGER solution (regardless of the sampling method used) is suboptimal in general, because it is not always possible to find a distribution over allocations such that the capture probability is indeed the sum of marginals on the path. In this paper, we show that RANGER's error can be arbitrarily large. Let us consider the example graph shown in Figure 1. This multigraph² has a single source node, $s$, and two targets, $t_1$ and $t_2$; the defender has 2 resources. Furthermore, the payoffs $T$ of the targets are defined to be 1 and 2 for targets $t_1$ and $t_2$ respectively.

Figure 1: This example is solved incorrectly by RANGER. The variables $a$, $b$ are the coverage probabilities on the corresponding edges.

RANGER solution: Suppose RANGER puts marginal coverage probability $a$ on each of the three edges between $s$ and $t_1$,³ and probability $b$ on the edge between $t_1$ and $t_2$, as shown in Figure 1. RANGER estimates that the attacker gets caught with probability $a$ when attacking target $t_1$ and probability $a + b$ when attacking target $t_2$. RANGER will attempt to make the attacker indifferent between the two targets to obtain the minimax equilibrium. Thus, RANGER's output is $a = 3/5$, $b = 1/5$, obtained from the following system of equations:
$$1\,(1 - a) = 2\,(1 - (a + b)) \tag{1}$$
$$3a + b = 2 \tag{2}$$
However, there can be no allocation of 2 resources to the edges such that the probability of the attacker being caught on his way to $t_1$ is 3/5 and the probability of the attacker being caught on his way to $t_2$ is 4/5. (The reason is that in this example, the event of there being a defensive resource on the second edge in the path cannot be disjoint from the event of there being one on the first edge.)
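RANGER's indifference system is small enough to solve by hand, and an exact-arithmetic check makes the algebra explicit. The sketch below is a hedged illustration, not part of RANGER: it hard-codes the Figure 1 structure (three parallel edges with coverage $a$, one edge with coverage $b$, two resources), and the function name `ranger_marginals` is hypothetical. Substituting $b = 2 - 3a$ from Equation (2) into Equation (1) gives $a = (T_1 + T_2)/(T_1 + 2T_2)$.

```python
from fractions import Fraction

def ranger_marginals(T1, T2):
    # Solve Equations (1)-(2) for the Figure 1 graph:
    #   T1 * (1 - a) = T2 * (1 - (a + b))   (indifference under summed marginals)
    #   3a + b = 2                          (two resources spread over the edges)
    # Substituting b = 2 - 3a gives T1 - T1*a = T2*(2a - 1), i.e.
    #   a = (T1 + T2) / (T1 + 2*T2).
    T1, T2 = Fraction(T1), Fraction(T2)
    a = (T1 + T2) / (T1 + 2 * T2)
    b = 2 - 3 * a
    return a, b

a, b = ranger_marginals(1, 2)
print(a, b)        # 3/5 1/5
print(a, a + b)    # RANGER's estimated capture probabilities for t1 and t2: 3/5 4/5
assert 1 * (1 - a) == 2 * (1 - (a + b))   # the attacker looks indifferent to RANGER

# With T(t2) = H, the same routine yields a = (H+1)/(2H+1) and b = (H-1)/(2H+1).
H = Fraction(10)
assert ranger_marginals(1, H) == ((H + 1) / (2 * H + 1), (H - 1) / (2 * H + 1))
```

The check only confirms RANGER's estimated capture probabilities; as argued in the text, those sums of marginals are not achievable by any actual distribution over allocations.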
In fact, for this RANGER solution, the attacker cannot be caught with a probability of more than 3/5 when attacking target $t_2$, and so the defender's utility cannot be greater than $-2(1 - 3/5) = -4/5$.

Optimal solution: Figure 2 shows the six possible allocations of the defender's two resources to the four edges. Three of them block some pair of edges between $s$ and $t_1$. Suppose that each of these three allocations is played by the defender with probability $x$.⁴ Each of the other three allocations blocks one edge between $s$ and $t_1$ as well as the edge between $t_1$ and $t_2$. Suppose the defender chooses these allocations with probability $y$ each (refer to Figure 2). The probability of the attacker being caught on his way to $t_1$ is $\frac{2}{3} \cdot 3x + \frac{1}{3} \cdot 3y$, or $2x + y$. Similarly, the probability of the attacker being caught on his way to $t_2$ is $2x + 3y$. Thus, a minimax strategy for this problem is the solution of Equations (3) and (4), which makes the attacker indifferent between targets $t_1$ and $t_2$.
$$1\,(1 - 2x - y) = 2\,(1 - 2x - 3y) \tag{3}$$
$$3x + 3y = 1 \tag{4}$$
The solution to the above system is $x = 2/9$, $y = 1/9$, so that the expected attacker utility is $4/9$. Thus, the expected defender utility is $-4/9$, which is higher than the expected defender utility of at most $-4/5$ resulting from using RANGER.

Figure 2: The possible allocations of two resources to the four edges. The blocked edges are shown in bold. The probabilities ($x$ or $y$) are shown next to each allocation.

² We use a multigraph for simplicity. This counterexample can easily be converted into a similar counterexample that has no more than one edge between any pair of nodes in the graph.
³ We can assume without loss of solution quality that symmetric edges will have equal coverage.
⁴ Again, this can be assumed without loss of generality for symmetric edges.

RANGER's suboptimality: Suppose the payoff $T(t_2)$ of target $t_2$ in the example above was $H$, $H > 1$. The RANGER solution in this case, again obtained using Equations (1) and (2), would be $a = \frac{H+1}{2H+1}$, $b = \frac{H-1}{2H+1}$. Then, consider an attacker who attacks the target $t_2$ by first going through one of the three edges from $s$ to $t_1$ uniformly at random (and then on to $t_2$). The attacker will fail to be caught on the way from $t_1$ to $t_2$ with probability $(1 - b)$, given that the defender's strategy is consistent with the output of RANGER. Even conditional on this failure, the attacker will fail to be caught on the way from $s$ to $t_1$ with probability at least 1/3, because the defender has only 2 resources. Thus, the probability of a successful attack on $t_2$ is at least $(1 - b)(1/3)$, and the attacker's best-response utility is at least:
$$\frac{H(1 - b)}{3} = \frac{H(H + 2)}{3(2H + 1)} > \frac{H(H + 0.5)}{3(2H + 1)} = \frac{H}{6} \tag{5}$$
Thus, the true defender utility for any strategy consistent with RANGER is at most $-H/6$. Now, consider another defender strategy in which the defender always blocks the edge from $t_1$ to $t_2$, and also blocks one of the three edges between $s$ and $t_1$ uniformly at random. For such a defender strategy, the attacker can reach $t_1$ with probability 2/3, but cannot reach target $t_2$ at all. Thus, the attacker's best-response utility in this case is 2/3. Therefore, the optimal defender utility is at least $-2/3$. Therefore, any solution consistent with RANGER is at least $\frac{H}{6} - \frac{2}{3} = \frac{H - 4}{6}$ suboptimal. Since $H$ is arbitrary, RANGER's solution can be arbitrarily suboptimal. This motivates our exact, double-oracle algorithm, RUGGED.

5. DOUBLE-ORACLE APPROACH
In this section, we present RUGGED, a double-oracle based algorithm for network security games. We also analyze the computational complexity of determining best responses for both the defender and the attacker, and, to complete the RUGGED algorithm, we give algorithms for computing the best responses.

5.1 Algorithm
The algorithm RUGGED is presented as Algorithm 1. $X$ is the set of defender allocations generated so far, while $A$ is the set of attacker paths generated so far. CoreLP($X$, $A$) finds an equilibrium (and hence, minimax and maximin strategies) of the two-player zero-sum game consisting of the sets of pure strategies, $X$ and $A$, generated so far. CoreLP returns $x$ and $a$, which are the current equilibrium mixed strategies for the defender and the attacker over $X$ and $A$ respectively. The defender oracle (DO) generates a defender allocation $\Lambda$ that is a best response for the defender against $a$. (This is a best response among all allocations, not just those in $X$.) Similarly, the attacker oracle (AO) generates an attacker path $\Gamma$ that is a best response for the attacker against $x$.

Algorithm 1 Double Oracle for Urban Network Security
1. Initialize $X$ by generating arbitrary candidate defender allocations.
2. Initialize $A$ by generating arbitrary candidate attacker paths.
repeat
3. $(x, a) \leftarrow$ CoreLP($X$, $A$).
4a. $\Lambda \leftarrow$ DO($a$).
4b. $X \leftarrow X \cup \{\Lambda\}$.
5a. $\Gamma \leftarrow$ AO($x$).
5b. $A \leftarrow A \cup \{\Gamma\}$.
until convergence
7. Return $(x, a)$

The double oracle algorithm thus starts with a small set of pure strategies for each player, and then grows these sets in every iteration by applying the best-response oracles to the current solution. Execution continues until convergence is detected. Convergence is achieved when the best-response oracles of both the defender and the attacker do not generate a pure strategy that is better for that player than the player's strategy in the current solution (holding the other player's strategy fixed). In other words, convergence is obtained if, for both players, the reward given by the best-response oracle is no better than the reward for the same player given by the CoreLP. The correctness of best-response-based double oracle algorithms for two-player zero-sum games has been established by McMahan et al. [11]; the intuition for this correctness is as follows.
Once the algorithm converges, the current solution must be an equilibrium of the game, because each player's current strategy is a best response to the other player's current strategy; this follows from the fact that the best-response oracle, which searches over all possible strategies, cannot find anything better. Furthermore, the algorithm must converge, because at worst, it will generate all pure strategies.

5.2 CoreLP
The purpose of CoreLP is to find an equilibrium of the restricted game consisting of defender pure strategies $X$ and attacker pure strategies $A$. Below is the standard formulation for computing a maximin strategy for the defender in a two-player zero-sum game.
$$\max_{U_d,\, x}\; U_d \tag{6}$$
$$\text{s.t.}\quad U_d \le U_d(x, A_j) \quad \forall j = 1, \dots, |A| \tag{7}$$
$$\mathbf{1}^T x = 1 \tag{8}$$
$$x \in [0, 1]^{|X|} \tag{9}$$
The defender's mixed strategy $x$, defined over $X$, and the utility $U_d$ are the variables for this problem. Inequality (7) is a family of constraints; there is one constraint for every attacker path $A_j$ in $A$. The function $U_d(x, A_j)$ is the defender's expected utility when the attacker plays path $A_j$. Given $A_j$, the probability that the attacker is caught is the sum of the probabilities of the defender allocations that would catch the attacker. (We can sum these probabilities because they correspond to disjoint events.) More precisely, let $z_{ij}$ be an indicator for whether allocation $X_i$ intersects with path $A_j$, that is,
$$z_{ij} = \begin{cases} 1 & \text{if } X_i \cap A_j \neq \emptyset \\ 0 & \text{otherwise} \end{cases} \tag{10}$$
These $z_{ij}$ are not variables of the linear program; they are parameters that are determined at the time the best responses are generated. Then, the probability that an attacker playing path $A_j$ is caught is $\sum_i z_{ij} x_i$, and the probability that he is not caught is $\sum_i (1 - z_{ij}) x_i$. Thus, the payoff function $U_d(x, A_j)$ for the defender for choosing a mixed strategy $x$ when the attacker chooses path $A_j$ is given by Equation (11), where $T(t_j)$ is the attacker's payoff for reaching $t_j$.
$$U_d(x, A_j) = -T(t_j) \Big( \sum_i (1 - z_{ij})\, x_i \Big) \tag{11}$$
The dual variables corresponding to Inequality (7) give the attacker's mixed strategy $a$, defined over $A$. The expected utility for the attacker is given by $-U_d$.
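Equations (10) and (11) can be evaluated directly for a fixed mixed strategy, which is how the coefficients of constraint family (7) arise. The Python sketch below is illustrative only: helper names and edge labels are hypothetical (`e1`, `e2`, `e3` for the three $s$-$t_1$ edges of Figure 1, `f` for the $t_1$-$t_2$ edge), and it does not solve the LP (6)-(9); an actual CoreLP would pass these coefficients to an LP solver. It uses the optimal strategy from Section 4 (probability 2/9 on each allocation blocking two $s$-$t_1$ edges, 1/9 on each allocation blocking one $s$-$t_1$ edge plus $f$) and confirms that every attacker path yields $U_d = -4/9$, i.e., the attacker is indifferent.

```python
from fractions import Fraction

def z_matrix(allocations, paths):
    # Equation (10): z[i][j] = 1 if allocation X_i shares an edge with path A_j.
    return [[1 if set(X) & set(A) else 0 for A in paths] for X in allocations]

def defender_utilities(probs, allocations, paths, targets, T):
    # Equation (11): U_d(x, A_j) = -T(t_j) * sum_i (1 - z_ij) * x_i
    z = z_matrix(allocations, paths)
    return [-T[t] * sum((1 - z[i][j]) * probs[i] for i in range(len(allocations)))
            for j, t in enumerate(targets)]

# Figure 1/2 instance with hypothetical edge labels.
pairs = [{"e1", "e2"}, {"e1", "e3"}, {"e2", "e3"}]     # block two s-t1 edges
mixed = [{"e1", "f"}, {"e2", "f"}, {"e3", "f"}]        # block one s-t1 edge and f
allocations = pairs + mixed
probs = [Fraction(2, 9)] * 3 + [Fraction(1, 9)] * 3    # x = 2/9, y = 1/9

paths = [["e1"], ["e2"], ["e3"], ["e1", "f"], ["e2", "f"], ["e3", "f"]]
targets = ["t1"] * 3 + ["t2"] * 3
T = {"t1": 1, "t2": 2}

utils = defender_utilities(probs, allocations, paths, targets, T)
print(min(utils), max(utils))    # -4/9 -4/9
```

Using exact fractions rather than floats keeps the indifference check free of rounding noise.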
5.3 Defender Oracle
This section concerns the best-response oracle problem for the defender. The Defender Oracle problem is stated as follows: generate the defender pure strategy (resource allocation) $\Lambda$ allocating $k$ resources over the edges $E$ that maximizes the defender's expected utility against a given attacker mixed strategy $a$ over paths $A$.
Defender Oracle problem is NP-hard: We show this by reducing the set cover problem to it. The Set-Cover problem: Given are a set $U$, a collection $S$ of subsets of $U$ (that is, $S \subseteq 2^U$), and an integer $k$. The question is whether there is a cover $C \subseteq S$ of size $k$ or less, that is, $\bigcup_{c \in C} c = U$ and $|C| \le k$. We will use a modification of this well-known NP-hard problem so that $S$ always contains all singleton subsets of $U$, that is, $u \in U$ implies $\{u\} \in S$. This modified problem remains NP-hard.
THEOREM 1. The Defender Oracle problem is NP-hard, even if there is only a single source and a single target.
PROOF. Reduction from Set-Cover to Defender Oracle: We convert an arbitrary instance of the set cover problem to an instance of the defender oracle problem by constructing a graph $G$ with just 3 nodes, as shown in Figure 3. The graph $G$ is a multigraph⁵ with just three nodes, so that $N = \{s, v, t\}$, where $s$ is the only source and $t$ is the only target (with arbitrary positive value). There are up to $|S|$ loop edges adjacent to node $v$; each loop edge corresponds to a unique non-singleton subset in $S$. There are $|U|$ edges between $s$ and $v$, each corresponding to a unique element in $U$. There are also $|U|$ edges between $v$ and $t$, each corresponding to a unique element in $U$. The attacker's paths correspond to the elements in $U$. A path that corresponds to $u \in U$ starts with the edge between $s$ and $v$ that corresponds to $u$, then loops through all the edges that correspond to non-singleton subsets in $S$ that contain $u$, and finally ends with the edge between $v$ and $t$ that corresponds to $u$. Hence, any two paths used by the attacker can only intersect at the loop edges. The probabilities that the attacker's mixed strategy places on these paths are arbitrary positive numbers.
We now show that set $U$ can be covered with $k$ subsets in $S \subseteq 2^U$ if and only if the defender can block all of the attacker's paths with $k$ resources in the corresponding defender oracle problem instance.

⁵ Having a multigraph is not essential to the NP-hardness reduction.

Figure 3: A defender oracle problem instance corresponding to the SET-COVER instance with $U = \{1, 2, 3\}$, $S = \{\{1\}, \{2\}, \{3\}, \{1, 2\}, \{1, 3\}\}$. Here, the attacker's mixed strategy uses three paths: $(e_1, e_{1,2}, e_{1,3}, e'_1)$, $(e_2, e_{1,2}, e'_2)$, $(e_3, e_{1,3}, e'_3)$. Thus, the SET-COVER instance has a solution of size 2 (for example, using $\{1, 2\}$ and $\{1, 3\}$); correspondingly, with 2 resources, the defender can always capture the attacker (for example, by covering $e_{1,2}$ and $e_{1,3}$).

The if direction: If the defender can block all the paths used by the attacker with $k$ resources, then the set $U$ can be covered with $C \subseteq S$, where $|C| \le k$ and $C$ is constructed as follows. If the defender places a resource on a loop edge, then $C$ includes the non-singleton subset in $S$ that corresponds to that loop edge. If the defender blocks any other edge, then $C$ includes the corresponding singleton subset. The only if direction: If there exists a cover $C$ of size $k$, then the defender can block all the paths by placing a defensive resource on every loop edge that corresponds to a non-singleton subset in $C$, and placing a defensive resource on the corresponding edge out of $s$ for every singleton subset in $C$.

Formulation: The defender oracle problem, described below, can be formulated as a mixed integer linear program (MILP). The objective of the MILP is to identify the allocation that covers as many attacker paths as possible, where paths are weighted by the product of the payoff of the target attacked by the path and the probability of the attacker choosing it. (In this formulation, the probabilities $a_j$ are not variables; they are provided by CoreLP.) In the formulation, $\lambda_e = 1$ indicates that we assign a resource to edge $e$, and $z_j = 1$ indicates that path $A_j$ (refer to Table 1) is blocked by the allocation.
$$\max_{z,\lambda}\; -\sum_j (1 - z_j)\, a_j\, T(t_j) \tag{12}$$
$$\text{s.t.}\quad z_j \le \sum_e A_{je}\, \lambda_e \quad \forall j \tag{13}$$
$$\sum_e \lambda_e \le k \tag{14}$$
$$\lambda_e \in \{0, 1\} \tag{15}$$
$$z_j \in [0, 1] \tag{16}$$
THEOREM 2. The MILP described above correctly computes a best-response allocation for the defender.
PROOF.
The defender receives a payoff of $-T(t_j)$, weighted by the probability $a_j$, if the attacker successfully attacks target $t_j$ using path $A_j$, and 0 in the case of an unsuccessful attack. Hence, if we make sure that $1 - z_j = 1$ if path $A_j$ is not blocked, and 0 otherwise, then the objective function (12) correctly models the defender's expected utility. Inequality (13) ensures this: its right-hand side will be at least 1 if there exists an edge on the path $A_j$ that the defender is covering, and 0 otherwise. $z_j$ need not be restricted to take an integer value because the objective is increasing in $z_j$: if the solver can push it above 0, it will choose to push it all the way up to 1. Therefore, if we let $\Lambda$ correspond to the set of edges covered by the defender, $z_j$ will be set by the solver so that:
$$z_j = \begin{cases} 1 & \text{if } \Lambda \cap A_j \neq \emptyset \;\Leftrightarrow\; \exists e: \lambda_e = A_{je} = 1 \\ 0 & \text{otherwise} \end{cases} \tag{17}$$
Inequality (14) enforces that the defender covers at most as many edges as the number of available resources $k$, and thus ensures feasibility. Hence, the above MILP correctly captures the best-response oracle problem for the defender.
PROPOSITION 1. For any attacker mixed strategy, the defender's expected utility from the best response provided by the defender oracle is no worse than the defender's equilibrium utility in the full zero-sum game.
PROOF. In any equilibrium, the attacker plays a mixed strategy that minimizes the defender's best-response utility; therefore, if the attacker plays any other mixed strategy, the defender's best-response utility can be no worse.

5.4 Attacker Oracle
This section concerns the best-response oracle problem for the attacker. The Attacker Oracle problem is to generate the attacker's pure strategy (path) $\Gamma$ from some source $s \in S$ to some target $t \in T$ that maximizes the attacker's expected utility given the defender's mixed strategy $x$ over defender allocations $X$.
Attacker Oracle is NP-hard: We show that the attacker oracle problem is also NP-hard by reducing 3-SAT to it.
THEOREM 3. The Attacker Oracle problem is NP-hard, even if there is only a single source and a single target.
PROOF.
Reduction from 3-SAT to Attacker Oracle: We convert an arbitrary instance of 3-SAT to an instance of the attacker oracle problem as follows. Suppose the 3-SAT instance contains $n$ variables $x_i$, $i = 1, \dots, n$, and $k$ clauses. Each clause is a disjunction of three literals, where each literal is either a variable or the negation of a variable. Consider the following example:
$$E = (x_1 \lor \neg x_2 \lor x_3) \land (x_1 \lor x_2 \lor \neg x_4) \tag{18}$$
The formula $E$ contains $n = 4$ variables and $k = 2$ clauses. We construct a multigraph $G$⁶ with $n + k + 1$ nodes, $v_0, \dots, v_{n+k}$, so that the source node is $s = v_0$ and the target node is $t = v_{n+k}$. Every edge connects some pair of nodes with consecutive indices, so that every simple path from $s$ to $t$ contains exactly $n + k$ edges. Each edge corresponds to a literal in the 3-SAT expression (that is, either $x_i$ or $\neg x_i$). There are exactly three edges that connect nodes $v_{i-1}$ and $v_i$ for $i = 1, \dots, k$. Those three edges correspond to the three literals in the $i$-th clause. There are exactly two edges that connect nodes $v_{k+j-1}$ and $v_{k+j}$ for $j = 1, \dots, n$. Those two edges correspond to literals $x_j$ and $\neg x_j$. An example graph that corresponds to expression (18) is shown in Figure 4.

Figure 4: An example graph corresponding to the CNF formula (18), with $s = v_0$ and $t = v_6$.

⁶ We use a multigraph for simplicity; having a multigraph is not essential for the NP-hardness reduction.

There are $2n$ defender pure strategies (allocations of resources), each played with equal probability of $1/(2n)$. Each defender pure strategy corresponds to a literal, and the edges that correspond to that literal are blocked in that pure strategy. In the example shown in Figure 4, the defender plays 8 pure strategies, each with probability 1/8. Three edges are blocked in the pure strategy that corresponds to the literal $x_1$ (namely, the top edge between $v_0$ and $v_1$,
the top edge between $v_1$ and $v_2$, and the top edge between $v_2$ and $v_3$); only one edge is blocked in the pure strategy that corresponds to the literal $x_4$ (the bottom edge between $v_5$ and $v_6$). (If it is desired that the defender always use the same number of resources, this is easily achieved by adding dummy edges.) We now show that there is an assignment of values to the variables in the 3-SAT instance so that the formula evaluates to true if and only if there is a path from $s$ to $t$ in the corresponding attacker oracle problem instance which is blocked with probability at most 1/2.
The if direction: Suppose there is a path $\Gamma$ from $s$ to $t$ that is blocked with probability at most 1/2. Note that any path from $s$ to $t$ is blocked by at least one of the strategies $\{x_i, \neg x_i\}$, for all $i = 1, \dots, n$, so the probability that the path is blocked is at least $n/(2n) = 1/2$. Moreover, if for some $i$ the path passes through both an edge labeled $x_i$ and one labeled $\neg x_i$, then the probability that the path is blocked is at least $(n + 1)/(2n) > 1/2$, so this cannot be the case for $\Gamma$. Hence, we can assign the value true to the literals that correspond to the edges on the path $\Gamma$, and false to all the other literals. This must correspond to a solution to the 3-SAT instance, because each clause must contain a literal that corresponds to an edge on the path, and is thus assigned a true value.
The only if direction: Suppose there is an assignment of values to the variables such that the 3-SAT formula evaluates to true. Consider a simple path $\Gamma$ that goes from $s$ to $t$ through edges that correspond to literals with true values in the assignment. Such a path must exist because by assumption the assignment satisfies every clause. Moreover, this path is blocked only by the defender strategies that correspond to true literals, of which there are exactly $n$. So the probability that the path is blocked is $n/(2n) = 1/2$.

Formulation: The attacker oracle problem can be formulated as a set of mixed integer linear programs, as described below. For every target in $T$, we solve for the best path to that target; then we take the best solution overall. Below is the formulation when the attacker is attacking target $t_m$. (In this formulation, the probabilities $x_i$ are not variables; they are values produced earlier by CoreLP.) In the formulation, $\gamma_e = 1$ indicates that the attacker passes through edge $e$, and $z_i = 1$ indicates that the allocation $X_i$ blocks the attacker's path. Equations (20) to (22) represent the flow constraints for the attacker for every node $n \in N$.
$$\max_{z,\gamma}\; T(t_m) \sum_i x_i (1 - z_i) \tag{19}$$
$$\text{s.t.}\quad \sum_{e \in out(n)} \gamma_e = \sum_{e \in in(n)} \gamma_e \quad \forall n \neq s, t_m \tag{20}$$
$$\sum_{e \in out(s)} \gamma_e = 1 \tag{21}$$
$$\sum_{e \in in(t_m)} \gamma_e = 1 \tag{22}$$
$$z_i \ge \gamma_e + X_{ie} - 1 \quad \forall e, \forall i \tag{23}$$
$$z_i \ge 0 \tag{24}$$
$$\gamma_e \in \{0, 1\} \tag{25}$$
THEOREM 4. The MILP described above correctly computes a best-response path for the attacker.
PROOF. The flow constraints are represented in Equations (20) to (22). The sink for the flow is the target $t_m$ that we are currently considering for attack. To deal with the case where there is more than one possible source node, we can add a virtual source ($s$) to $G$ that feeds into all the real sources. $in(n)$ represents the edges coming into $n$, and $out(n)$ represents those going out of $n$. The flow constraints ensure that the chosen edges indeed constitute a path from the (virtual) source to the sink. The attacker receives a payoff of $T(t_m)$ if he attacks target $t_m$ successfully, that is, if the path does not intersect with any defender allocation. Hence, if we make sure that $1 - z_i = 1$ if allocation $X_i$ does not block the path, and 0 otherwise, then the objective function (19) correctly models the attacker's expected utility. Inequality (23) ensures this: if the allocation $X_i$ covers some $e$ for which $\gamma_e = 1$, then it will force $z_i$ to be set at least to 1; otherwise, $z_i$ only needs to be set to at least 0 (and in each case, the solver will push it all the way down to this value, which also explains why the $z_i$ variables do not need to be restricted to take integer values). Therefore, if we let $\Gamma$ correspond to the path chosen by the attacker, $z_i$ will be set by the solver so that
$$z_i = \begin{cases} 1 & \text{if } X_i \cap \Gamma \neq \emptyset \;\Leftrightarrow\; \exists e: \gamma_e = X_{ie} = 1 \\ 0 & \text{otherwise} \end{cases} \tag{26}$$
It follows that the MILP objective is correct. Hence, the above MILP captures the best-response oracle problem for the attacker.
PROPOSITION 2. For any defender mixed strategy, the attacker's expected utility from the best response provided by the attacker oracle is no worse than the attacker's equilibrium utility in the full zero-sum game.
PROOF. In any equilibrium, the defender plays a mixed strategy that minimizes the attacker's best-response utility; therefore, if the defender plays any other mixed strategy, the attacker's best-response utility can be no worse.

Figure 5: Example graph of Southern Mumbai with 455 nodes. Sources are depicted as green arrows and targets are red bullseyes. Best viewed in color.

6. EVALUATION
In this section, we describe the results we achieved with RUGGED. We conducted experiments on graphs obtained from road network GIS data for the city of Mumbai (inspired by the 2008 Mumbai incidents [5]), as well as on artificially generated graphs. We provide two types of results: (1) Firstly, we compare the solution quality obtained from RUGGED with the solution quality obtained from RANGER. These results are shown in Section 6.1. (2) Secondly, we provide runtime results showing the performance of RUGGED when the input graphs are scaled up.⁷

⁷ All experiments were run on standard desktop 2.8GHz machines with 2GB main memory.

The following three types of graphs were used for the experimental results: (1) Weakly fully connected (WFC) graphs, denoted $G_{WFC}(N, E)$, are graphs where $N$ is an ordered set of nodes $\{n_1, \dots, n_m\}$, with $S = \{n_1\}$ and $T = \{n_m\}$. For each node $n_i$, there exists a set of directed
edges, {(n_i, n_j) | n_i < n_j}, in E. These graphs were chosen because of the extreme size of the strategy space for both players. Additionally, there are no bottleneck edges, so these graphs are designed to be computationally challenging for RUGGED. (2) Braid-type graphs, denoted G_B(N, E), are graphs where N is a sequence of nodes n_1 to n_m such that each pair n_{i−1} and n_i is connected by 2 to 3 edges. Node n_1 is the source node. Every subsequent node is a target node with probability 0.2, with payoff T randomly chosen between 1 and 100. These graphs have a similar structure to the graph in Figure 1, and were motivated by the counterexample in Section 4. (3) City graphs of different sizes were extracted from the southern part of Mumbai using the GIS data provided by OpenStreetMap. The placement of 2–4 targets was inspired by the Mumbai incident of 2008 [5]; 2–4 sources were placed on the border of the graph,⁸ simulating an attacker approaching from the sea. We ran the tests for graphs with the following numbers of nodes: 45, 129 and 252. Figure 5 shows a sample Mumbai graph with 252 nodes, 4 sources and 3 targets.

6.1 Comparison with RANGER
This section compares the solution quality of RUGGED and RANGER. Although we have already established that RANGER solutions can be arbitrarily bad in general, the objective of these tests is to compare the actual performance of RANGER with that of RUGGED. The results are given in Table 2, which shows the average and maximum error of RANGER. We evaluated RANGER on the three types of graphs (city graphs, braid graphs and weakly fully connected graphs) of different sizes, fixing the number of defender resources to 2 and placing 3 targets, with values varied over the interval [0, 1000]. The actual defender utility from the solution provided by RANGER⁹ is computed by using the best-response oracle for the attacker with the RANGER defender strategy as input. The error of RANGER is then expressed as the difference between the defender utilities in the solutions provided by RANGER and by RUGGED.
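The error computation can be sketched in Python on a toy instance. This is a hedged illustration, not the paper's implementation: it replaces the attacker-oracle MILP (19)–(25) with brute-force enumeration of simple source-to-target paths (feasible only for tiny graphs), sets z_i to the smallest value allowed by constraints (23)–(24), and assumes, as in a zero-sum game, that the defender's utility is the negative of the attacker's. All function names and the four-edge graph are hypothetical.

```python
def simple_paths(edges, source, target):
    """Yield every simple source->target path as a list of directed edges."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    stack = [(source, [source])]
    while stack:
        node, nodes = stack.pop()
        if node == target:
            yield list(zip(nodes, nodes[1:]))
            continue
        for nxt in succ.get(node, []):
            if nxt not in nodes:  # keep the path simple
                stack.append((nxt, nodes + [nxt]))


def attacker_best_response(edges, sources, payoffs, allocations, probs):
    """Maximize T(t) * sum_i x_i * (1 - z_i) over targets t and simple paths, cf. (19).

    allocations: list of edge sets X_i; probs: defender probabilities x_i;
    payoffs: dict mapping each target t to T(t). Assumes some s-t path exists.
    """
    best_util, best_path = None, None
    for t, payoff in payoffs.items():
        for s in sources:
            for path in simple_paths(edges, s, t):
                gamma = set(path)
                # Smallest z_i allowed by (23)-(24) for binary gamma and X:
                # 1 iff allocation X_i covers some edge on the path, as in (26).
                z = [1 if gamma & X else 0 for X in allocations]
                util = payoff * sum(x * (1 - zi) for x, zi in zip(probs, z))
                if best_util is None or util > best_util:
                    best_util, best_path = util, path
    return best_util, best_path


def defender_utility(edges, sources, payoffs, allocations, probs):
    """Zero-sum assumption: the defender gets minus the attacker's best-response utility."""
    util, _ = attacker_best_response(edges, sources, payoffs, allocations, probs)
    return -util


edges = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")]
allocs = [{("s", "a")}, {("s", "b")}]  # one resource: block the road via a or via b
payoffs = {"t": 10}
u_opt = defender_utility(edges, ["s"], payoffs, allocs, [0.5, 0.5])
u_bad = defender_utility(edges, ["s"], payoffs, allocs, [1.0, 0.0])
error = u_opt - u_bad
print(u_opt, u_bad, error)  # -5.0 -10.0 5.0
```

The mix (0.5, 0.5) is optimal for this symmetric instance, so `error` plays the role of the RANGER error: the gap between the optimal defender utility and the utility of a weaker strategy, each evaluated against the attacker's best response.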
Table 2 shows the comparison results between RANGER and RUGGED, summarized over 30 trials. It shows the percentage of trials in which RANGER gave an incorrect solution (denoted pct). It also shows the average and maximum error of RANGER (denoted avg and max, respectively) over these trials. While RANGER was wrong only about 1/3 of the time on braid graphs, it gave the wrong answer in all the runs on the fully connected graphs. Furthermore, on the 45-node city graphs it was wrong 90% of the time, with an average error of 215 units and a maximum error of 721 units. Given an average target value of 500, these are high errors indeed, indicating that RANGER is unsuitable for deployment in real-world domains.

6.2 Scale-up and analysis
This section concerns the performance of RUGGED when the input problem instances are scaled up. The experiments were conducted on graphs derived directly from portions of Mumbai's road network. The runtime results are shown in Table 3, where the rows represent the size of the graph and the columns represent the number of defender resources that need to be scheduled. As an example of the complexity of these graphs, the number of attacker paths in the Mumbai graph with 252 nodes is at least 10^12, while the number of defender allocations is approximately 10^10 for 4 resources.

⁸ We placed more sources and targets into larger graphs.
⁹ Because RANGER provides a solution in the form of marginal probabilities of defender allocations along edges, we used Comb Sampling [15] to convert this into a (joint) probability distribution over defender allocations.

                City          Braid         WFC
Nodes         45    129     10    20     10    20
avg error    215    250    210   259    191    80
max error    721    489    472   599    273   117
pct          90%   100%    30%   37%   100%  100%
avg T        500    500    500   500    500   500

Table 2: RANGER's average and maximum error, and the percentage of samples in which RANGER provided a suboptimal solution. Target values T were randomly drawn from the interval [1, 1000].
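Footnote 9 converts RANGER's edge marginals into a joint distribution over allocations via Comb Sampling [15]. The Python sketch below assumes Comb Sampling behaves like systematic sampling: the edge marginals are laid end to end on [0, k), where k is the number of resources, and k comb teeth spaced one unit apart, shifted by a uniform random offset u in [0, 1), pick the edges. This is our reading of the technique rather than code from [15], and the function names are hypothetical. Because each marginal is at most 1, no edge is picked twice, and edge i is picked with probability equal to its marginal.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate


def comb_sample(marginals, u):
    """Return the edges hit by k comb teeth at u, u + 1, ..., u + k - 1 (u in [0, 1)).

    Assumes every marginal is at most 1 and the marginals sum to an integer k
    (the number of defender resources).
    """
    if any(m > 1 for m in marginals):
        raise ValueError("each marginal must be at most 1")
    cum = list(accumulate(marginals))  # edge i owns the interval [cum[i-1], cum[i])
    k = round(cum[-1])
    if abs(cum[-1] - k) > 1e-9:
        raise ValueError("marginals must sum to an integer")
    picks = []
    for j in range(k):
        i = bisect_right(cum, u + j)  # interval containing tooth u + j
        picks.append(min(i, len(marginals) - 1))  # guard against float round-off
    return tuple(picks)


def comb_distribution(marginals):
    """Exact joint distribution over allocations when u is uniform on [0, 1)."""
    cum = list(accumulate(marginals))
    # The chosen set only changes when a tooth crosses an interval end, i.e. when
    # u equals the fractional part of some cumulative sum.
    cuts = sorted({Fraction(0), Fraction(1)} | {c - int(c) for c in cum})
    dist = {}
    for a, b in zip(cuts, cuts[1:]):
        alloc = comb_sample(marginals, (a + b) / 2)  # constant on [a, b)
        dist[alloc] = dist.get(alloc, 0) + (b - a)
    return dist


marginals = [Fraction(1, 3), Fraction(2, 3), Fraction(1)]  # 2 resources, 3 edges
print(comb_distribution(marginals))
# {(0, 2): Fraction(1, 3), (1, 2): Fraction(2, 3)}
```

To draw a single allocation, one would call `comb_sample(marginals, random.random())`; `comb_distribution` instead enumerates the whole induced distribution exactly, which makes it easy to check that each edge's selection probability matches its marginal.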
                 Defender resources
Nodes          1         2          3           4
45          0.91      6.43      22.58       33.42
129         6.63     32.55     486.48     3140.23
252        17.19    626.25    2014.14    34344.70

Table 3: Runtime (in seconds) of RUGGED when the input problem instances are scaled up. These tests were done on graphs extracted from the road network of Mumbai. The rows correspond to the number of nodes in the graph, whereas the columns correspond to the number of defender resources.

For graphs of this size, the game matrix cannot even be represented, let alone solved. The ability of RUGGED to compute optimal solutions in such situations, while overcoming the NP-hardness of both oracles, marks a significant advance in the state of the art in deploying game-theoretic techniques.

Figure 6(a) examines the performance of RUGGED when the size of the strategy space for both players is increased. These tests were conducted on WFC graphs, since they are designed to have large strategy spaces. These problems have 20 to 100 nodes and up to 5 resources. The x-axis in the figure shows the number of nodes in the graph, while the y-axis shows the runtime in seconds. Different numbers of defender resources are represented by different curves in the graph. For example, for 40 nodes and 5 defender resources, RUGGED took 108 seconds on average.

To speed up the convergence of RUGGED, we tried to warm-start the algorithm with an initial defender allocation, such as min-cut-based allocations, target- and source-centric allocations, RANGER allocations and combinations of these. No significant improvement in runtime was measured; in some cases, the runtime increased because of the larger strategy set for the defender.

6.3 Algorithm Dynamics Analysis
This section analyzes the anytime solution quality and the performance of each of the three components of RUGGED: the defender oracle, the attacker oracle, and the CoreLP. When we solve the best-response oracle problems, their solutions provide lower and upper bounds on the optimal defender utility, as shown in Propositions 1 and 2.
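The two bounds can be illustrated on an explicit zero-sum matrix game, where each oracle reduces to a max or min over rows or columns. The Python sketch below is an illustration under stated assumptions: it takes Proposition 1 to be the defender-side counterpart of Proposition 2, measures payoffs in attacker utility (the defender's utility is its negation), and uses a hypothetical 2×2 game instead of the exponentially large path and allocation sets that RUGGED handles with MILP oracles. Function names are hypothetical.

```python
from fractions import Fraction as F


def attacker_oracle_value(U, x):
    """Attacker's best-response utility max_j sum_i x_i U[i][j] against defender mix x."""
    return max(sum(xi * row[j] for xi, row in zip(x, U)) for j in range(len(U[0])))


def defender_oracle_value(U, y):
    """Attacker's utility min_i sum_j U[i][j] y_j when the defender best-responds to y."""
    return min(sum(u * yj for u, yj in zip(row, y)) for row in U)


def defender_bounds(U, x, y):
    """Anytime bounds on the defender's equilibrium utility in a zero-sum game."""
    lower = -attacker_oracle_value(U, x)  # defender can guarantee this by playing x
    upper = -defender_oracle_value(U, y)  # no defender strategy does better against y
    return lower, upper, upper - lower


# Toy game: rows = defender allocations, columns = attacker paths,
# entries = attacker utility (10 if the chosen path is not blocked).
U = [[0, 10],   # allocation 1 blocks path 1 only
     [10, 0]]   # allocation 2 blocks path 2 only
lower, upper, gap = defender_bounds(U, x=[F(3, 5), F(2, 5)], y=[F(1, 2), F(1, 2)])
print(lower, upper, gap)  # -6 -5 1
```

The equilibrium defender utility of this toy game is −5, which lies between the two bounds. The gap of 1 unit bounds how far the defender mix (3/5, 2/5) is from optimal; an anytime variant would stop once the gap falls below a tolerance ε, such as the 10-unit tolerance discussed for Figure 6(b).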
Figure 6(b) shows the progress of the bounds and the CoreLP solution for a sample problem instance scheduling 2 defender resources on a fully connected network with 50 nodes. The x-axis shows the number of iterations and the y-axis shows the expected defender utility. The graph shows that a good solution (i.e., one where the difference between the two bounds is less than ε) can be computed reasonably quickly, even though the algorithm takes longer to converge to the optimal solution. For example, a solution with an allowed approximation of 10 units¹⁰ can be computed in about 210 iterations, whereas 310 iterations are required to find the optimal solution. The difference between these two bounds gives an upper bound on the error in the current solution of the CoreLP; this also provides us with an approximation variant of RUGGED.

¹⁰ 10 units is 1% of the maximum target payoff (1000).

Figure 6: Results. (a) Scale-up analysis on WFC graphs of different sizes. (b) Convergence of the oracle values to the final game value, and the anytime bounds. (c) Runtime of the oracles compared with the CoreLP.

Figure 6(c) compares the runtime needed by the three modules in every iteration. The x-axis shows the iteration number and the y-axis shows the runtime in seconds on a logarithmic scale. As expected, the CoreLP, which solves a standard linear program, needs considerably less time in each iteration than either oracle, both of which solve mixed-integer programs. The figure also shows that the modules scale well as the number of iterations increases.

7. CONCLUSION AND FUTURE WORK
Optimally scheduling defender resources in a network-based environment is an important and challenging problem. Security in urban road networks, other transportation networks, and computer networks is of growing concern, requiring the development of novel scalable approaches. These domains have extremely large strategy spaces; a graph with just 20 nodes and 5 resources can have more than 2 billion strategies for both players. In this paper, we presented RUGGED, a novel double-oracle based approach for finding an optimal strategy for scheduling a limited number of defender resources in a network security environment. We showed that previous approaches can lead to arbitrarily bad solutions in such situations, and that the error can be very high even in practice. We applied RUGGED to real city maps generated from GIS data; in particular, we presented the results of applying RUGGED to the road network of Mumbai. While enhancements to RUGGED are required for deployment in some real-world domains, optimal solutions even to these problems are now within reach. The scalability of RUGGED opens up new avenues for deploying game-theoretic techniques in real-world applications.

8. ACKNOWLEDGEMENTS
This research is supported by the United States Department of Homeland Security through the Center for Risk and Economic Analysis of Terrorism Events (CREATE), the Czech Ministry of Education, Youth and Sports under project number N00014-09-1-0537, the NSF CAREER grant 0953756 and IIS-0812113, ARO 56698-CI, and an Alfred P. Sloan fellowship. We thank Ron Parr, Michal Jakob and Zhengyu Yin for comments and discussions.

9. REFERENCES
[1] M. Adler, H. Räcke, N. Sivadasan, C. Sohler, and B. Vöcking. Randomized pursuit-evasion in graphs. In ICALP, pages 901–912, 2002.
[2] S. Alpern. Infiltration Games on Arbitrary Graphs. Journal of Mathematical Analysis and Applications, 163:286–288, 1992.
[3] Y. Bachrach and E. Porat. Path Disruption Games. In AAMAS, pages 1123–1130, 2010.
[4] N. Basilico, N. Gatti, and F. Amigoni. Leader-Follower Strategies for Robotic Patrolling in Environments with Arbitrary Topologies. In AAMAS, pages 500–503, 2009.
[5] R. Chandran and G. Beitchman. Battle for Mumbai Ends, Death Toll Rises to 195. Times of India, 29 November 2008.
[6] J. Dickerson, G. Simari, V. Subrahmanian, and S. Kraus. A Graph-Theoretic Approach to Protect Static and Moving Targets from Adversaries. In AAMAS, pages 299–306, 2010.
[7] M. M. Flood. The Hide and Seek Game of Von Neumann. Management Science, 18(5-Part-2):107–109, 1972.
[8] S. Gal. Search Games. Academic Press, New York, 1980.
[9] E. Halvorson, V. Conitzer, and R. Parr. Multi-step Multi-sensor Hider-Seeker Games. In IJCAI, pages 159–166, 2009.
[10] M. Jain, E. Kardes, C. Kiekintveld, F. Ordóñez, and M. Tambe. Security Games with Arbitrary Schedules: A Branch and Price Approach. In AAAI, pages 792–797, 2010.
[11] H. B. McMahan, G. J. Gordon, and A. Blum. Planning in the Presence of Cost Functions Controlled by an Adversary. In ICML, pages 536–543, 2003.
[12] J. V. Neumann. Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100(1):295–320, 1928.
[13] J. Pita, M. Jain, F. Ordóñez, C. Portway, M. Tambe, C. Western, P. Paruchuri, and S. Kraus. Using Game Theory for Los Angeles Airport Security. AI Magazine, 30(1), 2009.
[14] W. Ruckle, R. Fennell, P. T. Holmes, and C. Fennemore. Ambushing Random Walks I: Finite Models. Operations Research, 24:314–324, 1976.
[15] J. Tsai, Z. Yin, J. young Kwak, D. Kempe, C. Kiekintveld, and M. Tambe. Urban security: Game-theoretic resource allocation in networked physical domains. In AAAI, pages 881–886, 2010.
[16] O. Vaněk, B. Bošanský, M. Jakob, and M. Pěchouček. Transiting Areas Patrolled by a Mobile Adversary. In IEEE CIG, pages 9–16, 2010.
[17] A. Washburn and K. Wood. Two-person Zero-sum Games for Network Interdiction. Operations Research, 43(2):243–251, 1995.
[18] Z. Yin, D. Korzhyk, C. Kiekintveld, V. Conitzer, and M. Tambe. Stackelberg vs. Nash in Security Games: Interchangeability, Equivalence, and Uniqueness. In AAMAS, pages 1139–1146, 2010.