Learning attack-defense response in continuous-time discrete-states Stackelberg Security Markov games

Julio B. Clempner

doi:10.1080/0952813X.2022.2135615

Escuela Superior de Física y Matemáticas (ESFM)

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Security games have attracted growing research interest in recent decades because of their successful application to real-world security problems. The security model is based on the Stackelberg Security Game (SSG), in which defenders (leaders) select a defensive strategy based on the optimal reaction of attackers (followers), who, at equilibrium, select the predicted attack strategy as a response. These applications, however, do not account for the time constraints imposed by the players' journey times. Furthermore, players should be able to cope with dynamic settings in which their knowledge of the environment changes regularly, allowing them to perform more effectively. To address these issues, this research proposes a security model based on a continuous-time Reinforcement Learning (RL) approach, implemented using a temporal difference method that takes prior information into account. We use a controlled, ergodic continuous-time Markov game to model the SSG. The game framework assumes that all information is available. To estimate the transition rates, we divide the number of transitions observed over a time interval by the total holding time. The cost for defenders and attackers is estimated as the arithmetic mean of the observed costs of the individual players. An iterated proximal/gradient approach is used to calculate the SSG equilibrium point. We offer a continuous-time random walk method for game implementation. In a numerical example related to rain-forest hazards, we analyze the performance of the suggested RL security solution and discuss the problems that should be considered in future work.
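The two estimators described in the abstract can be illustrated with a minimal sketch. It assumes a single observed trajectory of an uncontrolled chain and omits the dependence on the players' actions that the controlled game in the paper would carry. The function names `estimate_rates` and `mean_cost`, the trajectory format, and the sample data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def estimate_rates(trajectory, t_end):
    """Estimate transition rates from one observed path.

    trajectory: list of (entry_time, state) pairs sorted by time (assumed format).
    t_end: time at which observation stops.
    Returns {(i, j): number of jumps i -> j / total holding time in i}.
    """
    transitions = defaultdict(int)   # (i, j) -> count of observed jumps i -> j
    holding = defaultdict(float)     # i -> total time spent in state i
    for k, (t, state) in enumerate(trajectory):
        if k + 1 < len(trajectory):
            t_next, next_state = trajectory[k + 1]
            transitions[(state, next_state)] += 1
        else:
            # The last visit is cut off by the end of the observation window.
            t_next = t_end
        holding[state] += t_next - t
    return {(i, j): n / holding[i] for (i, j), n in transitions.items()}


def mean_cost(observed_costs):
    """Arithmetic mean of the costs observed for one player."""
    return sum(observed_costs) / len(observed_costs)


# Hypothetical sample path: A on [0, 2), B on [2, 3), A on [3, 5), B on [5, 6].
trajectory = [(0.0, "A"), (2.0, "B"), (3.0, "A"), (5.0, "B")]
rates = estimate_rates(trajectory, t_end=6.0)
average = mean_cost([1.0, 2.0, 3.0])
```

In this sample path the chain spends 4 time units in A with two jumps to B, and 2 time units in B with one jump to A, so both estimated rates are 0.5.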

Translated title of the contribution	Aprendizaje de la respuesta de ataque-defensa en juegos de Stackelberg Security Markov de tiempo continuo y estados discretos
Original language	English
Journal	Journal of Experimental and Theoretical Artificial Intelligence
DOIs	https://doi.org/10.1080/0952813X.2022.2135615
State	Published - 2022

Keywords

Markov chain
Security game
Stackelberg game
continuous-time
reinforcement learning

Access to Document

10.1080/0952813X.2022.2135615

Cite this

@article{268a9db8b8034244a636789379b8e119,

title = "Learning attack-defense response in continuous-time discrete-states Stackelberg Security Markov games",

abstract = "Researchers have become interested in security games in recent decades as a result of its successful application in real-world security issues. The security model is based on the Stackelberg Security Game (SSG), in which defenders (leaders) select a defensive strategy based on the optimal reaction of attackers (followers), who, at equilibrium, select the predicted assaulting strategy as a response. These applications, on the other hand, do not account for the time constraints posed by the game{\textquoteright}s players{\textquoteright} journey time. Furthermore, players should be able to cope with dynamic settings in which their knowledge of the environment changes on a regular basis, allowing them to perform more effectively. This research proposes a security model based on a continuous-time Reinforcement Learning (RL) approach implemented using a temporal difference method that takes prior information into account to address these issues. We use a controlled, ergodic continuous-time Markov game to model the SSG. The game framework model assumes that all information is available. We calculate the number of transitions over a time interval divided by the entire value of the holding time to estimate the transition rates. The arithmetic mean of the observed cost of the individual players is used to estimate the cost for defenders and attackers. An iterated proximal/gradient approach is used to calculate the SSG equilibrium point. We offer a continuous-time random walk method for game implementation. In a numerical case relevant to rain-forest hazards, we analyze the performance of the suggested RL security solution and discuss the problems that should be considered in future.",

keywords = "Markov chain, Security game, Stackelberg game, continuous-time, reinforcement learning",

author = "Clempner, {Julio B.}",

note = "Publisher Copyright: {\textcopyright} 2022 Informa UK Limited, trading as Taylor \& Francis Group.",

year = "2022",

doi = "10.1080/0952813X.2022.2135615",

language = "English",

journal = "Journal of Experimental and Theoretical Artificial Intelligence",

issn = "0952-813X",

}

TY - JOUR

T1 - Learning attack-defense response in continuous-time discrete-states Stackelberg Security Markov games

AU - Clempner, Julio B.

PY - 2022

Y1 - 2022

N2 - Researchers have become interested in security games in recent decades as a result of its successful application in real-world security issues. The security model is based on the Stackelberg Security Game (SSG), in which defenders (leaders) select a defensive strategy based on the optimal reaction of attackers (followers), who, at equilibrium, select the predicted assaulting strategy as a response. These applications, on the other hand, do not account for the time constraints posed by the game’s players’ journey time. Furthermore, players should be able to cope with dynamic settings in which their knowledge of the environment changes on a regular basis, allowing them to perform more effectively. This research proposes a security model based on a continuous-time Reinforcement Learning (RL) approach implemented using a temporal difference method that takes prior information into account to address these issues. We use a controlled, ergodic continuous-time Markov game to model the SSG. The game framework model assumes that all information is available. We calculate the number of transitions over a time interval divided by the entire value of the holding time to estimate the transition rates. The arithmetic mean of the observed cost of the individual players is used to estimate the cost for defenders and attackers. An iterated proximal/gradient approach is used to calculate the SSG equilibrium point. We offer a continuous-time random walk method for game implementation. In a numerical case relevant to rain-forest hazards, we analyze the performance of the suggested RL security solution and discuss the problems that should be considered in future.

AB - Researchers have become interested in security games in recent decades as a result of its successful application in real-world security issues. The security model is based on the Stackelberg Security Game (SSG), in which defenders (leaders) select a defensive strategy based on the optimal reaction of attackers (followers), who, at equilibrium, select the predicted assaulting strategy as a response. These applications, on the other hand, do not account for the time constraints posed by the game’s players’ journey time. Furthermore, players should be able to cope with dynamic settings in which their knowledge of the environment changes on a regular basis, allowing them to perform more effectively. This research proposes a security model based on a continuous-time Reinforcement Learning (RL) approach implemented using a temporal difference method that takes prior information into account to address these issues. We use a controlled, ergodic continuous-time Markov game to model the SSG. The game framework model assumes that all information is available. We calculate the number of transitions over a time interval divided by the entire value of the holding time to estimate the transition rates. The arithmetic mean of the observed cost of the individual players is used to estimate the cost for defenders and attackers. An iterated proximal/gradient approach is used to calculate the SSG equilibrium point. We offer a continuous-time random walk method for game implementation. In a numerical case relevant to rain-forest hazards, we analyze the performance of the suggested RL security solution and discuss the problems that should be considered in future.

KW - Markov chain

KW - Security game

KW - Stackelberg game

KW - continuous-time

KW - reinforcement learning

UR - http://www.scopus.com/inward/record.url?scp=85140374446&partnerID=8YFLogxK

U2 - 10.1080/0952813X.2022.2135615

DO - 10.1080/0952813X.2022.2135615

M3 - Article

AN - SCOPUS:85140374446

SN - 0952-813X

JO - Journal of Experimental and Theoretical Artificial Intelligence

JF - Journal of Experimental and Theoretical Artificial Intelligence

ER -
