TY - JOUR
T1 - Learning attack-defense response in continuous-time discrete-states Stackelberg Security Markov games
AU - Clempner, Julio B.
N1 - Publisher Copyright:
© 2022 Informa UK Limited, trading as Taylor & Francis Group.
PY - 2022
Y1 - 2022
N2 - Researchers have become interested in security games in recent decades as a result of their successful application to real-world security problems. The security model is based on the Stackelberg Security Game (SSG), in which defenders (leaders) select a defensive strategy based on the optimal reaction of attackers (followers), who, at equilibrium, select the predicted attacking strategy in response. These applications, however, do not account for the time constraints imposed by the players’ travel time. Furthermore, players should be able to cope with dynamic settings in which their knowledge of the environment changes regularly, allowing them to perform more effectively. To address these issues, this research proposes a security model based on a continuous-time Reinforcement Learning (RL) approach implemented using a temporal-difference method that takes prior information into account. We model the SSG as a controlled, ergodic continuous-time Markov game. The game framework assumes that all information is available. To estimate the transition rates, we divide the number of transitions over a time interval by the total holding time. The cost for defenders and attackers is estimated as the arithmetic mean of the observed costs of the individual players. An iterated proximal/gradient approach is used to compute the SSG equilibrium point. We offer a continuous-time random-walk method for implementing the game. In a numerical case relevant to rain-forest hazards, we analyze the performance of the proposed RL security solution and discuss issues that should be considered in future work.
AB - Researchers have become interested in security games in recent decades as a result of their successful application to real-world security problems. The security model is based on the Stackelberg Security Game (SSG), in which defenders (leaders) select a defensive strategy based on the optimal reaction of attackers (followers), who, at equilibrium, select the predicted attacking strategy in response. These applications, however, do not account for the time constraints imposed by the players’ travel time. Furthermore, players should be able to cope with dynamic settings in which their knowledge of the environment changes regularly, allowing them to perform more effectively. To address these issues, this research proposes a security model based on a continuous-time Reinforcement Learning (RL) approach implemented using a temporal-difference method that takes prior information into account. We model the SSG as a controlled, ergodic continuous-time Markov game. The game framework assumes that all information is available. To estimate the transition rates, we divide the number of transitions over a time interval by the total holding time. The cost for defenders and attackers is estimated as the arithmetic mean of the observed costs of the individual players. An iterated proximal/gradient approach is used to compute the SSG equilibrium point. We offer a continuous-time random-walk method for implementing the game. In a numerical case relevant to rain-forest hazards, we analyze the performance of the proposed RL security solution and discuss issues that should be considered in future work.
KW - Markov chain
KW - Security game
KW - Stackelberg game
KW - continuous-time
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85140374446&partnerID=8YFLogxK
U2 - 10.1080/0952813X.2022.2135615
DO - 10.1080/0952813X.2022.2135615
M3 - Article
AN - SCOPUS:85140374446
SN - 0952-813X
JO - Journal of Experimental and Theoretical Artificial Intelligence
JF - Journal of Experimental and Theoretical Artificial Intelligence
ER -