Abstract
This paper presents a novel approach for adapting attackers' and defenders' preferred patrolling strategies using reinforcement learning (RL) based on average rewards in Stackelberg security games. We propose a framework that combines three paradigms: prior knowledge, imitation, and the temporal-difference method. The overall RL architecture comprises two top-level components: the Adaptive Primary Learning architecture and the Actor–critic architecture. We assume that defenders and attackers form coalitions in the Stackelberg security game; these coalitions are reached by computing the Strong Lp-Stackelberg/Nash equilibrium. We present a numerical example that validates the proposed RL approach and measures its benefits for security resource allocation.
Original language | English |
---|---|
Pages (from-to) | 35-54 |
Number of pages | 20 |
Journal | Journal of Computer and System Sciences |
Volume | 95 |
State | Published - Aug 2018 |
Keywords
- Behavioral games
- Multiple players
- Reinforcement learning
- Security games
- Stackelberg games
- Strong Stackelberg/Nash equilibrium