A Bayesian reinforcement learning approach in markov games for computing near-optimal policies

Julio B. Clempner

doi:10.1007/s10472-023-09860-3

Escuela Superior de Física y Matemáticas (ESFM)

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Bayesian Learning is an inference method designed to tackle the exploration-exploitation trade-off as a function of the uncertainty of a given probability model from observations within the Reinforcement Learning (RL) paradigm. It allows the incorporation of prior knowledge, as probability distributions, into the algorithms. Finding the resulting Bayes-optimal policies is a notoriously difficult problem. We focus our attention on RL for a special class of ergodic and controllable Markov games. We propose a new framework for computing the near-optimal policies for each agent, where it is assumed that the Markov chains are regular and the inverse of the behavior strategy is well defined. A fundamental result of this paper is the development of a theoretical method that, based on the formulation of a non-linear problem, computes the near-optimal adaptive-behavior strategies and policies of the game under some restrictions that maximize the expected reward. We prove that such behavior strategies and policies satisfy the Bayesian-Nash equilibrium. Another important result is that the RL process learns a model through the interaction of the agents with the environment, and we show how the proposed method can finitely approximate and estimate the elements of the transition matrices and utilities while maintaining an efficient long-term learning performance measure. We develop an algorithm for implementing this model. A numerical example shows how to deploy the estimation process as a function of agent experiences.

Original language	English
Pages (from-to)	675-690
Number of pages	16
Journal	Annals of Mathematics and Artificial Intelligence
Volume	91
Issue number	5
DOI	https://doi.org/10.1007/s10472-023-09860-3
Publication status	Published - Oct 2023

Cite this

@article{e97bc6dc812642a5a67234e4c941908b,

title = "A Bayesian reinforcement learning approach in markov games for computing near-optimal policies",

abstract = "Bayesian Learning is an inference method designed to tackle the exploration-exploitation trade-off as a function of the uncertainty of a given probability model from observations within the Reinforcement Learning (RL) paradigm. It allows the incorporation of prior knowledge, as probability distributions, into the algorithms. Finding the resulting Bayes-optimal policies is a notoriously difficult problem. We focus our attention on RL for a special class of ergodic and controllable Markov games. We propose a new framework for computing the near-optimal policies for each agent, where it is assumed that the Markov chains are regular and the inverse of the behavior strategy is well defined. A fundamental result of this paper is the development of a theoretical method that, based on the formulation of a non-linear problem, computes the near-optimal adaptive-behavior strategies and policies of the game under some restrictions that maximize the expected reward. We prove that such behavior strategies and policies satisfy the Bayesian-Nash equilibrium. Another important result is that the RL process learns a model through the interaction of the agents with the environment, and we show how the proposed method can finitely approximate and estimate the elements of the transition matrices and utilities while maintaining an efficient long-term learning performance measure. We develop an algorithm for implementing this model. A numerical example shows how to deploy the estimation process as a function of agent experiences.",

keywords = "Bayesian equilibrium, Bayesian inference, Markov games with private information, Reinforcement learning",

author = "Clempner, {Julio B.}",

note = "Publisher Copyright: {\textcopyright} 2023, The Author(s), under exclusive licence to Springer Nature Switzerland AG.",

year = "2023",

month = oct,

doi = "10.1007/s10472-023-09860-3",

language = "English",

volume = "91",

pages = "675--690",

journal = "Annals of Mathematics and Artificial Intelligence",

issn = "1012-2443",

publisher = "Springer Netherlands",

number = "5",

}

TY - JOUR

T1 - A Bayesian reinforcement learning approach in markov games for computing near-optimal policies

AU - Clempner, Julio B.

PY - 2023/10

Y1 - 2023/10

N2 - Bayesian Learning is an inference method designed to tackle the exploration-exploitation trade-off as a function of the uncertainty of a given probability model from observations within the Reinforcement Learning (RL) paradigm. It allows the incorporation of prior knowledge, as probability distributions, into the algorithms. Finding the resulting Bayes-optimal policies is a notoriously difficult problem. We focus our attention on RL for a special class of ergodic and controllable Markov games. We propose a new framework for computing the near-optimal policies for each agent, where it is assumed that the Markov chains are regular and the inverse of the behavior strategy is well defined. A fundamental result of this paper is the development of a theoretical method that, based on the formulation of a non-linear problem, computes the near-optimal adaptive-behavior strategies and policies of the game under some restrictions that maximize the expected reward. We prove that such behavior strategies and policies satisfy the Bayesian-Nash equilibrium. Another important result is that the RL process learns a model through the interaction of the agents with the environment, and we show how the proposed method can finitely approximate and estimate the elements of the transition matrices and utilities while maintaining an efficient long-term learning performance measure. We develop an algorithm for implementing this model. A numerical example shows how to deploy the estimation process as a function of agent experiences.

KW - Bayesian equilibrium

KW - Bayesian inference

KW - Markov games with private information

KW - Reinforcement learning

UR - http://www.scopus.com/inward/record.url?scp=85161362796&partnerID=8YFLogxK

U2 - 10.1007/s10472-023-09860-3

DO - 10.1007/s10472-023-09860-3

M3 - Article

AN - SCOPUS:85161362796

SN - 1012-2443

VL - 91

SP - 675

EP - 690

JO - Annals of Mathematics and Artificial Intelligence

JF - Annals of Mathematics and Artificial Intelligence

IS - 5

ER -
