Abstract
Our approach is based on a novel Lyapunov methodology for reinforcement learning (RL). We propose a method for constructing Lyapunov-like functions for a feed-forward Markov decision process; such functions ensure the stability of the behavior policy throughout the learning process. We show that the cost sequence corresponding to the optimal strategy is frequently non-monotonic, so convergence cannot be guaranteed directly. For any Markov-ergodic process, our technique generates a Lyapunov-like function in one-to-one correspondence with the current cost function, yielding monotonically non-increasing behavior along the trajectories under the optimal strategy realization. We prove convergence of the system's dynamics and trajectories, and we explain how to apply the Lyapunov method to RL problems. Numerical tests demonstrate the efficacy of the proposed approach.
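The Lyapunov-like property described above can be illustrated with a minimal sketch (this is not the paper's construction, only an assumed standard setting): for a finite MDP with nonnegative costs and an absorbing zero-cost goal state, the optimal cost-to-go behaves like a Lyapunov function, decreasing monotonically along trajectories of the greedy optimal policy.

```python
import numpy as np

# Hypothetical 5-state deterministic chain: actions move left/right,
# state 0 is the absorbing goal with zero cost; every other move costs 1.
n_states, n_actions = 5, 2

def step(s, a):
    if s == 0:                      # absorbing goal state
        return 0, 0.0
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
    return s_next, 1.0              # unit cost per move

# Value iteration for the optimal cost-to-go V*.
V = np.zeros(n_states)
for _ in range(100):
    V_new = np.array([min(step(s, a)[1] + V[step(s, a)[0]]
                          for a in range(n_actions))
                      for s in range(n_states)])
    if np.allclose(V, V_new):
        break
    V = V_new

def greedy(s):
    # Greedy (optimal) policy with respect to V*.
    return min(range(n_actions),
               key=lambda a: step(s, a)[1] + V[step(s, a)[0]])

# Roll out the greedy policy and record V* along the trajectory.
s, traj = n_states - 1, []
for _ in range(10):
    traj.append(V[s])
    s, _ = step(s, greedy(s))

# V* is monotonically non-increasing along the optimal trajectory,
# i.e. it acts as a Lyapunov-like function.
assert all(traj[i] >= traj[i + 1] for i in range(len(traj) - 1))
```

The sketch uses the classical stochastic-shortest-path intuition: with nonnegative costs, the Bellman equation gives V(s') = V(s) − c(s, a) ≤ V(s) along optimal transitions, which is exactly the monotone non-increase the abstract refers to.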
| Translated title of the contribution | Un enfoque de Lyapunov para el aprendizaje por refuerzo estable |
|---|---|
| Original language | English |
| Article number | 279 |
| Journal | Computational and Applied Mathematics |
| Volume | 41 |
| Issue | 6 |
| DOI | |
| Status | Published - Sep. 2022 |