Continuous-time reinforcement learning approach for portfolio management with time penalization

Mauricio García-Galicia; Alin A. Carsteanu; Julio B. Clempner

doi:10.1016/j.eswa.2019.03.055

Escuela Superior de Física y Matemáticas (ESFM)

Research output: Contribution to journal › Article › peer-review

27 Scopus citations

Abstract

This paper considers the problem of policy optimization in the context of continuous-time Reinforcement Learning (RL), a branch of artificial intelligence, for financial portfolio management purposes. The underlying asset portfolio process is assumed to possess a continuous-time discrete-state Markov chain structure involving the simplex and ergodicity constraints. The goal of the portfolio problem is the redistribution of a fund into different financial assets. One general assumption has to be set, namely that the market is arbitrage-free (no price arbitrage is possible); under this assumption, the problem of how to obtain the optimal policy is solvable. We provide an RL solution based on an actor/critic architecture in which the market is characterized by a restriction called transaction cost, involving time penalization. The portfolio problem in Markov chains is determined by solving a convex quadratic minimization problem with linear constraints. Any Markov chain is generated by a stochastic transition matrix and the mathematical expectations of the rewards. In particular, we estimate the elements of the transition rate matrices and the mathematical expectations of the rewards. This method learns the optimal strategy in order to decide what portfolio weight to take for a single period. With this strategy, the agent is able to choose the state with maximum utility and select its respective action. The optimal policy computation is solved employing a novel proximal optimization approach, which involves time penalization in the transaction costs and the rewards. We employ the Lagrange multipliers approach to include the restrictions of the market and those that are imposed by the continuous time frame. Moreover, a specific numerical example in banking, which fits into the general framework of portfolio management, validates the effectiveness and usefulness of the proposed method.
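
The abstract mentions estimating the elements of the transition rate matrices. The following is a minimal sketch of one standard way to do this for a fully observed continuous-time Markov chain, assuming the usual maximum-likelihood estimator (number of observed jumps from i to j divided by the total time spent in i). The paper's own estimator may differ; the function name `estimate_rate_matrix` and the `(state, holding_time)` input layout are illustrative assumptions, not taken from the paper.

```python
def estimate_rate_matrix(path, n_states):
    """Estimate a CTMC generator matrix Q from one observed trajectory.

    path: list of (state, holding_time) pairs in visit order (hypothetical layout).
    Returns Q as a list of lists, with q_ij = N_ij / T_i for i != j and
    q_ii = -sum_{j != i} q_ij, so every visited row sums to zero.
    """
    jumps = [[0] * n_states for _ in range(n_states)]  # N_ij: jump counts i -> j
    time_in = [0.0] * n_states                         # T_i: total time spent in i

    for k, (state, hold) in enumerate(path):
        time_in[state] += hold
        if k + 1 < len(path):
            nxt = path[k + 1][0]
            if nxt != state:
                jumps[state][nxt] += 1

    Q = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if time_in[i] == 0.0:
            continue  # state never visited: leave its row at zero
        for j in range(n_states):
            if j != i:
                Q[i][j] = jumps[i][j] / time_in[i]
        Q[i][i] = -sum(Q[i][j] for j in range(n_states) if j != i)
    return Q


# Toy trajectory over three states (made-up numbers, for illustration only).
toy_path = [(0, 2.0), (1, 1.0), (0, 2.0), (2, 0.5), (0, 1.0)]
Q_hat = estimate_rate_matrix(toy_path, 3)
```

Each off-diagonal entry is a nonnegative rate and each diagonal entry is the negative row sum, which is the defining structure of a transition rate matrix.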

Original language	English
Pages (from-to)	27-36
Number of pages	10
Journal	Expert Systems with Applications
Volume	129
DOIs	https://doi.org/10.1016/j.eswa.2019.03.055
State	Published - 1 Sep 2019

Keywords

Continuous-time
Markov chains
Portfolio
Reinforcement learning
Transaction costs

Cite this

@article{815fa541770a4b0a8e53500457293dd3,

title = "Continuous-time reinforcement learning approach for portfolio management with time penalization",

abstract = "This paper considers the problem of policy optimization in the context of continuous-time Reinforcement Learning (RL), a branch of artificial intelligence, for financial portfolio management purposes. The underlying asset portfolio process is assumed to possess a continuous-time discrete-state Markov chain structure involving the simplex and ergodicity constraints. The goal of the portfolio problem is the redistribution of a fund into different financial assets. One general assumption has to be set, namely that the market is arbitrage-free (no price arbitrage is possible); under this assumption, the problem of how to obtain the optimal policy is solvable. We provide an RL solution based on an actor/critic architecture in which the market is characterized by a restriction called transaction cost, involving time penalization. The portfolio problem in Markov chains is determined by solving a convex quadratic minimization problem with linear constraints. Any Markov chain is generated by a stochastic transition matrix and the mathematical expectations of the rewards. In particular, we estimate the elements of the transition rate matrices and the mathematical expectations of the rewards. This method learns the optimal strategy in order to decide what portfolio weight to take for a single period. With this strategy, the agent is able to choose the state with maximum utility and select its respective action. The optimal policy computation is solved employing a novel proximal optimization approach, which involves time penalization in the transaction costs and the rewards. We employ the Lagrange multipliers approach to include the restrictions of the market and those that are imposed by the continuous time frame. Moreover, a specific numerical example in banking, which fits into the general framework of portfolio management, validates the effectiveness and usefulness of the proposed method.",

keywords = "Continuous-time, Markov chains, Portfolio, Reinforcement learning, Transaction costs",

author = "Garc{\'i}a-Galicia, Mauricio and Carsteanu, {Alin A.} and Clempner, {Julio B.}",

note = "Publisher Copyright: {\textcopyright} 2019 Elsevier Ltd",

year = "2019",

month = sep,

day = "1",

doi = "10.1016/j.eswa.2019.03.055",

language = "English",

volume = "129",

pages = "27--36",

journal = "Expert Systems with Applications",

issn = "0957-4174",

}

TY - JOUR

T1 - Continuous-time reinforcement learning approach for portfolio management with time penalization

AU - García-Galicia, Mauricio

AU - Carsteanu, Alin A.

AU - Clempner, Julio B.

PY - 2019/9/1

Y1 - 2019/9/1

N2 - This paper considers the problem of policy optimization in the context of continuous-time Reinforcement Learning (RL), a branch of artificial intelligence, for financial portfolio management purposes. The underlying asset portfolio process is assumed to possess a continuous-time discrete-state Markov chain structure involving the simplex and ergodicity constraints. The goal of the portfolio problem is the redistribution of a fund into different financial assets. One general assumption has to be set, namely that the market is arbitrage-free (no price arbitrage is possible); under this assumption, the problem of how to obtain the optimal policy is solvable. We provide an RL solution based on an actor/critic architecture in which the market is characterized by a restriction called transaction cost, involving time penalization. The portfolio problem in Markov chains is determined by solving a convex quadratic minimization problem with linear constraints. Any Markov chain is generated by a stochastic transition matrix and the mathematical expectations of the rewards. In particular, we estimate the elements of the transition rate matrices and the mathematical expectations of the rewards. This method learns the optimal strategy in order to decide what portfolio weight to take for a single period. With this strategy, the agent is able to choose the state with maximum utility and select its respective action. The optimal policy computation is solved employing a novel proximal optimization approach, which involves time penalization in the transaction costs and the rewards. We employ the Lagrange multipliers approach to include the restrictions of the market and those that are imposed by the continuous time frame. Moreover, a specific numerical example in banking, which fits into the general framework of portfolio management, validates the effectiveness and usefulness of the proposed method.

AB - This paper considers the problem of policy optimization in the context of continuous-time Reinforcement Learning (RL), a branch of artificial intelligence, for financial portfolio management purposes. The underlying asset portfolio process is assumed to possess a continuous-time discrete-state Markov chain structure involving the simplex and ergodicity constraints. The goal of the portfolio problem is the redistribution of a fund into different financial assets. One general assumption has to be set, namely that the market is arbitrage-free (no price arbitrage is possible); under this assumption, the problem of how to obtain the optimal policy is solvable. We provide an RL solution based on an actor/critic architecture in which the market is characterized by a restriction called transaction cost, involving time penalization. The portfolio problem in Markov chains is determined by solving a convex quadratic minimization problem with linear constraints. Any Markov chain is generated by a stochastic transition matrix and the mathematical expectations of the rewards. In particular, we estimate the elements of the transition rate matrices and the mathematical expectations of the rewards. This method learns the optimal strategy in order to decide what portfolio weight to take for a single period. With this strategy, the agent is able to choose the state with maximum utility and select its respective action. The optimal policy computation is solved employing a novel proximal optimization approach, which involves time penalization in the transaction costs and the rewards. We employ the Lagrange multipliers approach to include the restrictions of the market and those that are imposed by the continuous time frame. Moreover, a specific numerical example in banking, which fits into the general framework of portfolio management, validates the effectiveness and usefulness of the proposed method.

KW - Continuous-time

KW - Markov chains

KW - Portfolio

KW - Reinforcement learning

KW - Transaction costs

UR - http://www.scopus.com/inward/record.url?scp=85063719116&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2019.03.055

DO - 10.1016/j.eswa.2019.03.055

M3 - Article

SN - 0957-4174

VL - 129

SP - 27

EP - 36

JO - Expert Systems with Applications

JF - Expert Systems with Applications

ER -
