TY - GEN
T1 - A deep reinforcement learning algorithm based on modified Twin delay DDPG method for robotic applications
AU - Vasquez-Jalpa, Carlos
AU - Nakano-Miyatake, Mariko
AU - Escamilla-Hernandez, Enrique
N1 - Publisher Copyright:
© 2021 ICROS.
PY - 2021
Y1 - 2021
N2 - This paper proposes a deep reinforcement learning algorithm for autonomous robotics, in which we modify the twin delayed deep deterministic policy gradient (TD3) method to adapt it to autonomous robots with a higher degree of freedom in movement. To provide a robot with free movement in 2D space without collisions against obstacles, such as walls, the robot is equipped with three cameras. The images captured by the cameras are used to train a Convolutional Neural Network (CNN) to classify the environment as collision or non-collision. We add two parameters, the observation 'O', which consists of the images obtained from the cameras, and the degree of turn 'deg', to the original TD3 parameters composed of four values: state 's', reward 'r', action 'a', and next-state 's''. To determine the next action with a higher reward from the observation, two additional neural networks are constructed: the first determines an action from the observation, and the second determines the degree of turn from the observation and the action. Simulation results in three environments constructed with CoppeliaSim show good performance of the proposed algorithm, which reaches the target with higher rewards even though the environments are unknown to the robot.
AB - This paper proposes a deep reinforcement learning algorithm for autonomous robotics, in which we modify the twin delayed deep deterministic policy gradient (TD3) method to adapt it to autonomous robots with a higher degree of freedom in movement. To provide a robot with free movement in 2D space without collisions against obstacles, such as walls, the robot is equipped with three cameras. The images captured by the cameras are used to train a Convolutional Neural Network (CNN) to classify the environment as collision or non-collision. We add two parameters, the observation 'O', which consists of the images obtained from the cameras, and the degree of turn 'deg', to the original TD3 parameters composed of four values: state 's', reward 'r', action 'a', and next-state 's''. To determine the next action with a higher reward from the observation, two additional neural networks are constructed: the first determines an action from the observation, and the second determines the degree of turn from the observation and the action. Simulation results in three environments constructed with CoppeliaSim show good performance of the proposed algorithm, which reaches the target with higher rewards even though the environments are unknown to the robot.
KW - Actor-Critic
KW - Deep Q-Learning
KW - Deep Reinforcement Learning
KW - Policy Gradient
KW - Robot Vision
UR - http://www.scopus.com/inward/record.url?scp=85124227977&partnerID=8YFLogxK
U2 - 10.23919/ICCAS52745.2021.9649882
DO - 10.23919/ICCAS52745.2021.9649882
M3 - Conference contribution
AN - SCOPUS:85124227977
T3 - International Conference on Control, Automation and Systems
SP - 743
EP - 748
BT - 2021 21st International Conference on Control, Automation and Systems, ICCAS 2021
PB - IEEE Computer Society
T2 - 21st International Conference on Control, Automation and Systems, ICCAS 2021
Y2 - 12 October 2021 through 15 October 2021
ER -