Efficient Hindsight Experience Replay with Transformed Data Augmentation

Jiazheng Sun; Weiguang Li

doi:10.6180/jase.202402_27(2).0011


Mechanical Engineering

Jiazheng Sun, Weiguang Li

School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou, Guangdong, China

Received: February 27, 2022
Accepted: April 6, 2023
Publication Date: July 5, 2023

Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.


Robot motion control is a high-dimensional, nonlinear control problem that is often difficult to handle with traditional dynamics-based path planning methods. Reinforcement learning is an effective approach to robot motion control, but it requires a large number of trial-and-error interactions and often faces sparse rewards, both of which limit its efficiency in practice. The Hindsight Experience Replay (HER) algorithm is a reinforcement learning algorithm that addresses the sparse-reward problem by constructing virtual target values. However, HER still trains slowly in the early stage of training, and its sample efficiency leaves room for improvement. Augmenting existing data to improve training efficiency is widely used in supervised learning but is less common in reinforcement learning. In this paper, we propose the Hindsight Experience Replay with Transformed Data Augmentation (TDAHER) algorithm, which combines HER with a transformed data augmentation method for reinforcement learning samples. To address the limited accuracy of augmented samples in the later stage of training, we introduce a decaying participation factor. Comparisons on four simulated robot control tasks show that the algorithm effectively improves the training efficiency of reinforcement learning.
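As a rough illustration of the two ideas named in the abstract, the following Python sketch shows HER-style goal relabeling and a decaying participation factor. It is a minimal sketch under stated assumptions, not the authors' implementation: the transition dictionary keys, the distance tolerance, the use of the "final" relabeling strategy, and the linear decay schedule are all hypothetical choices made for illustration, and the transformed data augmentation itself is not shown because the abstract does not specify it.

```python
import numpy as np


def sparse_reward(achieved_goal, goal, tol=0.05):
    """Sparse goal-conditioned reward: 0.0 on success, -1.0 otherwise.

    The tolerance `tol` is an assumed value for illustration.
    """
    distance = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(goal))
    return 0.0 if distance <= tol else -1.0


def her_relabel_final(episode, tol=0.05):
    """Relabel an episode with the goal that was actually reached at its end.

    `episode` is a list of dicts with the hypothetical keys
    "achieved_next" (goal-space position after the step), "goal", and "reward".
    The originals are left untouched; relabeled copies are returned.
    """
    virtual_goal = episode[-1]["achieved_next"]
    relabeled = []
    for transition in episode:
        new_transition = dict(transition)
        new_transition["goal"] = virtual_goal
        new_transition["reward"] = sparse_reward(
            transition["achieved_next"], virtual_goal, tol
        )
        relabeled.append(new_transition)
    return relabeled


def participation_factor(step, total_steps, start=1.0):
    """Assumed linear decay from `start` at step 0 to 0.0 at `total_steps`.

    This could be read as the fraction of augmented samples mixed into a
    batch; the actual schedule used in the paper is not given in the abstract.
    """
    fraction_done = min(max(step / total_steps, 0.0), 1.0)
    return start * (1.0 - fraction_done)


if __name__ == "__main__":
    goal = np.array([1.0, 0.0])
    episode = [
        {"achieved_next": np.array([x, 0.0]), "goal": goal, "reward": -1.0}
        for x in (0.1, 0.2, 0.3)
    ]
    print([t["reward"] for t in her_relabel_final(episode)])
    print(participation_factor(50, 100))
```

In this toy episode the original goal is never reached, so every original reward is -1.0; after relabeling, the last transition counts as a success with respect to the virtual goal, which is what gives the learner a non-trivial signal under sparse rewards.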

Keywords: Reinforcement learning; Machine learning; Motion control; Data augmentation
