Promoting Quality and Diversity in Population-based Reinforcement Learning via Hierarchical Trajectory Space Exploration

Published on July 12, 2022

Abstract

Quality-Diversity (QD) algorithms in population-based reinforcement learning aim to optimize agents’ returns and the diversity of the population simultaneously. This helps solve exploration problems in reinforcement learning and can yield multiple good and diverse strategies. However, previous methods typically define the behavioral embedding in action space or outcome space, which neglects trajectory characteristics during execution. In this paper, we introduce a trajectory embedding model, trained with a Variational Autoencoder under a similarity constraint, to characterize trajectory features. Building on this model, we propose a hierarchical trajectory-space exploration (HTSE) framework that uses Determinantal Point Processes (DPP) to generate high-quality and diverse solutions in the selection and mutation processes. Experimental results show that HTSE effectively completes several simulated tasks and outperforms other Quality-Diversity reinforcement learning algorithms.
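
To give a feel for how a DPP can trade off quality against diversity when selecting agents, here is a minimal sketch. It is not the authors' implementation: the RBF similarity over trajectory embeddings, the quality-weighted kernel L = diag(q) S diag(q), the greedy log-determinant selection, and the names `build_kernel` and `greedy_dpp_select` are all assumptions made for illustration. The embeddings stand in for whatever a trained trajectory encoder would produce.

```python
import numpy as np


def build_kernel(embeddings, quality, bandwidth=1.0):
    """Quality-weighted DPP kernel L = diag(q) @ S @ diag(q).

    S is an RBF similarity between embedding vectors (an assumption;
    any positive semi-definite similarity would do).
    """
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    similarity = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return quality[:, None] * similarity * quality[None, :]


def greedy_dpp_select(kernel, k):
    """Greedily pick up to k indices that maximize log det of the sub-kernel.

    This is a simple approximation to DPP MAP inference, not exact sampling.
    """
    selected = []
    remaining = list(range(kernel.shape[0]))
    for _ in range(k):
        best_idx, best_logdet = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(kernel[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best_idx, best_logdet = i, logdet
        if best_idx is None:  # every remaining candidate is redundant
            break
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


# Hypothetical population: agents 0 and 1 behave almost identically,
# agent 2 behaves differently; all have equal quality.
embeddings = np.array([[0.0, 0.0], [0.01, 0.0], [5.0, 5.0]])
quality = np.array([1.0, 1.0, 1.0])
chosen = greedy_dpp_select(build_kernel(embeddings, quality), k=2)
```

In this toy population the determinant of a pair of near-duplicate agents is close to zero, so the selection keeps only one of agents 0 and 1 and pairs it with agent 2. Raising an agent's quality score scales its row and column of the kernel and makes it more likely to be picked.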

BibTeX

@inproceedings{Miao2022,
  doi = {10.1109/icra46639.2022.9811888},
  url = {https://doi.org/10.1109/icra46639.2022.9811888},
  year = {2022},
  month = may,
  publisher = {IEEE},
  author = {Jiayu Miao and Tianze Zhou and Kun Shao and Ming Zhou and Weinan Zhang and Jianye Hao and Yong Yu and Jun Wang},
  title = {Promoting Quality and Diversity in Population-based Reinforcement Learning via Hierarchical Trajectory Space Exploration},
  booktitle = {2022 International Conference on Robotics and Automation ({ICRA})}
}