Abstract
We recently proposed EDA-RL, a class of estimation of distribution algorithms (EDAs) for reinforcement learning problems. EDA-RL improves policies by following the EDA scheme: first, it selects the better episodes; second, it estimates probabilistic models, i.e., policies; and finally, it interacts with the environment to generate new episodes. In this paper, EDA-RL is extended to multi-objective reinforcement learning problems, where the reward is given by several criteria. By incorporating notions from evolutionary multi-objective optimization, the proposed method is able to acquire various strategies in a single run.
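
The select, estimate, and sample loop described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the selection rule or the form of the probabilistic model, so the sketch assumes Pareto non-dominated selection (a common notion in evolutionary multi-objective optimization) and a tabular policy estimated from state-action counts. The toy environment, the `REWARDS` table, and all function names are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical toy problem: each action yields a two-criterion reward vector.
REWARDS = {0: (1, 0), 1: (0, 1), 2: (0, 0)}
N_ACTIONS = 3

def dominates(a, b):
    """True if return vector a Pareto-dominates b (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def select_non_dominated(episodes):
    """Step 1 (assumed rule): keep episodes that no other episode dominates."""
    return [e for e in episodes
            if not any(dominates(o["ret"], e["ret"]) for o in episodes)]

def estimate_policy(episodes, n_actions=N_ACTIONS, smoothing=1.0):
    """Step 2 (assumed model): estimate pi(a|s) from smoothed state-action counts."""
    counts = defaultdict(lambda: [smoothing] * n_actions)
    for e in episodes:
        for s, a in e["steps"]:
            counts[s][a] += 1
    return {s: [c / sum(cs) for c in cs] for s, cs in counts.items()}

def run_episode(policy, horizon, rng):
    """Step 3: interact with the toy environment; the state is the time step."""
    steps, ret = [], [0, 0]
    for t in range(horizon):
        # Fall back to a uniform policy for states not yet modeled.
        probs = policy.get(t, [1.0 / N_ACTIONS] * N_ACTIONS)
        a = rng.choices(range(N_ACTIONS), weights=probs)[0]
        steps.append((t, a))
        ret[0] += REWARDS[a][0]
        ret[1] += REWARDS[a][1]
    return {"steps": steps, "ret": tuple(ret)}

def eda_rl(generations=20, n_episodes=50, horizon=4, seed=0):
    """Repeat: sample episodes, select non-dominated ones, re-estimate the policy."""
    rng = random.Random(seed)
    policy, selected = {}, []
    for _ in range(generations):
        episodes = [run_episode(policy, horizon, rng) for _ in range(n_episodes)]
        selected = select_non_dominated(episodes)
        policy = estimate_policy(selected)
    return policy, selected
```

Calling `eda_rl()` returns the final tabular policy together with the non-dominated episodes of the last generation; the distinct return vectors among those episodes correspond to different trade-offs between the two criteria in this toy setting.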
