Multi-objective reinforcement learning with continuous Pareto frontier approximation