Planning and Policy Improvement

This thesis by Ivo Danihelka was published in March 2023 at UCL (University College London).

Abstract:

MuZero is currently the most successful general reinforcement learning algorithm, achieving the state of the art on Go, chess, shogi, and Atari. We want to help MuZero succeed in even more domains. Towards that goal, we take three steps: 1) We identify MuZero’s problems in stochastic environments and provide ways to model enough information to support causally correct planning. 2) We develop a strong baseline agent on Atari. This agent, named Muesli, matches the state of the art on Atari, even without deep search. The ablations we conducted inform us about the importance of model learning, deep search, large networks, and regularized policy optimization. 3) Because MuZero’s tree search is very helpful on Go and chess, we use the principle of policy improvement to design search algorithms with even better properties. The new algorithms, named Gumbel AlphaZero and Gumbel MuZero, match the state of the art on Go, chess, and Atari, and significantly improve on prior performance when planning with few simulations.


