Planning and Policy Improvement

This thesis by Ivo Danihelka was published in March 2023 at UCL (University College London).

Abstract:

MuZero is currently the most successful general reinforcement learning algorithm, achieving the state of the art on Go, chess, shogi, and Atari. We want to help MuZero to be successful in even more domains. Towards that, we do three steps: 1) We identify MuZero’s problems on stochastic environments and provide ways to model enough information to support causally correct planning. 2) We develop a strong baseline agent on Atari. This agent, named Muesli, matches the state of the art on Atari, even without deep search. The conducted ablations inform us about the importance of model learning, deep search, large networks, and regularized policy optimization. 3) Because MuZero’s tree search is very helpful on Go and chess, we use the principle of policy improvement to design search algorithms with even better properties. The new algorithms, named Gumbel AlphaZero and Gumbel MuZero, match the state of the art on Go, chess, and Atari, and significantly improve prior performance when planning with few simulations.

The full thesis can be downloaded at:
https://discovery.ucl.ac.uk/id/eprint/10167022/2/ivo_danihelka_thesis.pdf
