Symmetry and structure in deep reinforcement learning
In this thesis, we study symmetry and structure in deep reinforcement learning. The thesis is divided into two parts: in the first, we explore how to leverage knowledge of symmetries in reinforcement learning; in the second, we propose methods for learning the structure of an agent’s environment and its states.

We propose MDP Homomorphic Networks, neural networks that are equivariant under symmetries in an MDP’s joint state-action space. Owing to this equivariance, they are more data efficient than non-equivariant baselines. We further propose Multi-Agent MDP Homomorphic Networks, a class of networks that allows distributed execution using only local information, yet is able to share experience between global symmetries in the joint state-action space of cooperative multi-agent systems. We show that global equivariance improves data efficiency over non-equivariant distributed networks on symmetric coordination problems.

We propose PRAE, which exploits action equivariance for representation learning in reinforcement learning. Equivariance under actions means that transitions in the input space are mirrored by equivalent transitions in the latent space, i.e., that the encoding map and the transition functions commute. We prove that, under certain assumptions, the learned map is an MDP homomorphism, and we show empirically that the approach is data efficient and fast to train, generalizing well to new goal states and to instances with the same environmental dynamics.

Finally, we propose C-SWMs, which learn object-oriented representations of states from pixels using contrastive coding and graph neural network transition functions. We show improved multi-step prediction and generalization to unseen environment configurations compared to approaches that use decoders, unstructured transition models, or unstructured representations.
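To make the equivariance constraint behind MDP Homomorphic Networks concrete, the following minimal sketch symmetrizes a single linear policy layer under a hypothetical two-element flip symmetry (a left/right mirror of a simple control task). The representations P_in and P_out, the dimensions, and the symmetrization-by-group-averaging construction are illustrative assumptions, not the construction used in the thesis; the constraint being enforced, that transforming the state permutes the action outputs accordingly, is the same.

```python
import numpy as np

# Hypothetical two-element symmetry group {identity, flip}.
# P_in acts on state features (e.g. negating positions and velocities of a
# left/right-symmetric control task); P_out swaps the two action logits.
P_in = np.diag([1.0, -1.0, 1.0, -1.0])
P_out = np.array([[0.0, 1.0],
                  [1.0, 0.0]])

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 4))          # unconstrained policy weights

# Average W over the group (both representations are their own inverses),
# which yields weights satisfying W_eq @ P_in == P_out @ W_eq.
W_eq = 0.5 * (W + P_out @ W @ P_in)

s = rng.standard_normal(4)               # an arbitrary state
logits = W_eq @ s
logits_flipped = W_eq @ (P_in @ s)

# Equivariance: transforming the state permutes the action logits accordingly.
assert np.allclose(logits_flipped, P_out @ logits)
```

Group averaging is only practical for small finite groups and is used here purely to illustrate the constraint that equivariant policy networks satisfy by construction.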
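The commutation requirement behind action equivariance can likewise be written as a simple objective: the encoding of the next state should match the latent transition applied to the encoding of the current state. The sketch below uses assumed names (encoder, latent_transition, commutation_loss), architectures, and dimensions for illustration; it is not the exact model or full loss used in the thesis.

```python
import torch
import torch.nn as nn

state_dim, action_dim, latent_dim = 8, 4, 2

# Assumed encoder Z: states -> latents, and latent transition T_bar(z, a).
encoder = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(),
                        nn.Linear(32, latent_dim))
latent_transition = nn.Sequential(nn.Linear(latent_dim + action_dim, 32),
                                  nn.ReLU(), nn.Linear(32, latent_dim))

def commutation_loss(s, a_onehot, s_next):
    """||Z(s') - T_bar(Z(s), a)||^2: a transition in input space should be
    mirrored by the corresponding transition in latent space."""
    z, z_next = encoder(s), encoder(s_next)
    z_next_pred = latent_transition(torch.cat([z, a_onehot], dim=-1))
    return ((z_next - z_next_pred) ** 2).sum(dim=-1).mean()

# Example batch of transitions (random placeholders).
s = torch.randn(16, state_dim)
a = torch.nn.functional.one_hot(torch.randint(0, action_dim, (16,)),
                                action_dim).float()
s_next = torch.randn(16, state_dim)
loss = commutation_loss(s, a, s_next)
```

On its own, this objective admits a trivial solution in which every state is mapped to the same latent point; in practice a contrastive term with negative samples is added to prevent such collapse.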
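In the same spirit, a contrastive objective over object-factored latents can be sketched as a hinge loss on positive and negative transitions. The slot encoder and the graph neural network transition model that produces the predicted latent change delta_z are assumed to exist elsewhere; the names, margin value, and shapes below are illustrative rather than the thesis’ exact formulation.

```python
import torch

def energy(z_pred, z_target):
    # Squared distance summed over object slots and latent dimensions;
    # latents have shape (batch, num_objects, latent_dim).
    return ((z_pred - z_target) ** 2).sum(dim=(-2, -1))

def contrastive_loss(z, delta_z, z_next, z_neg, margin=1.0):
    # Positive term: the predicted next latent should match the encoding
    # of the observed next state.
    positive = energy(z + delta_z, z_next)
    # Negative term: encodings of randomly drawn other observations should
    # stay at least `margin` away from the encoded next state.
    negative = torch.relu(margin - energy(z_neg, z_next))
    return (positive + negative).mean()

# Example: batch of 32 transitions, 5 object slots, 2-dimensional slots.
z, delta_z = torch.randn(32, 5, 2), torch.randn(32, 5, 2)
z_next, z_neg = torch.randn(32, 5, 2), torch.randn(32, 5, 2)
loss = contrastive_loss(z, delta_z, z_next, z_neg)
```

Because no pixel decoder is involved, the contrastive term is what keeps the representation from collapsing while the transition model is trained entirely in latent space.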