Principled and Efficient Bilevel Optimization for Machine Learning - PhDData

Access database of worldwide thesis




Principled and Efficient Bilevel Optimization for Machine Learning

The thesis was published by Grazzi, Riccardo, in September 2023, UCL (University College London).

Abstract:

Automatic differentiation (AD) is a core element of most modern machine learning
libraries that allows to efficiently compute derivatives of a function from the corresponding program. Thanks to AD, machine learning practitioners have tackled
increasingly complex learning models, such as deep neural networks with up to hundreds of billions of parameters, which are learned using the derivative (or gradient)
of a loss function with respect to those parameters. While in most cases gradients
can be computed exactly and relatively cheaply, in others the exact computation
is either impossible or too expensive and AD must be used in combination with
approximation methods. Some of these challenging scenarios arising for example in
meta-learning or hyperparameter optimization, can be framed as bilevel optimization
problems, where the goal is to minimize an objective function that is evaluated by
first solving another optimization problem, the lower-level problem. In this work, we
study efficient gradient-based bilevel optimization algorithms for machine learning
problems. In particular, we establish convergence rates for some simple approaches
to approximate the gradient of the bilevel objective, namely the hypergradient, when
the objective is smooth and the lower-level problem consists in finding the fixed
point of a contraction map. Leveraging such results, we also prove that the projected
inexact hypergradient method achieves a (near) optimal rate of convergence. We
establish these results for both the deterministic and stochastic settings. Additionally, we provide an efficient implementation of the methods studied and perform
several numerical experiments on hyperparameter optimization, meta-learning, datapoisoning and equilibrium models, which show that our theoretical results are good
indicators of the performance in practice.



Read the last PhD tips