Kernel Meta-Learning by Leveraging Natural Data Assumptions

The thesis was published by Falk, John Isak Texas, in November 2023 at UCL (University College London).

Abstract:

Data representation is integral to meta-learning and is effectively done using kernels. Good performance requires algorithms that can learn kernels (or feature maps) from collections of tasks sampled from a meta-distribution. In this thesis we exploit natural assumptions on the meta-distribution to design meta-kernel learning algorithms, leading to two novel state-of-the-art (SOTA) meta-classification and meta-regression algorithms. The first method, Meta-Label Learning (MeLa) [Wan+22], leverages the meta-classification assumption that each task is generated from a global base dataset by randomly sampling C classes, anonymising the labels, then sampling K instances from each class. The anonymity of task-labels prohibits us from pooling task-instances with the same global class. MeLa recovers, in some cases perfectly, the underlying true classes of all task-instances, allowing us to form a standard dataset and train a feature map in a supervised manner. This procedure leads to SOTA performance while being faster and more robust than alternative few-shot learning algorithms. For meta-regression, the notion of global classes is not well-defined. In Implicit Kernel Meta-Learning (IKML) [FCP22] we leverage the assumption that the optimal task-regressors belong to an RKHS with a kernel that is translation-invariant. We learn such a kernel from a kernel family characterized by a neural network through a pushforward model using Bochner's theorem. The model is trained by optimizing the meta-loss with random feature kernel ridge regression as the base algorithm. IKML achieves SOTA on two meta-regression benchmarks while allowing us to trade accuracy for speed at test time. We provide a bound on the excess transfer risk, allowing us to specify the minimum number of random features necessary to achieve optimal generalization performance.


