Novel Algorithm-Level Approaches for Class-Imbalanced Machine Learning

This thesis, by David Twomey, was published in March 2023 at UCL (University College London).

Abstract:

Machine learning classifiers are designed with the underlying assumption of a roughly balanced number of instances per class. However, in many real-world applications this is far from true. This thesis explores adaptations of neural networks that are robust to class-imbalanced datasets, do not involve data manipulation, and are flexible enough to be used with any model architecture or framework. The thesis explores two complementary approaches to the problem of class imbalance. The first exchanges conventional choices of classification loss function, which are fundamentally measures of how far network outputs are from desired ones, for ones that instead primarily register whether outputs are right or wrong. The construction of these novel loss functions involves the concept of an approximated confusion matrix, another use of which is to generate new performance metrics, especially useful for monitoring validation behaviour for imbalanced datasets. The second approach changes the form of the output layer activation function to one with a threshold which can be learned so as to more easily classify the more difficult minority class. These two approaches can be used together or separately, with the combined technique being a promising approach for cases of extreme class imbalance. While the methods are developed primarily for binary classification scenarios, as these are the most numerous in the applications literature, the novel loss functions introduced here are also demonstrated to be extensible to a multi-class scenario.
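
To make the two ideas in the abstract more concrete, the following is a minimal illustrative sketch in Python (NumPy). It is not the thesis's construction, which the abstract does not specify. It assumes one common way of approximating a confusion matrix, in which predicted probabilities stand in for hard 0/1 decisions, and derives a soft F1-style loss from it. The names `soft_confusion_matrix`, `soft_f1_loss`, and `threshold_sigmoid`, and the parameters `tau` and `k`, are hypothetical; in a real framework `tau` would be a trainable parameter.

```python
import numpy as np


def soft_confusion_matrix(y_true, p):
    """Smooth stand-in for binary confusion-matrix counts (assumed form).

    y_true: array of 0/1 labels; p: predicted probabilities of class 1.
    Each example contributes fractionally to the cells rather than being
    hard-thresholded, so the counts vary smoothly with the outputs.
    """
    y_true = np.asarray(y_true, dtype=float)
    p = np.asarray(p, dtype=float)
    tp = np.sum(y_true * p)
    fn = np.sum(y_true * (1.0 - p))
    fp = np.sum((1.0 - y_true) * p)
    tn = np.sum((1.0 - y_true) * (1.0 - p))
    return tp, fn, fp, tn


def soft_f1_loss(y_true, p, eps=1e-8):
    """One minus the F1 score computed from the soft counts."""
    tp, fn, fp, _ = soft_confusion_matrix(y_true, p)
    return 1.0 - 2.0 * tp / (2.0 * tp + fp + fn + eps)


def threshold_sigmoid(z, tau=0.0, k=1.0):
    """Sigmoid shifted by a threshold tau (hypothetical form).

    Lowering tau raises the output for a fixed input z, which makes it
    easier for an example to be assigned to the positive class.
    """
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-k * (z - tau)))


# Toy imbalanced data: one positive example among ten.
y = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
p_all_negative = np.full(10, 0.05)        # ignores the minority class
p_good = np.where(y == 1, 0.9, 0.05)      # detects the minority example

loss_all_negative = soft_f1_loss(y, p_all_negative)
loss_good = soft_f1_loss(y, p_good)
```

On this toy data, a classifier that ignores the single positive example is 90% accurate yet receives a much higher soft F1 loss than one that detects it, which is the kind of right-or-wrong sensitivity the abstract describes. When the probabilities are exactly 0 or 1, the soft counts coincide with the ordinary confusion matrix.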


