Improving image captioning methods using machine learning approaches






Thesis by Viktar Atliha, published in June 2023 at Vilnius Gediminas Technical University.

Abstract:

In recent years, the fields of computer vision (CV) and natural language processing (NLP) have attracted growing attention from researchers and industry. CV methods address tasks in the image domain, such as image classification and object detection, while NLP methods work in the text domain, covering tasks such as text classification and translation. However, many practically useful problems lie on the border between the two domains. One of them is image captioning. The goal of an image captioning system is to automatically generate a human-like textual description of a given image. Such systems could be used for smoother human–computer interaction, information retrieval, or, more importantly, to help visually impaired people. To succeed, the algorithms used in these systems should consume few resources (in particular, require little memory) and produce high-quality captions. Because image captioning is a cross-domain task, and state-of-the-art models for both computer vision and natural language processing are based on deep learning, deep learning approaches are also used for image captioning. However, most well-known methods of improving image captioning models focus on quality improvement and disregard the additional resources this requires. As a result, the best current models are very large and unsuitable for mobile and other memory-constrained devices, where they could bring the greatest practical benefit.

The dissertation consists of an introduction, three main chapters, and general conclusions. The First Chapter reviews existing research on image captioning. The Second Chapter investigates the application of model compression methods to existing image captioning models and proposes several methods of reducing model size without significant quality loss. The Third Chapter focuses on improving image captioning models with no or only minor changes to the model architecture, and highlights the importance of such methods.

The experiments and analysis showed that image captioning models can be compressed significantly with almost no quality loss. Applying all the proposed methods reduced the model size by 91% while losing at most 3% in the main quality metrics. Moreover, the methods proposed for improving quality without changing the model architecture made it possible to almost neutralize this loss, yielding quality improvements of up to 5%.


