Improving image captioning methods using machine learning approaches

The thesis was published by Viktar Atliha in June 2023 at Vilnius Gediminas Technical University.

Abstract:

Recently, the computer vision (CV) and natural language processing (NLP) fields have been gaining increasing attention from researchers and industry. While methods from the first field solve many tasks in the image domain, such as image classification and image detection, methods from the second work in the text domain, on tasks such as text classification or translation. However, many problems with practical use lie on the border between the two domains. One of them is image captioning. The goal of an image captioning system is to automatically generate a human-like textual description of a given image. Such systems could be used for smoother human–computer interaction, information retrieval, or, more importantly, to help visually impaired people. To succeed, the algorithms used in these systems should consume few resources (in particular, require little memory) and be of high quality. Because image captioning is a cross-domain task, and state-of-the-art models for computer vision and natural language processing are based on deep learning, such approaches are also used for image captioning. However, most well-known methods of improving image captioning models focus on quality improvement without regard to the additional resources required. Thus, the best current models are very large and unsuitable for mobile and other memory-constrained devices, where they could bring the greatest practical benefit.

The dissertation consists of an introduction, three main chapters, and general conclusions. The First Chapter reviews existing research on image captioning. The Second Chapter investigates the application of model compression methods to existing image captioning models and proposes several methods of reducing model size without significant quality loss. The Third Chapter focuses on improving image captioning models without significant changes (or without any changes) to the model architecture, highlighting the importance of such methods. The experiments and analysis showed that image captioning models can be significantly compressed with almost no quality loss. Applying all the proposed methods reduced the model size by 91% while losing only up to 3% in the main quality metrics. Moreover, the methods proposed for improving quality without changing the models' architecture almost neutralized this loss and led to quality improvements of up to 5%.

The full thesis can be downloaded at:
https://vb.vgtu.lt/object/elaba:176938178/176938178.pdf
https://vb.vgtu.lt/VGTU:ELABAETD176938178&prefLang=en_US
