Graph neural network for audio representation learning

The thesis was published by Shirian, Amir, in January 2022 at the University of Warwick.

Abstract:

Learning audio representations is an important task with many potential applications. Whether it takes the form of speech, music, or ambient sound, audio is a pervasive type of data that can convey rich information. Audio representation learning is also a fundamental ingredient of deep learning, yet learning a good representation remains challenging. Good audio representations can enable more accurate downstream tasks in both audio and video, such as emotion recognition. Such a representation should capture the information needed to understand the input sound and expose discriminative patterns, which typically demands a sizable volume of carefully annotated data and therefore considerable labelling effort. In this thesis, we propose a set of models for audio representation learning. We capture discriminative patterns by imposing a graph structure on the audio and processing it with graph neural networks; our work is the first to consider a graph structure for audio data. In contrast to existing methods that rely on approximations, our first model uses a manually designed graph structure together with a graph convolution layer that performs an exact graph convolution operation. In the second model, by integrating a graph inception network, we extend the manually created graph structure and learn it jointly with the primary objective. In the third model, we address the scarcity of annotated data with a semi-supervised graph technique that represents audio data as nodes in a graph and connects them, based on label information, into smaller subgraphs. Going beyond earlier works, we also address the problem of leveraging multimodal data to improve audio representation learning: to accommodate multimodal input, our fourth model incorporates heterogeneous graph data, and we additionally design a new graph architecture to handle multimodal inputs.
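To make the core idea concrete, the sketch below shows one plausible way to treat an audio clip as a graph and apply a single graph convolution layer. It is a minimal illustration, not the thesis code: the per-frame features, the chain-graph topology (each frame linked to its temporal neighbours), the symmetrically normalised propagation rule, and the class name `SimpleAudioGraphConv` are all assumptions made for this example, since the abstract does not specify the exact graph construction or convolution operation used in the models.

```python
# Minimal illustrative sketch (assumed details, not the thesis implementation):
# each audio frame's feature vector becomes a graph node, temporally adjacent
# frames are connected, and one graph convolution layer propagates information
# along the edges using the normalised update H' = D^{-1/2}(A + I)D^{-1/2} H W.
import torch
import torch.nn as nn


class SimpleAudioGraphConv(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (num_frames, in_dim), e.g. one feature vector per audio frame
        n = frames.size(0)
        # Chain graph over time: node i is connected to nodes i-1 and i+1
        adj = torch.zeros(n, n)
        idx = torch.arange(n - 1)
        adj[idx, idx + 1] = 1.0
        adj[idx + 1, idx] = 1.0
        adj = adj + torch.eye(n)                  # add self-loops
        deg_inv_sqrt = adj.sum(dim=1).pow(-0.5)   # symmetric degree normalisation
        norm_adj = deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]
        return torch.relu(norm_adj @ self.weight(frames))


# Usage: 100 frames of 40-dim features -> 64-dim node embeddings,
# mean-pooled into a single clip-level representation.
frames = torch.randn(100, 40)
layer = SimpleAudioGraphConv(40, 64)
clip_embedding = layer(frames).mean(dim=0)
```

The chain topology here is only one choice; the thesis's later models learn or extend the graph structure rather than fixing it by hand, and the semi-supervised and multimodal models operate on graphs over whole audio samples rather than frames.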
