Improving searchability of datasets

This thesis was published by Emilia Magdalena Kacprzak in March 2022 at the University of Southampton.

Abstract:

Data is one of the most important digital assets in the world thanks to its business and social value. As it becomes increasingly available online, we need tools that retrieve the datasets most relevant to our information needs in order to use it effectively. Web search engines are not well suited to this task, as they are designed for documents, not data. In recent years several bespoke search engines have been proposed to help with finding datasets, such as Google Dataset Search, which crawls the whole web, and DataMed, which focuses on building an index of biomedical datasets. In this work we take a closer look at the problem of searching for data, using Open Data platforms as an example. We first apply a mixed-methods approach to deepen our understanding of the users of Open Data portals and the types of queries they issue while searching for datasets, accompanied by an analysis of search sessions on one of these data portals. Based on our findings, we examine a particular problem of dataset interpretation: the meaning of numerical columns. We propose a novel approach for assigning semantic labels to numerical columns. We conclude with an analysis of the future work needed in the field to improve the searchability of datasets on the web.


