On the efficiency of finding and using tabular data summaries: scalability, accuracy, and hardness
Tabular data is ubiquitous in modern computer science. However, these tables can be so large that computing statistics over them is inefficient in both time and space. This thesis is concerned with finding and using small summaries of large tables for scalable and accurate approximation of the data’s properties, or with showing that such a summary is hard to obtain in small space. This perspective yields the following results:
• We introduce projected frequency analysis over an n × d binary table. If the query columns S are revealed only after the data is observed, we show that space exponential in d is required for a constant-factor approximation to statistics such as the number of distinct elements on the columns S. We present algorithms that use less space than a brute-force approach while tolerating some super-constant error in the frequency estimates.
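To make the query concrete, the following is a minimal brute-force illustration of the statistic being approximated, not the small-space algorithm from the thesis: it counts distinct rows of a binary table after projecting onto a column subset S chosen only after the data is seen. The table sizes and column choices here are illustrative.

```python
import numpy as np

# Illustrative n x d binary table (sizes chosen for the example only).
rng = np.random.default_rng(0)
n, d = 8, 4
table = rng.integers(0, 2, size=(n, d))

def projected_distinct(table, S):
    """Exact number of distinct rows after projecting onto columns S.

    This is the brute-force baseline: it stores the whole table, whereas
    the thesis studies approximating this count in much smaller space
    when S is revealed only after the data has been observed.
    """
    return len({tuple(row) for row in table[:, S]})

S = [0, 2]  # query columns, revealed after observing the data
count = projected_distinct(table, S)
```

Pre-aggregating every possible subset S would require space exponential in d, which is the regime where the hardness result applies.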
• We find small-space deterministic summaries for a variety of linear-algebraic problems in all p-norms for p ≥ 1. These include finding rows of high leverage, subspace embedding, regression, and low-rank approximation.
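As a concrete instance of the "rows of high leverage" problem, the sketch below computes standard p = 2 leverage scores via a thin SVD on the full matrix. This is the textbook full-data computation, not the thesis's small-space deterministic summary, and the thesis treats general p-norms; the matrix here is an arbitrary example.

```python
import numpy as np

# Example tall matrix A (100 x 3); any full-column-rank matrix works.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 3))

# p = 2 leverage score of row i is the squared norm of row i of U,
# where A = U S V^T is the thin SVD. Scores lie in [0, 1] and sum
# to rank(A).
U, _, _ = np.linalg.svd(A, full_matrices=False)
leverage = np.sum(U**2, axis=1)

# The five rows with the highest leverage scores.
high = np.argsort(leverage)[-5:]
```

Rows with leverage near 1 are nearly irreplaceable for preserving the column space, which is why high-leverage rows are natural candidates to keep in a summary.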
• We implement and compare various summary techniques for efficient training of large-scale regression models. We show that a sparse random projection can lead to fast model training despite weaker theoretical guarantees than its dense competitors. For ridge regression, we show that a deterministic summary can reduce the number of gradient steps needed to train the model compared to random projections.
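The speed advantage of sparse projections comes from applying the sketch in time proportional to the number of nonzeros of the data. The following is a sketch-and-solve least-squares example using a CountSketch-style projection (one signed nonzero per row), in the spirit of the sparse projections compared in the thesis; the dimensions, noise level, and sketch size are illustrative assumptions, not the thesis's experimental setup.

```python
import numpy as np

# Synthetic regression instance: b = A x_true + small noise.
rng = np.random.default_rng(2)
n, d, m = 5000, 10, 200  # m sketch rows, with m << n
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

# CountSketch-style sparse projection applied implicitly: each row of
# [A | b] is hashed to one of m buckets with a random sign, so forming
# the sketch costs O(nnz(A)) time instead of a dense m x n multiply.
rows = rng.integers(0, m, size=n)
signs = rng.choice([-1.0, 1.0], size=n)
SA = np.zeros((m, d))
Sb = np.zeros(m)
np.add.at(SA, rows, signs[:, None] * A)  # unbuffered scatter-add
np.add.at(Sb, rows, signs * b)

# Solve the much smaller m x d least-squares problem.
x_sketch, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
```

Solving the m × d sketched problem replaces the n × d one, which is the "close to optimal in far less time" trade-off the experiments quantify.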
We demonstrate the practicality of our approaches through experiments showing that small-space summaries lead to close-to-optimal solutions.
http://wrap.warwick.ac.uk/161585/1/WRAP_Theses_Dickens_2021.pdf