Contributions to robustness, local differential privacy and change point analysis
Machine learning and statistical algorithms are now implemented at a large scale in almost every aspect of our society, significantly impacting our daily lives through their performance. Hence, there is a soaring demand for the development of trustworthy procedures. In this thesis, we look at three aspects – robustness, privacy and change point analysis – in modern data analysis that are broadly related to the issue of trust. The main part of this thesis consists of three chapters, where we examine two aspects in each chapter. We adopt an information-theoretic framework throughout the thesis.
To begin with, in Chapter 2, we study a robust univariate mean change point analysis problem. Specifically, we consider a form of adversarial contamination, where the contamination distributions are allowed to vary with time, and study its impact on the fundamental difficulty of detecting change points. Information-theoretic lower bounds, which quantify the cost of contamination, are established. Additionally, an efficient and theoretically-justified algorithm is proposed to guard against adversarial attacks. The code implementation for our proposed algorithm is available in the R package changepoints.
In Chapter 3, we study a change point analysis problem for high dimensional sparse network data, under two types of local differential privacy (LDP) constraints. One focuses on edge-level privacy and the other on node-level privacy. The fundamental limits in consistently localising change points under both constraints are investigated. Moreover, we introduce private signal-to-noise ratio conditions, which not only quantify the privacy costs but also demonstrate a distinct scaling behaviour in the sparsity parameter compared to their non-private counterparts. Our proposed algorithms combine state-of-the-art network change point analysis methodologies and privacy mechanisms with new insights derived along the way.
After examining the costs of robustness and preserving privacy on change point analysis separately, we systematically study the connections between the optimality under Huber’s contamination model and the LDP constraints in Chapter 4. We consider four canonical problems in statistics, including two-point hypothesis testing, univariate mean estimation, nonparametric density estimation and median estimation. For each of these problems, we develop algorithms that are simultaneously robust, privacy-preserving and statistically rate-optimal. Broadly speaking, we show that procedures satisfying LDP constraints are often robust to contamination, and efficient privacy mechanisms usually use ideas rooted in the robust statistics literature.
http://webcat.warwick.ac.uk/record=b3946548
https://wrap.warwick.ac.uk/180986/
https://wrap.warwick.ac.uk/180986/1/WRAP_Theses_Li_2023.pdf
