Misclassification bias in statistical learning


This thesis by Meertens, Q.A. was published in January 2021 at the University of Amsterdam.

Abstract:

Over the past few years, the demand for official statistics has increased, while national statistical institutes (NSIs) are obliged to reduce the response burden on companies and citizens. Consequently, NSIs are investigating the use of big data and statistical learning methods, such as classifiers. Official statistics that are based on the output of (imperfect) classifiers are biased. The aim of this thesis is to investigate how to reduce that bias (called misclassification bias) when estimating a proportion in a binary classification problem. Under two circumstances, viz. (a) the double sampling scheme and (b) prior probability shift, the thesis contains theoretical results on the statistical quality (the mean squared error) of existing methods that reduce misclassification bias. The results show which method yields the highest statistical quality under which of the two circumstances. The statistical quality is increased further by imposing well-chosen prior constraints in a Bayesian framework. Moreover, the thesis investigates a specific binary classification problem in the context of official statistics, namely identifying webshops among all companies established in the European Union (EU). Solving that classification problem is essential in estimating cross-border Internet purchases within the EU. A new methodology based on statistical learning methods is proposed, resulting in more accurate estimates that are 6 times as high as earlier ones. Thus, the conclusion of the thesis is that statistical learning methods can be used to produce official statistics, as long as misclassification bias is adequately corrected for. Finally, it is argued that domain experts are of vital importance to the successful implementation of statistical learning methods within official statistics.
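To make the notion of misclassification bias concrete, the following minimal sketch shows how an imperfect binary classifier distorts an estimated proportion, and how one standard textbook correction inverts that distortion. It is an illustration only and is not taken from the thesis: the abstract does not name the methods it compares, and the function names, the example values, and the assumption that sensitivity and specificity are known exactly are all hypothetical.

```python
def expected_classify_and_count(alpha, sensitivity, specificity):
    """Expected share of items labelled positive by an imperfect classifier.

    alpha is the true proportion of positives. True positives are detected
    with probability `sensitivity`; true negatives are wrongly labelled
    positive with probability 1 - specificity.
    """
    return alpha * sensitivity + (1.0 - alpha) * (1.0 - specificity)


def adjusted_count(observed, sensitivity, specificity):
    """Invert the relation above to recover the true proportion.

    Assumption: sensitivity and specificity are known exactly. In practice
    they are estimated from labelled data, which adds variance.
    """
    denom = sensitivity + specificity - 1.0
    if denom == 0.0:
        raise ValueError("classifier is uninformative; correction undefined")
    estimate = (observed - (1.0 - specificity)) / denom
    # Clip to the valid range for a proportion.
    return min(1.0, max(0.0, estimate))


# Hypothetical values chosen for illustration only.
alpha = 0.10                      # true proportion of positives
sens, spec = 0.90, 0.95           # assumed classifier quality

observed = expected_classify_and_count(alpha, sens, spec)
corrected = adjusted_count(observed, sens, spec)

print(f"true proportion:      {alpha:.3f}")
print(f"classify-and-count:   {observed:.3f}")   # biased upward here
print(f"after the correction: {corrected:.3f}")
```

With these hypothetical values, simply counting the predicted positives gives 0.135 instead of 0.10, even though the classifier is right at least 90% of the time in each class. The correction removes that bias here only because the error rates are treated as known. When they must be estimated, the corrected estimator can have a large variance, which is why comparing methods by mean squared error, as the abstract describes, matters.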

The full thesis can be downloaded at:
https://pure.uva.nl/ws/files/59712159/Thesis.pdf

