Set-valued Data: Regression, Design and Outliers
This dissertation studies set-valued data from three perspectives: regression, optimal design, and outlier identification. It consists of three peer-reviewed published articles, each addressing one of these aspects. Their titles and abstracts are listed below:
1. Local regression smoothers with set-valued outcome data:
This paper proposes a method to conduct local linear regression smoothing in the presence of set-valued outcome data. The proposed estimator is shown to be consistent, and its mean squared error and asymptotic distribution are derived. A method to build error tubes around the estimator is provided, and a small Monte Carlo exercise is conducted to confirm the good finite-sample properties of the estimator. The usefulness of the method is illustrated on a novel dataset from a clinical trial that assesses the effect of the expression of certain genes on the outcomes of different lung cancer treatments.
2. Optimal design for multivariate multiple linear regression with set-identified response:
We consider the partially identified regression model with set-identified responses, where the estimator is the set of the least squares estimators obtained for all possible choices of points sampled from set-identified observations. We address the issue of determining the optimal design for this case and show that, for objective functions mimicking those for several classical optimal designs, their set-identified analogues coincide with the optimal designs for point-identified real-valued responses.
3. Depth and outliers for samples of sets and random sets distributions:
We suggest several constructions suitable to define the depth of set-valued observations with respect to a sample of convex sets or with respect to the distribution of a random closed convex set. With the concept of a depth, it is possible to determine whether a given convex set should be regarded as an outlier with respect to a sample of convex closed sets. Some of our constructions are motivated by the known concepts of half-space depth and band depth for function-valued data. A novel construction derives the depth from a family of non-linear expectations of random sets. Furthermore, we address the role of the positions of sets in the evaluation of their depth. Two case studies concern interval regression for Greek wine data and detection of outliers in a sample of particles.
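As an illustration of the set-valued estimator described in the second abstract, the following minimal sketch assumes a scalar response that is observed only as an interval [y_lo, y_hi] for each observation, which simplifies the multivariate setting of the article. Under that assumption, the set of least squares estimators is the linear image of a box, so its coordinate-wise bounds follow from the signs of the entries of (X'X)^{-1}X'. The function name `ols_set_bounds` and the data are hypothetical and are not taken from the articles.

```python
import numpy as np


def ols_set_bounds(X, y_lo, y_hi):
    """Coordinate-wise bounds of {(X'X)^{-1} X'y : y_lo <= y <= y_hi}.

    Assumes X has full column rank and y_lo <= y_hi elementwise.
    """
    # For full column rank X, the pseudo-inverse equals (X'X)^{-1} X'.
    A = np.linalg.pinv(X)            # shape (p, n)
    A_pos = np.clip(A, 0.0, None)    # positive part of each entry
    A_neg = np.clip(A, None, 0.0)    # negative part of each entry
    # Each coefficient is linear in y, so its extremes over the box
    # are attained at endpoints chosen by the sign of each weight.
    lower = A_pos @ y_lo + A_neg @ y_hi
    upper = A_pos @ y_hi + A_neg @ y_lo
    return lower, upper


# Hypothetical data: intercept plus one regressor, three observations.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y_mid = np.array([1.0, 3.0, 5.0])

# Each response is known only up to an interval of half-width 1.
lower, upper = ols_set_bounds(X, y_mid - 1.0, y_mid + 1.0)
print("lower bounds:", lower)
print("upper bounds:", upper)
```

The bounds describe only the smallest axis-aligned box containing the set of estimators. The set itself is a convex polytope that is generally smaller than this box.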