Change-Point Detection Methods for Behavioral Shift Recognition in Mental Healthcare
Human behavior analysis has been approached from different perspectives over time. In
recent years, new technologies and advances in digitalization have emerged as
alternative tools for characterizing behavior and for detecting changes over
time. In particular, the widespread use of smartphones and other electronic devices, which
continuously collect data from the user, provides a representation of behavior in different
areas of a person’s life, such as mobility, physical activity or social interactions. In addition,
these devices allow passive monitoring, that is, monitoring without the need for the user to
interact directly with the device, so that information is collected unobtrusively and
without altering the user's daily routine. Among other advantages, this means
that the user does not subjectively influence the information collected, which yields objective
representations of their behavior. This approach to the characterization and analysis of
behavior and its changes has many applications, notably in medicine. In this work, we focus
specifically on the field of mental health, where the characterization and early detection of
behavioral changes are important for preventing relapses in psychiatric patients and, in
particular, for trying to prevent suicide attempts or psychiatric emergency admissions in
those with a history of suicidal behavior.
Our approach is based on the development and application of mathematical and statistical
models that can help us detect these changes from passively collected data. However,
despite these advantages, working with data collected through electronic devices,
particularly in a clinical setting, is challenging because of the characteristics of the data.
Their structure is very complex. First, they are irregularly sampled in
time (samples may be stored every 5 minutes, when a specific activity starts, or daily).
Second, each observation can be heterogeneous, meaning that it
is made up of several sources of different statistical types (continuous, discrete) or of the
same type but with different marginal distributions. In addition, the number of
sources and the sampling frequency mean that each day is represented by
a high-dimensional vector, which calls for scalable algorithms. Lastly, these
data sequences contain many missing values with very diverse patterns due, for example,
to the lack of permissions on the phone, disconnection periods or, simply, the temporal
irregularity already mentioned.
The preprocessing of data with these characteristics requires a huge effort and time
cost that is not feasible when pursuing a goal as demanding as the prediction
and prevention of suicide attempts, where the information must be processed in real time
and every minute counts. Therefore, we need methods that are fast, efficient, accurate
and adapted to the complexity of the data we are working with. For this reason, instead
of focusing our efforts on data mining, which is generally conditioned on a specific initial
hypothesis and hinders reproducibility, we work on methods that can handle
data sequences with the aforementioned characteristics and do so in an online
manner, that is, algorithms capable of processing the samples as they are recorded.
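To make the notion of a heterogeneous observation with missing entries concrete, the following minimal sketch scores one observation made of a continuous and a discrete source. It is an illustration only, not the models developed in this thesis: the function names, the `spec` format and the assumption that the sources are independent given their parameters (so that a missing entry can simply be skipped) are all hypothetical choices made for this example.

```python
import math
from typing import List, Optional, Sequence, Tuple, Union

Value = Optional[Union[float, int]]


def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    """Log-density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def categorical_logpmf(k: int, probs: Sequence[float]) -> float:
    """Log-probability of category k under a categorical distribution."""
    return math.log(probs[k])


def observation_loglik(obs: Sequence[Value], spec: List[Tuple]) -> float:
    """Log-likelihood of one heterogeneous observation.

    `spec` (hypothetical format) holds one entry per source:
    ("gaussian", mean, std) or ("categorical", probs).
    Missing entries are encoded as None and skipped, which is only
    valid under the independence assumption stated in the lead-in.
    """
    total = 0.0
    for value, (kind, *params) in zip(obs, spec):
        if value is None:
            continue  # missing entry: contributes nothing to the score
        if kind == "gaussian":
            total += gaussian_logpdf(value, *params)
        elif kind == "categorical":
            total += categorical_logpmf(value, *params)
        else:
            raise ValueError(f"unknown source type: {kind}")
    return total


# One continuous source and one discrete source with two categories.
spec = [("gaussian", 0.0, 1.0), ("categorical", [0.2, 0.8])]
full = observation_loglik([0.0, 1], spec)      # both sources observed
partial = observation_loglik([None, 1], spec)  # continuous source missing
```

Because each sample is scored as it arrives and missing entries need no imputation, a score of this kind can be computed online, which is the property the paragraph above asks for.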
In this thesis, we focus on the development of probabilistic models for behavior
change detection, proposing algorithms that can work on heterogeneous, multi-source,
high-dimensional sequential data with missing values. In our scenario, we assume that the
joint distribution of the data changes at a given moment, segmenting the sequence, and our
goal is to detect this change and to do so with the least possible delay.
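As a simple illustration of detecting a distributional change online and with small delay, the sketch below implements the classical CUSUM test for a shift in the mean of a Gaussian stream. This is a textbook detector used here only to fix ideas, not the probabilistic models proposed in this thesis; the function name, the known pre- and post-change means, the noise level and the threshold are all assumptions made for the example.

```python
import math
from typing import Iterable, Optional


def cusum_online(stream: Iterable[Optional[float]],
                 mean_before: float,
                 mean_after: float,
                 sigma: float = 1.0,
                 threshold: float = 8.0) -> Optional[int]:
    """Return the index at which a change is declared, or None.

    Samples are processed one at a time. Missing samples (None or NaN)
    carry no evidence and leave the statistic unchanged.
    """
    stat = 0.0
    for t, x in enumerate(stream):
        if x is None or math.isnan(x):
            continue
        # Log-likelihood ratio of N(mean_after, sigma^2) against
        # N(mean_before, sigma^2) for the current sample.
        llr = (mean_after - mean_before) * (x - 0.5 * (mean_after + mean_before)) / sigma ** 2
        # CUSUM recursion: accumulate evidence, resetting at zero.
        stat = max(0.0, stat + llr)
        if stat > threshold:
            return t
    return None


# Toy sequence: 50 pre-change samples, 5 missing samples, 50 post-change samples.
stream = [0.0] * 50 + [None] * 5 + [2.0] * 50
detected_at = cusum_online(stream, mean_before=0.0, mean_after=2.0)
```

The threshold controls the trade-off named in the text: a lower value shortens the detection delay but raises the chance of false alarms, while a higher value does the opposite.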
We begin by describing the benefits of using digital phenotyping for the characterization
of human behavior changes, and we introduce an example of a specific monitoring e-health
system with which we have worked. We present two works on data mining in medicine
through digital phenotype modelling: the prediction of disability level in different domains
of daily life and the analysis of causal relationships between variables in order to detect
negative effects caused by isolation during the COVID-19 pandemic in psychiatric patients.
In the following, more technical chapters, we go a step further and change the focus
from fully adapting our data to existing methods to proposing algorithms designed specifically
for heterogeneous, multi-source, high-dimensional sequential data with missing values.
We focus on the development of change-point detection (CPD) algorithms and present the
benefits of using latent variable models to deal with high-dimensional data
sets, and we provide methods that are able to integrate data of different statistical types.
We also present a flexible CPD model that works on local observation models (LOMs),
defined according to the statistical type, source or prior knowledge of the initial data and
generated from local discrete latent variable models. In this way, the information is
transformed into homogeneous low-dimensional spaces, maintaining the benefits of the
previously proposed algorithms while also allowing all local representations to be treated
on an equal footing, thus addressing the initial problem of heterogeneity. In addition,
we define and adapt different CPD factorization models that weight the contribution of
each local representation to the global detection following different approaches. These
models hold for every previously proposed local observation model and add explainability
regarding the degree to which each local representation contributes to the joint detection.
We evaluate and test the proposed models on synthetic data, showing an improvement in
precision and a reduction in detection delay, as well as robustness to
the presence of missing data. Finally, we apply some of these methods to a real data set
within a study of behavioral change characterization in psychiatric patients with a history
of suicide-related events. We present individualized models for change detection over
passively-sensed data via smartphones, and use suicide attempts and psychiatric emergency
admissions as real labels with the aim of predicting them one week in advance.
https://doi.org/10.1038/s41380-020-00963-5
https://arxiv.org/pdf/2007.12420.pdf
https://doi.org/10.3389/fonc.2022.880430
https://arxiv.org/pdf/2011.09848.pdf
https://doi.org/10.1007/s11265-021-01705-8
https://preprints.jmir.org/preprint/38231
https://preprints.jmir.org/preprint/43719
https://doi.org/10.1016/j.patcog.2022.109116
http://hdl.handle.net/10016/36642
