Machine Learning Applied to Raman Spectroscopy to Classify Cancers - PhDData






The thesis was published by Blake, Nathan, in August 2023, UCL (University College London).

Abstract:

Cancer diagnosis is notoriously difficult, evident in the inter-rater variability between
histopathologists classifying cancerous sub-types. Although there are many cancer
pathologies, they have in common that earlier diagnosis would maximise treatment
potential. To reduce this variability and expedite diagnosis, there has been a drive to
arm histopathologists with additional tools. One such tool is Raman spectroscopy,
which has demonstrated potential in distinguishing between various cancer types.
However, Raman data has high dimensionality and often contains artefacts;
together with the challenges inherent to medical data, this can frustrate
classification attempts. Deep learning has recently emerged with the promise of unlocking many
complex datasets, but it is not clear how this modelling paradigm can best exploit
Raman data for cancer diagnosis.
Three Raman oncology datasets (from ovarian, colonic and oesophageal tissue)
were used to examine various methodological challenges to machine learning applied
to Raman data, in conjunction with a thorough review of the recent literature.
Classification performance on each dataset is assessed with two traditional models
and one deep learning model. A technique is then applied to the deep learning model to aid interpretability
and relate biochemical antecedents to disease classes. In addition, a clinical problem
for each dataset was addressed, including the transferability of models developed
using multi-centre Raman data taken on different spectrometers of the same make.
Many subtleties of data processing were found to be important to the realistic
assessment of machine learning models. In particular, appropriate cross-validation
during hyperparameter selection, splitting data into training and test sets according
to the inherent structure of biomedical data and addressing the number of samples
per disease class are all found to be important factors. Additionally, it was found that
instrument correction was not needed to ensure system transferability if Raman data
is collected with a common protocol on spectrometers of the same make.


