Advanced analytics to improve patient outcomes: Leveraging real-world evidence - PhDData


Advanced analytics to improve patient outcomes: Leveraging real-world evidence

This thesis was published by Christopher Martin Sauer in September 2022 at VU University Amsterdam.


This thesis discussed how advanced analytics and real-world evidence can be used to improve patient outcomes. In chapter 2 we summarized frequently observed pitfalls that clinicians and data scientists should seek to avoid, and described potential ways to avoid or overcome them. In chapter 3 we compared the four publicly available adult intensive care data sets. As one guiding principle, we suggest that eICU-CRD be used for studies that need large patient numbers and for which a diverse patient cohort and generalizability are priorities. HiRID and AmsterdamUMCdb, meanwhile, are better choices for studies requiring granular and frequent measurements. We suggest MIMIC-IV as a good all-round data set given its higher patient count, good data frequencies and well-established concepts. In chapter 4 we applied some of the considerations discussed in chapters 1 and 2 to a machine learning problem, using a gradient boosted model (GBM) to predict re-admission to the ICU. While we fully agreed with the relevance of the research question and the merit of the approach taken, we also critically appraised some of the design choices and questioned whether the model would be able to accurately predict re-admission in real-world settings. Regardless of potential modelling and data shortcomings, we argued that more studies using machine learning are required to decrease inter-physician differences in treatment policies and to improve clinical care through data.

In chapter 5 we presented a clinical use case applying linear models to real-world data. Using the MIMIC-III data set, we studied trends in survival among cancer patients admitted to a single center in Boston, MA between 2002 and 2011. A statistically significant decrease in 28-day mortality of cancer patients was found (33.3% in 2002 vs. 23.6% in 2011). Chapter 6 explored and analyzed factors that differentiate severely ill sepsis patients with normal versus high serum lactate values. Our analyses showed that high serum bicarbonate and serum chloride levels were associated with normal serum lactate, while elevated serum sodium levels, elevated AST and the presence of liver disease were strongly associated with high serum lactate levels. These findings were consistent across three data sets from the U.S. and Europe.

In chapter 7 a machine learning algorithm was built to select features and predict treatment failure in tuberculosis patients. Using various methods, we identified patient characteristics and clinical features that gave good classification performance in identifying patients at high risk of treatment failure. Linear models performed best, with performance slightly higher than that of models previously established in other countries. Clinical application is, however, limited, as the sensitivity of the models is low (30-36%) when tuned to a specificity of approx. 80%. The relatively low number of treatment failures is likely the main limitation on model performance.

In chapter 8 we demonstrated that machine learning methods can accurately back-predict pre-admission serum creatinine and hemoglobin values. We developed machine learning algorithms trained on patients who had outpatient laboratory measurements taken before ICU admission as well as hemoglobin and serum creatinine measurements available at admission. Gradient boosted models, random forest models and logistic regression models all achieved good classification performance for all subcohorts (AUC above approximately 0.7). Gradient boosted models performed best for the cohort with a baseline hemoglobin
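
The chapter 7 result quoted above, sensitivity "when tuned to a specificity of approx. 80%", can be made concrete with a minimal sketch. This is not the thesis's code: the function name `sensitivity_at_specificity`, the toy labels and the toy scores are all invented for illustration. It only shows one common way such a figure is obtained, by picking the lowest decision threshold that still reaches the target specificity and then reading off the sensitivity.

```python
def sensitivity_at_specificity(y_true, scores, target_specificity=0.80):
    """Return (threshold, sensitivity, specificity) for the lowest threshold
    whose specificity is at least target_specificity.

    A case is classified positive when its score is >= threshold.
    Hypothetical helper for illustration only.
    """
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one positive and one negative case")

    # Raising the threshold can only increase specificity, so the first
    # candidate that meets the target also gives the highest sensitivity.
    # float("inf") guarantees a solution (nothing is classified positive).
    candidates = sorted(set(scores)) + [float("inf")]
    for threshold in candidates:
        specificity = sum(s < threshold for s in negatives) / len(negatives)
        if specificity >= target_specificity:
            sensitivity = sum(s >= threshold for s in positives) / len(positives)
            return threshold, sensitivity, specificity


# Toy data (invented): 10 patients without and 3 with treatment failure.
y_true = [0] * 10 + [1] * 3
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.55, 0.70,
          0.30, 0.45, 0.80]

threshold, sensitivity, specificity = sensitivity_at_specificity(y_true, scores)
print(f"threshold={threshold:.2f} "
      f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

With few positive cases, as in the toy data, each missed or caught positive moves the sensitivity by a large step, which is one reason a low number of treatment failures makes such estimates unstable.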
