Machine learning applications in fuels research
The energy and climate crises both require engineering solutions to the dependence on fossil
fuels. Biofuels can be a good alternative because they do not require significant changes in
automotive design and can be used in poorer countries, where electricity production is
fossil-fuel based and electric vehicles are not yet affordable.
This research was done to demonstrate applications of machine learning and chemical
informatics in fuel research, in order to accelerate the development of future renewable
fuels. At the moment these applications are limited to regression models of fuels' physical
and chemical properties, which are tested and compared on different sets of test molecules.
This contrasts with practice in chemical informatics research, where the models being compared
use the same training and test sets of molecules.
Three major research directions were pursued: interpreting an octane quantitative
structure-activity model, establishing the applicability domain of fuel property models, and
building an inverse, structure-generating model. The aim of the first direction was to show
that fuel structure-activity models can provide insights beyond predictions. A literature
review was carried out to find a suitable interpretation method. It was found that a model- and
input-agnostic method is best suited, since it does not introduce bias from a particular
algorithm type and is chemically intuitive. The developed interpretation agreed with six known
relationships between molecular structure and knock ignition.
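
The idea of a model-agnostic interpretation can be illustrated with a minimal perturbation
sketch. This abstract does not name the method that was used, so the code below is only one
generic member of that family, not the thesis method: each input feature is replaced in turn by
a baseline value and the change in the prediction is recorded. The function names, the feature
layout and the toy model weights are all hypothetical and chosen for illustration only; they
are not results from this work.

```python
from typing import Callable, List, Sequence


def occlusion_attributions(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: float = 0.0,
) -> List[float]:
    """Score each feature by how much the prediction drops when it is occluded.

    Only `predict` is called, so any regression model can be plugged in.
    """
    reference = predict(x)
    scores = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline  # replace one feature with the baseline value
        scores.append(reference - predict(perturbed))
    return scores


def toy_octane_model(counts: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained model; the weights are invented."""
    n_branches, n_rings, chain_length = counts
    return 60.0 + 8.0 * n_branches + 5.0 * n_rings - 4.0 * chain_length


# Hypothetical molecule: 3 branches, 0 rings, main chain of 5 carbons.
features = [3, 0, 5]
attributions = occlusion_attributions(toy_octane_model, features)
print(attributions)
```

A positive score means the feature raised the prediction relative to the baseline and a
negative score means it lowered it. Because only `predict` is called, the same routine applies
unchanged to any regression algorithm.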
The goal of the second direction was to find the limits of what can be predicted about fuel
properties from structure, given the available datasets. This was done by looking for a
relationship between the prediction errors for test compounds and four measures of their
similarity to the training set of compounds. The outlier detection algorithms successfully
found such a trend, while the distance-based approaches did not.
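
The mechanics of such an applicability-domain check can be sketched as follows. This is a
minimal illustration, assuming fingerprints stored as sets of "on" bit indices and Tanimoto
similarity as the measure; this abstract does not list the four measures used, and the
fingerprints and error values below are invented toy data. Note that the example is of the
distance-based kind, which is reported above as not showing a trend on the real data, so it
shows only how the test is set up, not its outcome.

```python
from math import sqrt
from typing import List, Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_similarity_to_training(test_fp: Set[int], training_fps: Sequence[Set[int]]) -> float:
    """Similarity of a test compound to its nearest training compound."""
    return max(tanimoto(test_fp, fp) for fp in training_fps)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented toy data: fingerprints as sets of bit indices.
training_fps: List[Set[int]] = [{1, 2, 3, 4}, {2, 3, 5}, {1, 4, 6}]
test_fps: List[Set[int]] = [{1, 2, 3, 4}, {2, 3, 7}, {8, 9}]
abs_errors = [0.5, 2.0, 6.0]  # hypothetical absolute prediction errors

similarities = [max_similarity_to_training(fp, training_fps) for fp in test_fps]
r = pearson(similarities, abs_errors)
print(similarities, r)
```

A clearly negative correlation would indicate that compounds less similar to the training set
are predicted less accurately, which is the kind of trend an applicability domain relies on.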
Lastly, the goal of the generative modelling was to suggest new fuel-like structures that
cannot be found by intuition. Two models were investigated: a generative adversarial network
(GAN) and an auto-regressive LSTM model. The latter model generated structures which were
present in neither the generative modelling training set nor the octane database, and which
showed some traits of highly knock-resistant compounds.
https://discovery.ucl.ac.uk/id/eprint/10181830/1/Thesis_version2