Cheminformatics Tools to Explore the Chemical Space of Peptides and Natural Products

This thesis by Alice Capecchi was published in September 2022 at the University of Bern.


Cheminformatics facilitates the collection, storage, and analysis of large quantities of chemical data, such as molecular structures, molecular properties, and biological activities, and it has revolutionized medicinal chemistry for small molecules. However, it has been applied far less to larger molecules. This thesis attempts to fill this gap by extending the cheminformatics approach to large molecules and peptides.

This thesis is divided into two parts. The first part presents the implementation and application of two new molecular descriptors: the macromolecule extended atom pair fingerprint (MXFP) and the MinHashed atom pair fingerprint of radius 2 (MAP4). MXFP is an atom pair fingerprint suited to large molecules; here it is used to explore the chemical space of non-Lipinski molecules in the widely used PubChem and ChEMBL databases. MAP4 is a MinHashed hybrid of substructure and atom pair fingerprints that can encode both small and large molecules. MAP4 is first benchmarked against commonly used atom pair and substructure fingerprints and then used, with the aid of machine learning and chemical space mapping, to investigate the chemical space of microbial and plant natural products.
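The "MinHashed" part of MAP4 can be illustrated with a generic MinHash sketch. Each molecule is treated as a set of string shingles, each hash function keeps the minimum hashed value over the set, and the fraction of matching signature positions estimates the Jaccard similarity between two molecules. This is a minimal sketch under stated assumptions, not the MAP4 implementation: in MAP4 the shingles combine two circular substructures with their topological distance, whereas the shingle strings below are hypothetical placeholders, and the hash family, the SHA-1 step, the seed, and the signature length of 1024 are illustrative choices.

```python
import hashlib
import random

PRIME = (1 << 61) - 1  # Mersenne prime used as the modulus of the hash family


def shingle_to_int(shingle: str) -> int:
    """Deterministically map a string shingle to a 32-bit integer via SHA-1."""
    digest = hashlib.sha1(shingle.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little")


def make_hash_params(n_perm: int, seed: int = 42) -> list[tuple[int, int]]:
    """Draw (a, b) pairs defining n_perm hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n_perm)]


def minhash(shingles: set[str], params: list[tuple[int, int]]) -> list[int]:
    """For each hash function, keep the minimum hashed value over the shingle set."""
    values = [shingle_to_int(s) for s in shingles]
    return [min((a * v + b) % PRIME for v in values) for a, b in params]


def estimated_jaccard(sig1: list[int], sig2: list[int]) -> float:
    """The fraction of agreeing signature positions estimates |A & B| / |A | B|."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


def exact_jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity, for comparison with the MinHash estimate."""
    return len(a & b) / len(a | b)


# Hypothetical shingles of the form "substructure|distance|substructure".
mol_a = {"c1ccccc1|3|CN", "CN|2|C=O", "C=O|4|c1ccccc1"}
mol_b = {"c1ccccc1|3|CN", "CN|2|C=O", "C=O|5|CO"}

params = make_hash_params(1024)
sig_a, sig_b = minhash(mol_a, params), minhash(mol_b, params)
print("exact:", exact_jaccard(mol_a, mol_b), "estimated:", estimated_jaccard(sig_a, sig_b))
```

A fixed-length signature like this lets molecules of very different sizes be compared, and indexed for nearest-neighbor search, with the same vector length; the Jaccard distance is then simply one minus the estimate.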

The second part of the thesis focuses on peptides. It opens with a review chapter that surveys approaches to discovering novel peptide structures and describes the known peptide chemical space. Next, a genetic algorithm that uses MXFP in its fitness function is described and challenged to generate peptide analogs of peptidic or non-peptidic queries. Finally, supervised and unsupervised machine learning methods are used to generate novel antimicrobial and non-hemolytic peptide sequences.
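The idea of a genetic algorithm whose fitness is a fingerprint distance to a query can be sketched as follows. This is a toy illustration, not the thesis's algorithm: the thesis uses MXFP in the fitness function, whereas here a hypothetical amino acid composition descriptor with a city-block distance stands in for it. The fixed peptide length, the restriction to the 20 natural amino acids, truncation selection, single-point crossover, the mutation rate, and the query sequence are all assumptions made for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> list[float]:
    """Toy descriptor (stand-in for MXFP): fraction of each natural amino acid."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def city_block(u: list[float], v: list[float]) -> float:
    return sum(abs(x - y) for x, y in zip(u, v))


def fitness(seq: str, query_fp: list[float]) -> float:
    # Higher is fitter: negative distance between the candidate and the query descriptor.
    return -city_block(composition(seq), query_fp)


def mutate(seq: str, rng: random.Random, rate: float = 0.1) -> str:
    # Replace each residue with a random amino acid with probability `rate`.
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


def crossover(p1: str, p2: str, rng: random.Random) -> str:
    # Single-point crossover between two parents of equal length.
    cut = rng.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:]


def run_ga(query_seq: str, pop_size: int = 50, length: int = 10,
           generations: int = 100, seed: int = 0) -> tuple[str, float]:
    rng = random.Random(seed)
    query_fp = composition(query_seq)
    population = ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda s: fitness(s, query_fp), reverse=True)
        survivors = population[: pop_size // 2]  # elitist truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = survivors + children
    best = max(population, key=lambda s: fitness(s, query_fp))
    return best, -fitness(best, query_fp)


# Hypothetical peptide query; the GA searches for 10-residue analogs close to it.
best, dist = run_ga("KLLKLLKKLLKL")
print(best, round(dist, 3))
```

Because the top half of each generation survives unchanged, the best distance to the query can only stay the same or improve from one generation to the next; swapping in a real fingerprint only requires replacing `composition` and `city_block`.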

The full thesis can be downloaded at:
