Analysis of Mycobacterium tuberculosis -omics data to inform on loci linked to drug resistance, pathogenicity and virulence
Mycobacterium tuberculosis (Mtb), the causative agent of human tuberculosis (TB), remains one of the deadliest pathogens worldwide. The observed genetic diversity among Mtb lineages has been associated with differences in virulence, pathogenicity and drug resistance. A better understanding of Mtb strain diversity and its implications for Mtb biology will inform the development of TB control tools, including diagnostics, drugs, and vaccines. Through the application of -omics approaches, this thesis presents a comprehensive analysis of whole-genome sequence (WGS) data from Mtb clinical isolates to improve the understanding of the pathogen's biology and inform on pathogenicity and drug resistance. The integrated analysis of the genome, transcriptome and methylome of ancient and modern lineages of Mtb revealed genetic variants and methylation patterns with a potential role in gene expression regulation. An analysis of the frequency and distribution of mutations associated with resistance to the new anti-TB drugs (bedaquiline, delamanid and pretomanid) in a large data set (~30k isolates) identified mutations with likely functional effects that pre-date the introduction of these drugs. This result suggests possible intrinsic or cross-resistance, and potential threats to the effectiveness of MDR-TB treatments. Moreover, long-read sequence data made it possible to characterise the genetic diversity of the 169 pe/ppe genes, loci that are traditionally removed from WGS analysis because of their repetitive GC-rich regions. Structural variants in pe/ppe genes with lineage-specific patterns were found. Finally, with sequencing technologies gaining traction as diagnostic tools, the use of the portable, long-read MinION platform was assessed. The results support its suitability for epidemiological applications and drug resistance detection, with the potential to characterise pe/ppe genes through improved coverage of GC-rich regions.
Overall, this thesis demonstrates the potential of sequencing platforms to inform TB control and improve the understanding of Mtb biology. The application of different -omics approaches provides a comprehensive analysis of the different Mtb lineages, showing distinct genomic and transcriptomic profiles that translate into different behaviours, with diagnostic and treatment implications.