Assessing splicing accuracy and its determinants across human tissues using RNA-sequencing
Alternative splicing (AS) is a feature of most multi-exonic human genes, and its dysregulation is known to contribute to ageing and disease. However, the genome-wide accuracy of AS has received less attention, even though high-depth RNA-sequencing data from non-diseased human samples commonly contains partially unannotated reads detected at low frequency both across samples and within an individual, implying the existence of splicing inaccuracies.
In this thesis, I investigated the characteristics and drivers of inaccurate splicing, referred to here as mis-splicing, which can be detected using split reads that map only partially to known transcripts in the reference annotation. Using short-read RNA-sequencing data derived from ∼14K samples across 42 human tissues provided by the Genotype-Tissue Expression Consortium v8, I developed IntroVerse, a relational database on the splicing of >300K annotated introns and a linked set of >4.5M novel junctions covering ∼32K genes.
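The detection idea above, a split read whose junction shares only one splice site with an annotated intron, can be illustrated with a minimal sketch. This is not the IntroVerse implementation: the `Junction` layout, the `AnnotationIndex` class, the category labels and the example coordinates are all hypothetical and chosen only for illustration. The sketch assumes each junction is described by its intron coordinates and strand.

```python
from typing import Iterable, NamedTuple


class Junction(NamedTuple):
    """Intron coordinates implied by a split read (hypothetical layout)."""
    chrom: str
    strand: str  # "+" or "-"
    start: int   # leftmost intronic base on the reference
    end: int     # rightmost intronic base on the reference


class AnnotationIndex:
    """Lookup sets built from the annotated introns of a reference annotation."""

    def __init__(self, annotated_introns: Iterable[Junction]) -> None:
        self.introns = set(annotated_introns)
        self.starts = {(j.chrom, j.strand, j.start) for j in self.introns}
        self.ends = {(j.chrom, j.strand, j.end) for j in self.introns}

    def classify(self, j: Junction) -> str:
        """Label a junction by how much of it matches the annotation."""
        if j in self.introns:
            return "annotated"
        start_known = (j.chrom, j.strand, j.start) in self.starts
        end_known = (j.chrom, j.strand, j.end) in self.ends
        if start_known and end_known:
            # Both splice sites are annotated, but not as a pair.
            return "novel_combination"
        if not start_known and not end_known:
            return "unannotated"
        # Exactly one end is annotated. On the "+" strand the donor (5' splice
        # site) is the leftmost end; on the "-" strand it is the rightmost end.
        donor_known = start_known if j.strand == "+" else end_known
        return "novel_acceptor" if donor_known else "novel_donor"


# Toy annotation with made-up coordinates.
index = AnnotationIndex([
    Junction("chr1", "+", 100, 200),
    Junction("chr1", "+", 300, 400),
    Junction("chr1", "-", 500, 600),
])

for junction in (Junction("chr1", "+", 100, 250), Junction("chr1", "-", 500, 650)):
    print(junction, index.classify(junction))
```

In this sketch, junctions labelled `novel_donor` or `novel_acceptor` correspond to the partially annotated split reads described above, while fully unannotated junctions are kept in a separate category.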
Using a subset of the data stored in IntroVerse, I found that mis-splicing follows a distribution pattern and occurs at different rates across introns and tissues. Using linear regression models to predict mis-splicing, I found that invariant intronic properties, such as inter-species sequence conservation neighbouring the 5′/3′ splice sites, had the highest variability in predictive value for mis-splicing across tissues, and that these tissue differences were unlikely to be driven by germline or somatic mutations.
Using RNA-sequencing data generated after in vitro knockdowns of multiple RNA-binding proteins (RBPs), I demonstrated significant changes in the distribution of mis-splicing, showing that accurate splice site selection is affected by RBP expression levels. I also found that mis-splicing tends to increase with age in most tissues and particularly affects genes implicated in neurodegenerative diseases.
I anticipate that this in-depth characterization of mis-splicing will help improve our understanding of the role of mis-splicing in human disease and age-related disorders.
https://discovery.ucl.ac.uk/id/eprint/10179390/2/Sonia_Garcia_Ruiz_PhD_Thesis.pdf