Computational analyses to characterise hidden information in short and long read sequencing data of human genomes: there’s more than meets the reference


This thesis by Jasper Linthorst was published in December 2022 at VU University Amsterdam.

Abstract:

Next-generation sequencing (NGS) has enabled us to accurately determine the nucleotide sequence of short fragments of DNA at a massive scale, which has led to various clinical applications of human genome sequencing. To extract information from these NGS experiments, virtually all analyses map the sequenced reads to a reference assembly of the human genome. Importantly, a large fraction (~12%) of the sequenced DNA fragments is ignored in these experiments because their origin cannot be traced back to a (single) position on the reference assembly. These ignored or unmapped fragments have two origins. Some originate from sequence that occurs more than once (repeats); others originate from sequence that is absent from the reference assembly. In practice, many of these unmapped fragments originate from so-called structural variations (SVs), where the sequenced genome differs from the reference assembly. In Part 1 of this thesis, we study this source of sequence variation using long-read sequencing technology and introduce methods to do so. In Part 2, we specifically study the DNA fragments that cannot be traced back to the human reference assembly but instead seem to originate from DNA viruses.


