Jena Bloom: Query Execution with Bloom Filter Existence Check
In recent years, graph data management, triplestores, and knowledge graphs have attracted increasing interest. However, it remains challenging to query triplestores efficiently, as many optimization strategies from traditional databases are still unexplored. As a first step toward optimizing triplestores, this paper examines how to improve query execution time by addressing costly existence checks in join operations. To achieve this goal, we integrate a Bloom filter that resides entirely and compactly in memory and is used in place of disk-based indexes for existence checks. We furthermore apply triple statistics to determine the specific join operations in which Bloom filter existence checks benefit execution time. We extend a reference triplestore (Jena) with Bloom filters and integrate our approach into query optimization. We evaluate our approach, JenaBloom, on a large set of more than 1,500 queries, and show its effectiveness on queries returning empty result sets as well as on those returning non-empty result sets.
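The core idea of consulting an in-memory Bloom filter before touching a disk-based index can be illustrated with a minimal sketch. This is not the JenaBloom implementation: the class names `BloomFilter` and `TripleStore`, the filter parameters, and the use of a Python set as a stand-in for the disk index are all assumptions made for illustration, and the statistics-based choice of join operations is not modeled. Because a Bloom filter has no false negatives, a negative answer is final. How a positive answer is handled is not stated in the abstract; this sketch simply falls back to the index.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; bit positions are derived by double hashing."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


class TripleStore:
    """Toy store; a set stands in for the disk-based index (hypothetical)."""

    def __init__(self, triples, num_bits: int = 1 << 16, num_hashes: int = 4):
        self._index = set(triples)
        self.index_lookups = 0  # counts simulated disk accesses
        self._filter = BloomFilter(num_bits, num_hashes)
        for triple in self._index:
            self._filter.add(self._key(triple))

    @staticmethod
    def _key(triple) -> str:
        return "\x00".join(triple)

    def contains(self, triple) -> bool:
        # A negative filter answer is final, so the index is never consulted.
        if not self._filter.might_contain(self._key(triple)):
            return False
        # Assumption: a positive answer is confirmed against the index.
        self.index_lookups += 1
        return triple in self._index


store = TripleStore([
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
])
present = store.contains(("ex:alice", "foaf:knows", "ex:bob"))
absent = store.contains(("ex:carol", "foaf:knows", "ex:bob"))
```

In this sketch, `index_lookups` grows only when the filter cannot rule a triple out, which is where the savings for queries with empty results would come from.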
https://vbn.aau.dk/ws/files/546991110/Jena_Bloom__Master_thesis_.pdf