{"title":"Vectorized Bloom filters for advanced SIMD processors","authors":"Orestis Polychroniou, K. A. Ross","doi":"10.1145/2619228.2619234","DOIUrl":null,"url":null,"abstract":"Analytics are at the core of many business intelligence tasks. Efficient query execution is facilitated by advanced hardware features, such as multi-core parallelism, shared-nothing low-latency caches, and SIMD vector instructions. Only recently, the SIMD capabilities of mainstream hardware have been augmented with wider vectors and non-contiguous loads termed gathers. While analytical DBMSs minimize the use of indexes in favor of scans based on sequential memory accesses, some data structures remain crucial. The Bloom filter, one such example, is the most efficient structure for filtering tuples based on their existence in a set and its performance is critical when joining tables with vastly different cardinalities. We introduce a vectorized implementation for probing Bloom filters based on gathers that eliminates conditional control flow and is independent of the SIMD length. Our techniques are generic and can be reused for accelerating other database operations. Our evaluation indicates a significant performance improvement over scalar code that can exceed 3X when the Bloom filter is cache-resident.","PeriodicalId":298901,"journal":{"name":"International Workshop on Data Management on New Hardware","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"61","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Workshop on Data Management on New Hardware","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2619228.2619234","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 61
Abstract
Analytics are at the core of many business intelligence tasks. Efficient query execution is facilitated by advanced hardware features, such as multi-core parallelism, shared-nothing low-latency caches, and SIMD vector instructions. Only recently, the SIMD capabilities of mainstream hardware have been augmented with wider vectors and non-contiguous loads termed gathers. While analytical DBMSs minimize the use of indexes in favor of scans based on sequential memory accesses, some data structures remain crucial. The Bloom filter, one such example, is the most efficient structure for filtering tuples based on their existence in a set and its performance is critical when joining tables with vastly different cardinalities. We introduce a vectorized implementation for probing Bloom filters based on gathers that eliminates conditional control flow and is independent of the SIMD length. Our techniques are generic and can be reused for accelerating other database operations. Our evaluation indicates a significant performance improvement over scalar code that can exceed 3X when the Bloom filter is cache-resident.