A Forest-structured Bloom Filter with flash memory
Guanlin Lu, Biplob K. Debnath, D. Du
2011 IEEE 27th Symposium on Mass Storage Systems and Technologies (MSST), published 2011-05-23
DOI: 10.1109/MSST.2011.5937232 (https://doi.org/10.1109/MSST.2011.5937232)
Citations: 26
Abstract
A Bloom filter (BF) is a probabilistic data structure that compactly represents a set of elements (keys). It is widely used to check efficiently whether a key has been seen before while using minimal space. BFs are heavily used in chunking-based data de-duplication. Traditionally, a BF is implemented as an in-RAM data structure, so its size is limited by the RAM available on the machine. For applications such as data de-duplication that require a BF larger than the available RAM, it becomes necessary to store the BF on a secondary storage device. Because BF operations are inherently random, a magnetic disk, which performs poorly on random reads and writes, is not a good fit for storing a large BF. Flash-memory-based solid state drives (SSDs) are regarded as an emerging storage device with superior performance that could replace disks as the preferred secondary storage. However, several special characteristics of flash memory make designing a flash-memory-based BF very challenging. In this paper, our goal is to design an efficient flash-memory-based BF that is fully aware of these physical characteristics. To this end, we propose a Forest-structured BF design (FBF). FBF combines RAM and flash memory: the BF is stored on flash, while RAM helps to mitigate the impact of flash memory's slow write performance. In addition, the in-flash BF is organized in a forest-like structure to improve lookup performance. Our experimental results show that the FBF design achieves 2 times faster processing speed with 50% fewer flash write operations than existing flash-memory-based BF designs.
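To make the basic data structure concrete, the following is a minimal sketch of a conventional in-RAM Bloom filter, the baseline the abstract starts from. It is not the authors' FBF design, and it does not model flash storage or the forest organization. The class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the use of SHA-256 with double hashing to derive bit positions are all assumptions chosen for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Illustrative in-RAM Bloom filter (hypothetical names, not from the paper)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        if num_bits <= 0 or num_hashes <= 0:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed eight to a byte.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        # Assumption: derive all bit positions from one SHA-256 digest
        # using double hashing (h1 + i * h2) instead of independent hashes.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        # Record the key by setting every bit position it maps to.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means the key was definitely never added.
        # True means it was probably added (false positives are possible).
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


if __name__ == "__main__":
    seen = BloomFilter(num_bits=1024, num_hashes=4)
    seen.add(b"chunk-fingerprint-1")
    print(seen.might_contain(b"chunk-fingerprint-1"))
```

Each `add` or `might_contain` call touches `num_hashes` bit positions scattered across the whole array. That scattered access is the random read and write pattern the abstract identifies as a poor fit for magnetic disks once the filter no longer fits in RAM.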