Chenxing Li, Zhen Chen, Wenxun Zheng, Yinjun Wu, Junwei Cao
BAH: A Bitmap Index Compression Algorithm for Fast Data Retrieval
2016 IEEE 41st Conference on Local Computer Networks (LCN), pp. 697-705
Published: 2016-11-01
DOI: 10.1109/LCN.2016.120 (https://doi.org/10.1109/LCN.2016.120)
Citations: 4
Abstract
Efficient retrieval of archived traffic data is essential for detecting network attacks such as advanced persistent threats (APTs). To gain insight from Internet traffic, bitmap indexes are increasingly used to query large datasets efficiently. However, a raw bitmap index consumes a large amount of space and incurs high overhead when indexes are loaded. Various bitmap index compression algorithms have been proposed to save storage while improving query efficiency. This paper proposes a new bitmap index compression algorithm called BAH (Byte Aligned Hybrid compression coding). An acceleration algorithm using SIMD is designed to speed up the AND operation over multiple compressed bitmaps. Overall, BAH achieves a better compression ratio and faster intersection queries than several previous schemes such as WAH, PLWAH, COMPAX, and Roaring. The theoretical analysis shows that, for bitmaps with density larger than 0.2%, the space required by BAH is no larger than 1.6 times the information entropy of the bitmap. In the experiments, BAH saves about 65% and 60% of the space compared with WAH on two datasets. The experiments also demonstrate the query efficiency of BAH in applications to Internet traffic and Web pages.
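To give a feel for the entropy bound quoted in the abstract, the following minimal sketch computes what "1.6 times the information entropy" would amount to for a bitmap of a given density. It assumes the bits are modelled as independent with probability equal to the density, so the entropy per position is the binary entropy function; the abstract does not state which entropy model the paper uses, so this is an illustrative assumption. The function names `binary_entropy` and `entropy_bound_bits` are hypothetical and are not taken from the paper.

```python
import math


def binary_entropy(d: float) -> float:
    """Shannon entropy in bits of a single bit that is 1 with probability d.

    Assumption: bitmap positions are independent (i.i.d. Bernoulli model).
    """
    if d <= 0.0 or d >= 1.0:
        return 0.0
    return -d * math.log2(d) - (1.0 - d) * math.log2(1.0 - d)


def entropy_bound_bits(n_bits: int, density: float, factor: float = 1.6) -> float:
    """Size in bits corresponding to `factor` times the entropy of the bitmap."""
    return factor * n_bits * binary_entropy(density)


if __name__ == "__main__":
    n = 1_000_000  # hypothetical bitmap length, chosen only for illustration
    for density in (0.002, 0.01, 0.1):
        bound_kib = entropy_bound_bits(n, density) / 8 / 1024
        raw_kib = n / 8 / 1024
        print(f"density={density:.3f}  bound={bound_kib:.1f} KiB  raw={raw_kib:.1f} KiB")
```

Under this model, sparse bitmaps have very low entropy per position, which is why a bound expressed as a small multiple of the entropy implies a size far below that of the raw bitmap.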