Optimal Packing in Simple-Family Codecs
A. Trotman, Michael H. Albert, Blake Burgess
Proceedings of the 2015 International Conference on The Theory of Information Retrieval, 2015-09-27
DOI: 10.1145/2808194.2809483 (https://doi.org/10.1145/2808194.2809483)
Citations: 2
Abstract
The Simple family of codecs is popular for encoding postings lists for a search engine because they are both space-effective and time-efficient to decode. These algorithms pack as many integers into a codeword as possible before moving on to the next codeword. This technique is known as left-greedy. This contribution proves that left-greedy is not optimal and then introduces a dynamic programming solution to find the optimal packing. Experiments on .gov2 and INEX Wikipedia 2009 show that although this is an interesting theoretical result, left-greedy is empirically near-optimal in effectiveness and efficiency.
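The claim that left-greedy is not optimal can be illustrated with a small sketch. It uses the Simple-9 selector table (28 payload bits per codeword) as a representative Simple-family codec. It also assumes that every codeword must be completely filled, so a selector is eligible only when at least as many integers remain as it holds. The names `SIMPLE9`, `fits`, `left_greedy` and `optimal` are hypothetical. The dynamic program shown (fewest codewords for each suffix of the list) is one straightforward formulation and is not necessarily the one used in the paper.

```python
# Simple-9 selector table as (integers per codeword, bits per integer),
# ordered from the most integers per codeword to the fewest.
SIMPLE9 = [(28, 1), (14, 2), (9, 3), (7, 4), (5, 5), (4, 7), (3, 9), (2, 14), (1, 28)]


def fits(values, start, count, bits):
    """True if values[start:start + count] all exist and each fits in `bits` bits."""
    # Assumption: no padding, so a selector needs at least `count` integers left.
    if start + count > len(values):
        return False
    limit = 1 << bits
    return all(0 <= v < limit for v in values[start:start + count])


def left_greedy(values):
    """Left-greedy: at each step take the selector that packs the most integers."""
    words, i = [], 0
    while i < len(values):
        for count, bits in SIMPLE9:
            if fits(values, i, count, bits):
                words.append((count, bits))
                i += count
                break
        else:
            raise ValueError("value is negative or needs more than 28 bits")
    return words


def optimal(values):
    """Dynamic programming: best[i] = fewest codewords that encode values[i:]."""
    n = len(values)
    best = [0] * (n + 1)       # best[n] = 0: the empty suffix needs no codewords
    choice = [None] * (n + 1)  # selector chosen at each start position
    for i in range(n - 1, -1, -1):
        best[i] = float("inf")
        for count, bits in SIMPLE9:
            if fits(values, i, count, bits) and 1 + best[i + count] < best[i]:
                best[i] = 1 + best[i + count]
                choice[i] = (count, bits)
        if choice[i] is None:
            raise ValueError("value is negative or needs more than 28 bits")
    # Follow the recorded choices from the front to recover the packing.
    words, i = [], 0
    while i < n:
        words.append(choice[i])
        i += choice[i][0]
    return words


gaps = [16] + [1] * 12
print(left_greedy(gaps))  # [(5, 5), (7, 4), (1, 28)]
print(optimal(gaps))      # [(4, 7), (9, 3)]
```

On this input, left-greedy packs the leading 16 into a codeword of five 5-bit slots. That leaves eight 1s, and no selector holds exactly eight integers, so three codewords are needed. Starting with four 7-bit slots leaves nine 1s, which fit a single 3-bit codeword, for two codewords in total. If a padded final codeword were allowed, this particular input would not separate the two strategies.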