Exact and Approximate Range Mode Query Data Structures in Practice

Meng He, Zhen Liu
DOI: 10.4230/LIPIcs.SEA.2023.19 (https://doi.org/10.4230/LIPIcs.SEA.2023.19)
Venue: Symposium on Experimental Algorithms (SEA 2023), LIPIcs. Pages 19:1-19:22. Published 2023.
Citations: 1

Abstract

We conduct an experimental study on the range mode problem. In the exact version of the problem, we preprocess an array A such that, given a query range [a, b], the most frequent element in A[a, b] can be found efficiently. For this problem, our most important finding is that the strategy of using succinct data structures to encode more precomputed information not only helped Chan et al. (Linear-space data structures for range mode query in arrays, Theory of Computing Systems, 2013) improve previous results in theory but also helps us achieve the best time/space tradeoff in practice; we go a step further and replace more components in their solution with succinct data structures, which improves the performance further. In the approximate version of this problem, a (1 + ε)-approximate range mode query looks for an element whose number of occurrences in A[a, b] is at least F_{a,b} / (1 + ε), where F_{a,b} is the frequency of the mode in A[a, b]. We implement all previous solutions to this problem and find that, even when ε = 1/2, the average approximation ratio of these solutions is close to 1 in practice, and they provide much faster query time than the best exact solution. These solutions achieve different useful time-space tradeoffs, and among them, El-Zein et al. (On Approximate Range Mode and Range Selection, 30th International Symposium on Algorithms and Computation, 2019) provide us with one solution whose space usage is only 35.6% to 93.8% of the cost of storing the input array of 32-bit integers (in most cases, the space cost is closer to the lower end, and the average space cost is 20.2 bits per symbol among all datasets). Its non-succinct version also stands out with query support at least several times faster than other O(n/ε)-word structures while using only slightly more space in practice.
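To make the two query definitions concrete, the following is a minimal sketch of a brute-force baseline, not any of the data structures studied in the paper. It answers an exact range mode query by a linear scan and checks whether a candidate element satisfies the (1 + ε)-approximation condition. The function names `range_mode` and `is_approx_mode`, and the choice of 0-indexed inclusive ranges, are assumptions made for illustration.

```python
from collections import Counter


def range_mode(A, a, b):
    """Return (mode, frequency) for A[a..b], inclusive, by a linear scan.

    Baseline only: O(b - a + 1) time per query and no preprocessing.
    Indices are 0-based here (an assumption of this sketch).
    """
    counts = Counter(A[a:b + 1])
    mode, freq = counts.most_common(1)[0]
    return mode, freq


def is_approx_mode(A, a, b, x, eps):
    """Check whether x is a (1 + eps)-approximate mode of A[a..b].

    x qualifies if its occurrence count in the range is at least
    F_{a,b} / (1 + eps), where F_{a,b} is the frequency of the true mode.
    """
    _, f_ab = range_mode(A, a, b)
    occurrences = A[a:b + 1].count(x)
    # Multiply instead of dividing to avoid a fractional threshold.
    return occurrences * (1 + eps) >= f_ab


A = [3, 1, 3, 2, 2, 3, 2, 2]
print(range_mode(A, 0, 5))              # exact mode of A[0..5]
print(is_approx_mode(A, 0, 5, 2, 0.5))  # 2 occurs twice; threshold is 3 / 1.5 = 2
print(is_approx_mode(A, 0, 5, 1, 0.5))  # 1 occurs once, below the threshold
```

With ε = 1/2, any element occurring at least two thirds as often as the true mode is an acceptable answer, which is why the approximate structures can be much smaller and faster than exact ones.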