A Two-Pass Exact Algorithm for Selection on Parallel Disk Systems.

Tian Mi, Sanguthevar Rajasekaran
Proceedings. IEEE Symposium on Computers and Communications, vol. 2013, pp. 000612-617. Published 2013-07-01. DOI: 10.1109/ISCC.2013.6755015 (https://doi.org/10.1109/ISCC.2013.6755015)
Citations: 1

Abstract

Many OLAP queries in data warehousing applications involve selection operations such as "top N", median, and "top 5%". Selection is a well-studied problem with numerous applications in the management of data and databases, since a complex data query can typically be reduced to a series of basic operations such as sorting and selection. Parallel selection has also become an important fundamental operation, especially since the introduction of parallel databases. In this paper, we present a deterministic algorithm, Recursive Sampling Selection (RSS), that solves the exact out-of-core selection problem and that we show needs no more than (2 + ε) passes (ε being a very small fraction). We compare RSS with two other algorithms from the literature, namely Deterministic Sampling Selection (DSS) and QuickSelect, on the Parallel Disk Systems. Our analysis shows that DSS is a (2 + ε)-pass algorithm when the total number of input elements N is a polynomial in the memory size M (i.e., N = M^c for some constant c), whereas our proposed algorithm RSS runs in (2 + ε) passes without any such assumption. Experimental results indicate that both RSS and DSS outperform QuickSelect on the Parallel Disk Systems. In particular, RSS is more scalable and robust in handling big data when the input size is far greater than the core memory size, including the case N ≫ M^c.
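
The abstract does not spell out the steps of RSS or DSS, so the following is only a generic sketch of sampling-based exact selection in two scans, not the authors' algorithm. It assumes an in-memory Python list stands in for the disk-resident input, and the names `sampled_select`, `mem_size`, and `step` are hypothetical. The first scan sorts memory-sized runs and keeps every `step`-th element as a sample; the sorted sample yields two bracketing values guaranteed to enclose the k-th smallest element; the second scan counts the elements below the bracket and collects only those inside it.

```python
import math


def sampled_select(data, k, mem_size, step):
    """Return the k-th smallest element (1-indexed) of data using two scans.

    Illustrative sketch only: list slices of length mem_size play the role
    of memory loads read from disk; numeric data is assumed.
    """
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be in 1..len(data)")
    starts = range(0, n, mem_size)
    q = len(starts)  # number of memory-sized runs

    # Scan 1: sort each run and keep the elements of rank step, 2*step, ...
    sample = []
    for start in starts:
        run = sorted(data[start:start + mem_size])
        sample.extend(run[step - 1::step])
    sample.sort()

    # The j-th sample (1-indexed) has at least j*step input elements <= it,
    # so the sample at index ceil(k/step) is an upper bracket for rank k.
    hi_idx = (k + step - 1) // step
    hi = sample[hi_idx - 1] if hi_idx <= len(sample) else math.inf

    # Each run hides at most step-1 unsampled elements below any value, so
    # the i-th sample has at most (i-1)*step + q*(step-1) elements below it.
    # Pick the largest i that keeps this bound <= k-1.
    lo_idx = (k - 1 - q * (step - 1)) // step + 1
    lo = sample[lo_idx - 1] if lo_idx >= 1 else -math.inf

    # Scan 2: count elements below the bracket, keep those inside it.
    below = 0
    candidates = []
    for start in starts:
        for x in data[start:start + mem_size]:
            if x < lo:
                below += 1
            elif x <= hi:
                candidates.append(x)

    # The target is the (k - below)-th smallest among the candidates.
    candidates.sort()
    return candidates[k - below - 1]
```

For example, `sampled_select([9, 1, 7, 3, 8, 2, 6], k=4, mem_size=3, step=2)` returns `6`, the median of the seven values. The sketch does not check that the candidate set fits in memory; an out-of-core algorithm would have to choose the sampling rate, or sample recursively as the name RSS suggests, to guarantee that.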
