A Two-Pass Exact Algorithm for Selection on Parallel Disk Systems
Tian Mi, Sanguthevar Rajasekaran
Proceedings. IEEE Symposium on Computers and Communications, vol. 2013, pp. 612-617. Published 2013-07-01.
DOI: 10.1109/ISCC.2013.6755015 (https://doi.org/10.1109/ISCC.2013.6755015)
Citations: 1
Abstract
Many OLAP queries in data warehousing applications perform selection operations such as "top N", median, and "top 5%". Selection is a well-studied problem with numerous applications in the management of data and databases, since a complex data query can typically be reduced to a series of basic operations such as sorting and selection. Parallel selection has also become an important fundamental operation, especially since the introduction of parallel databases. In this paper, we present a deterministic algorithm, Recursive Sampling Selection (RSS), for the exact out-of-core selection problem, and show that it needs no more than (2 + ε) passes (ε being a very small fraction). We compare RSS with two other algorithms from the literature, namely Deterministic Sampling Selection (DSS) and QuickSelect, on Parallel Disk Systems. Our analysis shows that DSS is a (2 + ε)-pass algorithm when the total number of input elements N is polynomial in the memory size M (i.e., N = M^c for some constant c). In contrast, RSS runs in (2 + ε) passes without any such assumption. Experimental results indicate that both RSS and DSS outperform QuickSelect on Parallel Disk Systems. In particular, RSS is more scalable and more robust on big data, where the input size is far greater than the core memory size, including the case N ≫ M^c.
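To make the idea of sampling-based out-of-core selection concrete, the sketch below shows a generic two-pass scheme built on deterministic regular sampling. It is not the RSS or DSS algorithm from the paper, whose details the abstract does not give. It only illustrates how a small sorted sample can bracket the k-th smallest element, so that a second pass needs to keep just a few candidates. The names `two_pass_select`, `chunks`, `mem_size`, and `step` are hypothetical, and an in-memory list stands in for the disk-resident input.

```python
from itertools import islice


def chunks(data, size):
    """Yield successive runs of at most `size` elements (one simulated pass)."""
    it = iter(data)
    while True:
        run = list(islice(it, size))
        if not run:
            return
        yield run


def two_pass_select(data, k, mem_size, step):
    """Return the k-th smallest element (1-based) using two passes over `data`.

    Assumption: `data` models the on-disk input, `mem_size` models M, and
    `step` is the sampling interval. Illustrative only, not the paper's RSS.
    """
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be in 1..len(data)")

    # Pass 1: sort each memory-sized run, keep every `step`-th element.
    sample, num_runs = [], 0
    for run in chunks(data, mem_size):
        run.sort()
        sample.extend(run[step - 1::step])
        num_runs += 1
    sample.sort()

    # Fewer than j*step + num_runs*(step-1) + 1 elements lie below sample[j],
    # and at least (j+1)*step elements are <= sample[j].
    j_lo = min((k - 1 - num_runs * (step - 1)) // step, len(sample) - 1)
    lo = sample[j_lo] if j_lo >= 0 else None          # None means no lower bound
    j_hi = -(-k // step) - 1                          # ceil(k / step) - 1
    hi = sample[j_hi] if j_hi < len(sample) else None  # None means no upper bound

    # Pass 2: count elements below `lo`, keep only those inside [lo, hi].
    below, candidates = 0, []
    for run in chunks(data, mem_size):
        for x in run:
            if lo is not None and x < lo:
                below += 1
            elif hi is None or x <= hi:
                candidates.append(x)

    candidates.sort()
    return candidates[k - 1 - below]
```

For example, `two_pass_select(data, k=len(data) // 2, mem_size=1024, step=32)` would return the lower median. The sketch does not check that the candidate set fits in memory. With many duplicate keys or a large number of runs, the bracket can grow, which is the kind of situation where a recursive or multi-level sampling strategy becomes relevant.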