Bulk sorted access for efficient top-k retrieval

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2013-07-29 DOI:10.1145/2484838.2484852

Dustin Lange, Felix Naumann

{"title":"Bulk sorted access for efficient top-k retrieval","authors":"Dustin Lange, Felix Naumann","doi":"10.1145/2484838.2484852","DOIUrl":null,"url":null,"abstract":"Efficient top-k retrieval of records from a database has been an active research field for many years. We approach the problem from a real-world application point of view, in which the order of records according to some similarity function on an attribute is not unique: Many records have same values in several attributes and thus their ranking in those attributes is arbitrary. For instance, in large person databases many individuals have the same first name, the same date of birth, or live in the same city. Existing algorithms, such as the Threshold Algorithm (TA), are ill-equipped to handle such cases efficiently.\n We introduce a variation of TA, the Bulk Sorted Access Algorithm (BSA), which retrieves larger chunks of records from the sorted lists using fixed thresholds, and which focusses its efforts on records that are ranked high in more than one ordering and are thus more promising candidates. We experimentally show that our method outperforms TA and another previous method for top-k retrieval in those very common cases.","PeriodicalId":74773,"journal":{"name":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","volume":"199 1","pages":"39:1-39:4"},"PeriodicalIF":0.0000,"publicationDate":"2013-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2484838.2484852","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

Efficient top-k retrieval of records from a database has been an active research field for many years. We approach the problem from a real-world application point of view, in which the order of records according to some similarity function on an attribute is not unique: Many records have same values in several attributes and thus their ranking in those attributes is arbitrary. For instance, in large person databases many individuals have the same first name, the same date of birth, or live in the same city. Existing algorithms, such as the Threshold Algorithm (TA), are ill-equipped to handle such cases efficiently. We introduce a variation of TA, the Bulk Sorted Access Algorithm (BSA), which retrieves larger chunks of records from the sorted lists using fixed thresholds, and which focusses its efforts on records that are ranked high in more than one ordering and are thus more promising candidates. We experimentally show that our method outperforms TA and another previous method for top-k retrieval in those very common cases.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

批量排序访问，以实现高效的top-k检索

多年来，数据库中记录的高效top-k检索一直是一个活跃的研究领域。我们从实际应用程序的角度来处理这个问题，其中根据属性上的某些相似性函数的记录顺序不是唯一的:许多记录在几个属性中具有相同的值，因此它们在这些属性中的排名是任意的。例如，在大型人员数据库中，许多人有相同的名字、相同的出生日期或住在同一个城市。现有的算法，如阈值算法(TA)，无法有效地处理这类情况。我们介绍了TA的一种变体，即批量排序访问算法(BSA)，它使用固定阈值从排序列表中检索更大的记录块，并将其工作重点放在在多个排序中排名较高的记录上，因此更有希望的候选记录。我们通过实验证明，在这些非常常见的情况下，我们的方法优于TA和另一种以前的top-k检索方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management

自引率

0.00%

发文量

期刊最新文献

Towards Co-Evolution of Data-Centric Ecosystems. Data perturbation for outlier detection ensembles SLACID - sparse linear algebra in a column-oriented in-memory database system SensorBench: benchmarking approaches to processing wireless sensor network data Efficient data management and statistics with zero-copy integration