Title: Bulk sorted access for efficient top-k retrieval
Authors: Dustin Lange, Felix Naumann
Venue: Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management
Pages: 39:1-39:4
Published: 2013-07-29
DOI: 10.1145/2484838.2484852 (https://doi.org/10.1145/2484838.2484852)

Abstract

Efficient top-k retrieval of records from a database has been an active research field for many years. We approach the problem from a real-world application point of view, in which the order of records according to some similarity function on an attribute is not unique: many records have the same values in several attributes, and thus their ranking in those attributes is arbitrary. For instance, in large person databases many individuals have the same first name, the same date of birth, or live in the same city. Existing algorithms, such as the Threshold Algorithm (TA), are ill-equipped to handle such cases efficiently. We introduce a variation of TA, the Bulk Sorted Access Algorithm (BSA), which retrieves larger chunks of records from the sorted lists using fixed thresholds, and which focuses its efforts on records that are ranked high in more than one ordering and are thus more promising candidates. We experimentally show that our method outperforms TA and another previous method for top-k retrieval in those very common cases.
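
To make the contrast in the abstract concrete, here is a minimal Python sketch. `threshold_algorithm` follows the classic TA scheme (round-robin sorted access, random access for full scores, stop when the k-th best seen score reaches the threshold). `bulk_fixed_threshold` is a hypothetical illustration of chunked retrieval with fixed per-attribute thresholds; it is not the authors' BSA, whose details the abstract does not give, and it omits the prioritization of records that rank high in several orderings. All names, the in-memory `people` data, and the use of `sum` as aggregation function are assumptions made for this example.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

Scores = Dict[str, Sequence[float]]  # record id -> similarity score per attribute
Ranked = List[Tuple[str, float]]
Agg = Callable[[Sequence[float]], float]  # assumed monotone, e.g. sum


def threshold_algorithm(scores: Scores, k: int, agg: Agg = sum) -> Tuple[Ranked, int]:
    """Classic TA: one sorted access per list per round, full score by random access."""
    m = len(next(iter(scores.values())))
    # One list per attribute in descending score order; tied records end up
    # in an arbitrary order (here: dictionary insertion order).
    lists = [sorted(scores, key=lambda r, i=i: -scores[r][i]) for i in range(m)]
    seen: Dict[str, float] = {}
    accesses = 0
    top: Ranked = []
    for depth in range(len(scores)):
        last_seen = []
        for i in range(m):
            rid = lists[i][depth]
            accesses += 1
            last_seen.append(scores[rid][i])
            seen[rid] = agg(scores[rid])  # random access to the other attributes
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        # Stop once k seen records score at least as high as any unseen record could.
        if len(top) == k and top[-1][1] >= agg(last_seen):
            break
    return top, accesses


def bulk_fixed_threshold(scores: Scores, k: int, thresholds: Sequence[float],
                         agg: Agg = sum) -> Tuple[Ranked, bool]:
    """Hypothetical bulk variant: one chunk per attribute, cut at a fixed threshold."""
    candidates = set()
    for i, t in enumerate(thresholds):
        # A whole group of tied records arrives in one chunk. A real system
        # would read this range from a sorted index instead of scanning.
        candidates.update(r for r, s in scores.items() if s[i] >= t)
    full = {r: agg(scores[r]) for r in candidates}
    top = heapq.nlargest(k, full.items(), key=lambda kv: kv[1])
    # Records outside every chunk score below each threshold, so their aggregate
    # cannot exceed agg(thresholds). If the k-th candidate reaches that bound,
    # the result is a valid top-k; otherwise lower thresholds are needed.
    exact = len(top) == k and top[-1][1] >= agg(thresholds)
    return top, exact


# Toy data: four records tie on attribute 0, three tie on attribute 1.
people: Scores = {
    "p2": (1.0, 0.2), "p3": (1.0, 0.1), "p4": (1.0, 0.0),
    "p5": (0.3, 1.0), "p6": (0.2, 1.0), "p1": (1.0, 1.0),
}
```

Calling `threshold_algorithm(people, k=1)` returns the best record together with the number of sorted accesses it needed, which depends on where the arbitrary tie order happens to place that record. `bulk_fixed_threshold(people, k=1, thresholds=(1.0, 1.0))` fetches each tie group in a single chunk and reports whether the chosen thresholds were low enough to guarantee the result.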