On the Properties of Evaluation Metrics for Finding One Highly Relevant Document

T. Sakai
Journal: IPSJ Digital Courier
Published: 2007-09-15
Publication type: Journal Article
DOI: 10.2197/IPSJDC.3.643 (https://doi.org/10.2197/IPSJDC.3.643)
Citations: 12

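Abstract

Traditional information retrieval evaluation relies on both precision and recall. However, modern search environments such as the Web, in which recall is either unimportant or immeasurable, require precision-oriented evaluation. In particular, finding one highly relevant document is very important for practical tasks such as known-item search and suspected-item search. This paper compares the properties of five evaluation metrics that are applicable to the task of finding one highly relevant document, in terms of their underlying assumptions, how closely the system rankings they produce resemble each other, and their discriminative power. We employ two existing methods for comparing the discriminative power of these metrics: the Swap Method proposed by Voorhees and Buckley at ACM SIGIR 2002, and the Bootstrap Sensitivity Method proposed by Sakai at SIGIR 2006. We use four data sets from NTCIR to show that, while P(+)-measure, O-measure and NWRR (Normalised Weighted Reciprocal Rank) are reasonably highly correlated with one another, P(+)-measure and O-measure are more discriminative than NWRR, which in turn is more discriminative than Reciprocal Rank. We therefore conclude that P(+)-measure and O-measure, each modelling a different user behaviour, are the most useful evaluation metrics for the task of finding one highly relevant document.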
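As a small illustration of the baseline metric named in the abstract, the sketch below computes Reciprocal Rank for a single ranked list. It is not taken from the paper: the function name `reciprocal_rank`, the graded relevance scale (3 = highly relevant, 1 = partially relevant, 0 = not relevant) and the two example runs are all assumptions made for this example. It uses only the standard definition of Reciprocal Rank, 1/r, where r is the rank of the first relevant document.

```python
from typing import Sequence


def reciprocal_rank(gains: Sequence[int]) -> float:
    """Return 1/r, where r is the 1-based rank of the first relevant document.

    gains[i] is the relevance grade of the document at rank i + 1
    (0 = not relevant). Any positive grade counts as relevant, so the
    grade itself is ignored. Returns 0.0 if no relevant document appears.
    """
    for rank, gain in enumerate(gains, start=1):
        if gain > 0:
            return 1.0 / rank
    return 0.0


# Hypothetical ranked lists (assumed grades: 3 = highly, 1 = partially relevant).
run_a = [0, 3, 0, 1]  # highly relevant document at rank 2
run_b = [0, 1, 0, 3]  # only a partially relevant document at rank 2

print(reciprocal_rank(run_a))  # 0.5
print(reciprocal_rank(run_b))  # 0.5
```

Both runs receive the same score even though only the first places a highly relevant document near the top. Because Reciprocal Rank treats relevance as binary, it cannot see this difference, which is presumably the kind of distinction the graded-relevance metrics compared in the paper are intended to capture.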