New Metrics to Encourage Innovation and Diversity in Information Retrieval Approaches

Mehmet Deniz Turkmen, Matthew Lease, Mucahid Kutlu
{"title":"鼓励信息检索方法创新和多样化的新指标","authors":"Mehmet Deniz Turkmen, Matthew Lease, Mucahid Kutlu","doi":"10.48550/arXiv.2301.08062","DOIUrl":null,"url":null,"abstract":"In evaluation campaigns, participants often explore variations of popular, state-of-the-art baselines as a low-risk strategy to achieve competitive results. While effective, this can lead to local\"hill climbing\"rather than more radical and innovative departure from standard methods. Moreover, if many participants build on similar baselines, the overall diversity of approaches considered may be limited. In this work, we propose a new class of IR evaluation metrics intended to promote greater diversity of approaches in evaluation campaigns. Whereas traditional IR metrics focus on user experience, our two\"innovation\"metrics instead reward exploration of more divergent, higher-risk strategies finding relevant documents missed by other systems. Experiments on four TREC collections show that our metrics do change system rankings by rewarding systems that find such rare, relevant documents. This result is further supported by a controlled, synthetic data experiment, and a qualitative analysis. In addition, we show that our metrics achieve higher evaluation stability and discriminative power than the standard metrics we modify. To support reproducibility, we share our source code.","PeriodicalId":126309,"journal":{"name":"European Conference on Information Retrieval","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"New Metrics to Encourage Innovation and Diversity in Information Retrieval Approaches\",\"authors\":\"Mehmet Deniz Turkmen, Matthew Lease, Mucahid Kutlu\",\"doi\":\"10.48550/arXiv.2301.08062\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In evaluation campaigns, participants often explore variations of popular, state-of-the-art baselines as a low-risk strategy to achieve competitive results. While effective, this can lead to local\\\"hill climbing\\\"rather than more radical and innovative departure from standard methods. Moreover, if many participants build on similar baselines, the overall diversity of approaches considered may be limited. In this work, we propose a new class of IR evaluation metrics intended to promote greater diversity of approaches in evaluation campaigns. Whereas traditional IR metrics focus on user experience, our two\\\"innovation\\\"metrics instead reward exploration of more divergent, higher-risk strategies finding relevant documents missed by other systems. Experiments on four TREC collections show that our metrics do change system rankings by rewarding systems that find such rare, relevant documents. This result is further supported by a controlled, synthetic data experiment, and a qualitative analysis. In addition, we show that our metrics achieve higher evaluation stability and discriminative power than the standard metrics we modify. 
To support reproducibility, we share our source code.\",\"PeriodicalId\":126309,\"journal\":{\"name\":\"European Conference on Information Retrieval\",\"volume\":\"24 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"European Conference on Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2301.08062\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Conference on Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2301.08062","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

In evaluation campaigns, participants often explore variations of popular, state-of-the-art baselines as a low-risk strategy to achieve competitive results. While effective, this can lead to local "hill climbing" rather than more radical and innovative departures from standard methods. Moreover, if many participants build on similar baselines, the overall diversity of approaches considered may be limited. In this work, we propose a new class of IR evaluation metrics intended to promote greater diversity of approaches in evaluation campaigns. Whereas traditional IR metrics focus on user experience, our two "innovation" metrics instead reward exploration of more divergent, higher-risk strategies that find relevant documents missed by other systems. Experiments on four TREC collections show that our metrics do change system rankings by rewarding systems that find such rare, relevant documents. This result is further supported by a controlled synthetic-data experiment and a qualitative analysis. In addition, we show that our metrics achieve higher evaluation stability and discriminative power than the standard metrics we modify. To support reproducibility, we share our source code.
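
To make the core idea concrete, here is a minimal sketch of a rarity-weighted precision metric. This is an illustration only, not the paper's actual formulation: the weighting 1 - n_found/n_systems and the names rarity_weights and innovation_precision_at_k are hypothetical assumptions; the authors' exact metric definitions are in their released source code.

```python
from collections import Counter

def rarity_weights(runs, relevant):
    """Weight each relevant document by how few systems retrieved it.

    runs:     dict mapping system name -> ranked list of doc ids (one query)
    relevant: set of relevant doc ids for that query

    Hypothetical weighting (an assumption, not the paper's formula):
    a document retrieved by every system gets weight 0; one retrieved
    by a single system gets weight close to 1.
    """
    n_systems = len(runs)
    # Count, per relevant document, how many systems retrieved it at all.
    found = Counter(
        doc
        for ranked in runs.values()
        for doc in set(ranked)
        if doc in relevant
    )
    return {doc: 1.0 - count / n_systems for doc, count in found.items()}

def innovation_precision_at_k(ranked, relevant, weights, k=10):
    """Precision@k where each relevant hit contributes its rarity weight
    instead of 1, so rare relevant finds are rewarded more."""
    gain = sum(weights.get(doc, 0.0) for doc in ranked[:k] if doc in relevant)
    return gain / k

# Toy example: three systems, one query.
runs = {
    "sysA": ["d1", "d2", "d3"],
    "sysB": ["d1", "d4", "d2"],
    "sysC": ["d1", "d9", "d5"],  # d9 is a rare relevant find
}
relevant = {"d1", "d2", "d9"}
weights = rarity_weights(runs, relevant)
# d1 is found by all three systems -> weight 0.0
# d2 is found by two               -> weight ~0.33
# d9 is found by sysC alone        -> weight ~0.67
for name, ranked in runs.items():
    print(name, round(innovation_precision_at_k(ranked, relevant, weights, k=3), 3))
```

On this toy data, every system retrieves exactly two relevant documents, so traditional precision@3 ties all three at 2/3; the rarity weighting is what separates them, scoring sysC highest because its second relevant document is one no other system found.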