Principled Multi-Aspect Evaluation Measures of Rankings

Proceedings of the 30th ACM International Conference on Information & Knowledge Management Pub Date : 2021-10-26 DOI:10.1145/3459637.3482287

Maria Maistro, Lucas Chaves Lima, J. Simonsen, C. Lioma

{"title":"Principled Multi-Aspect Evaluation Measures of Rankings","authors":"Maria Maistro, Lucas Chaves Lima, J. Simonsen, C. Lioma","doi":"10.1145/3459637.3482287","DOIUrl":null,"url":null,"abstract":"Information Retrieval evaluation has traditionally focused on defining principled ways of assessing the relevance of a ranked list of documents with respect to a query. Several methods extend this type of evaluation beyond relevance, making it possible to evaluate different aspects of a document ranking (e.g., relevance, usefulness, or credibility) using a single measure (multi-aspect evaluation). However, these methods either are (i) tailor-made for specific aspects and do not extend to other types or numbers of aspects, or (ii) have theoretical anomalies, e.g. assign maximum score to a ranking where all documents are labelled with the lowest grade with respect to all aspects (e.g., not relevant, not credible, etc.). We present a theoretically principled multi-aspect evaluation method that can be used for any number, and any type, of aspects. A thorough empirical evaluation using up to 5 aspects and a total of 425 runs officially submitted to 10 TREC tracks shows that our method is more discriminative than the state-of-the-art and overcomes theoretical limitations of the state-of-the-art.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3459637.3482287","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

Abstract

Information Retrieval evaluation has traditionally focused on defining principled ways of assessing the relevance of a ranked list of documents with respect to a query. Several methods extend this type of evaluation beyond relevance, making it possible to evaluate different aspects of a document ranking (e.g., relevance, usefulness, or credibility) using a single measure (multi-aspect evaluation). However, these methods either are (i) tailor-made for specific aspects and do not extend to other types or numbers of aspects, or (ii) have theoretical anomalies, e.g. assign maximum score to a ranking where all documents are labelled with the lowest grade with respect to all aspects (e.g., not relevant, not credible, etc.). We present a theoretically principled multi-aspect evaluation method that can be used for any number, and any type, of aspects. A thorough empirical evaluation using up to 5 aspects and a total of 425 runs officially submitted to 10 TREC tracks shows that our method is more discriminative than the state-of-the-art and overcomes theoretical limitations of the state-of-the-art.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

排名的原则性多方面评价方法

信息检索评估传统上侧重于定义评估与查询相关的文档排序列表的相关性的原则方法。有几种方法将这种类型的评估扩展到相关性之外，使得使用单一度量(多方面评估)来评估文档排名的不同方面(例如，相关性、有用性或可信度)成为可能。然而，这些方法要么是(i)为特定方面量身定制的，不能扩展到其他类型或数量的方面，要么(ii)有理论上的异常，例如，对所有文件在所有方面(例如，不相关，不可信等)被标记为最低等级的排名赋予最高分数。我们提出了一个理论上有原则的多方面评价方法，可以用于任何数量和任何类型的方面。一项全面的实证评估使用多达5个方面和425个运行正式提交给10个TREC轨道表明，我们的方法比最先进的技术更具判别性，并克服了最先进的理论局限性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Proceedings of the 30th ACM International Conference on Information & Knowledge Management

自引率

0.00%

发文量

期刊最新文献

UltraGCN Fine and Coarse Granular Argument Classification before Clustering CHASE Crawler Detection in Location-Based Services Using Attributed Action Net Failure Prediction for Large-scale Water Pipe Networks Using GNN and Temporal Failure Series