Bias-aware ranking from pairwise comparisons

IF 2.8; Zone 3, Computer Science; Q2, Computer Science, Artificial Intelligence; Data Mining and Knowledge Discovery; Pub Date: 2024-05-31; DOI: 10.1007/s10618-024-01024-z

Antonio Ferrara, Francesco Bonchi, Francesco Fabbri, Fariba Karimi, Claudia Wagner



Abstract

Human feedback is often used, either directly or indirectly, as input to algorithmic decision making. However, humans are biased: if the algorithm that takes human feedback as input does not control for potential biases, the result may be biased algorithmic decision making, which can have a tangible impact on people’s lives. In this paper, we study how to detect and correct for evaluators’ bias in the task of ranking people (or items) from pairwise comparisons. Specifically, we assume we are given pairwise comparisons of the items to be ranked, produced by a set of evaluators. While the evaluators’ pairwise assessments should reflect, to a certain extent, the latent (unobservable) true quality scores of the items, they might be affected by each evaluator’s own bias against, or in favor of, some groups of items. By detecting and amending evaluators’ biases, we aim to produce a ranking of the items that agrees as closely as possible with the ranking one would obtain with access to the latent quality scores. Our proposal is a novel method that extends the classic Bradley-Terry model with a bias parameter for each evaluator, which distorts the true quality score of each item depending on the group the item belongs to. Thanks to the simplicity of the model, we are able to write its log-likelihood explicitly with respect to the parameters (i.e., the items’ latent scores and the evaluators’ biases) and to optimize it by means of an alternating approach. Our experiments on synthetic and real-world data confirm that our method reconstructs the bias of each individual evaluator extremely well and thus outperforms several non-trivial competitors in the task of producing a ranking that is as close as possible to the unbiased ranking.
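To make the modelling idea concrete, here is a minimal Python sketch of a Bradley-Terry model with one bias parameter per evaluator, fitted by alternating gradient steps on scores and biases. The abstract does not give the exact functional form, so everything specific here is an assumption for illustration: two groups encoded as g in {0, 1}, an additive bias (evaluator e sees item i with score s[i] + b[e] * g[i]), plain gradient ascent with a small L2 penalty, and the helper names `fit`, `simulate`, `margins`, and `log_likelihood`. It is not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def margins(s, b, g, comps):
    # comps rows are (evaluator, winner, loser).
    # Assumed additive form: evaluator e perceives item i as s[i] + b[e] * g[i].
    e, w, l = comps.T
    return (s[w] - s[l]) + b[e] * (g[w] - g[l])

def log_likelihood(s, b, g, comps):
    # Bradley-Terry log-likelihood of the observed winners under the assumed model.
    return float(np.sum(np.log(sigmoid(margins(s, b, g, comps)))))

def fit(comps, g, n_evals, rounds=300, lr=0.5, l2=1e-3):
    # Alternating gradient ascent: one step on scores with biases fixed,
    # then one step on biases with scores fixed (a simplification, by assumption).
    n_items, n = len(g), len(comps)
    e, w, l = comps.T
    dg = g[w] - g[l]
    s, b = np.zeros(n_items), np.zeros(n_evals)
    for _ in range(rounds):
        r = 1.0 - sigmoid(margins(s, b, g, comps))  # d log-lik / d margin
        grad_s = (np.bincount(w, weights=r, minlength=n_items)
                  - np.bincount(l, weights=r, minlength=n_items))
        s += lr * (grad_s / n - l2 * s)
        r = 1.0 - sigmoid(margins(s, b, g, comps))
        grad_b = np.bincount(e, weights=r * dg, minlength=n_evals)
        b += lr * (grad_b / n - l2 * b)
    return s, b

def simulate(s, b, g, n_comps, rng):
    # Draw random (evaluator, i, j) triples with i != j and sample a winner.
    n_items, n_evals = len(s), len(b)
    e = rng.integers(n_evals, size=n_comps)
    i = rng.integers(n_items, size=n_comps)
    j = (i + rng.integers(1, n_items, size=n_comps)) % n_items
    p_i_wins = sigmoid((s[i] - s[j]) + b[e] * (g[i] - g[j]))
    i_wins = rng.random(n_comps) < p_i_wins
    return np.stack([e, np.where(i_wins, i, j), np.where(i_wins, j, i)], axis=1)
```

A typical use would be `comps = simulate(s_true, b_true, g, 6000, np.random.default_rng(0))` followed by `s_hat, b_hat = fit(comps, g, n_evals=len(b_true))`, then ranking items by `s_hat`. One caveat of this assumed additive form: adding a constant to every evaluator's bias and subtracting it from the scores of group 1 leaves all comparison probabilities unchanged, so only differences between evaluators' biases are pinned down by the data; the L2 penalty merely selects one solution.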

Source journal

Data Mining and Knowledge Discovery (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 10.40

Self-citation rate: 4.20%

Review time: 10 months

Journal description: Advances in data gathering, storage, and distribution have created a need for computational tools and techniques to aid in data analysis. Data Mining and Knowledge Discovery in Databases (KDD) is a rapidly growing area of research and application that builds on techniques and theories from many fields, including statistics, databases, pattern recognition and learning, data visualization, uncertainty modelling, data warehousing and OLAP, optimization, and high performance computing.