Detection and Elimination of Systematic Labeling Bias in Code Reviewer Recommendation Systems

K. A. Tecimer, Eray Tüzün, Hamdi Dibeklioğlu, H. Erdogmus
{"title":"代码审稿人推荐系统中系统标签偏差的检测与消除","authors":"K. A. Tecimer, Eray Tüzün, Hamdi Dibeklioğlu, H. Erdogmus","doi":"10.1145/3463274.3463336","DOIUrl":null,"url":null,"abstract":"Reviewer selection in modern code review is crucial for effective code reviews. Several techniques exist for recommending reviewers appropriate for a given pull request (PR). Most code reviewer recommendation techniques in the literature build and evaluate their models based on datasets collected from real projects using open-source or industrial practices. The techniques invariably presume that these datasets reliably represent the “ground truth.” In the context of a classification problem, ground truth refers to the objectively correct labels of a class used to build models from a dataset or evaluate a model’s performance. In a project dataset used to build a code reviewer recommendation system, the recommended code reviewer picked for a PR is usually assumed to be the best code reviewer for that PR. However, in practice, the recommended code reviewer may not be the best possible code reviewer, or even a qualified one. Recent code reviewer recommendation studies suggest that the datasets used tend to suffer from systematic labeling bias, making the ground truth unreliable. Therefore, models and recommendation systems built on such datasets may perform poorly in real practice. In this study, we introduce a novel approach to automatically detect and eliminate systematic labeling bias in code reviewer recommendation systems. The bias that we remove results from selecting reviewers that do not ensure a permanently successful fix for a bug-related PR. To demonstrate the effectiveness of our approach, we evaluated it on two open-source project datasets —HIVE and QT Creator— and with five code reviewer recommendation techniques —Profile-Based, RSTrace, Naive Bayes, k-NN, and Decision Tree. Our debiasing approach appears promising since it improved the Mean Reciprocal Rank (MRR) of the evaluated techniques up to 26% in the datasets used.","PeriodicalId":328024,"journal":{"name":"Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Detection and Elimination of Systematic Labeling Bias in Code Reviewer Recommendation Systems\",\"authors\":\"K. A. Tecimer, Eray Tüzün, Hamdi Dibeklioğlu, H. Erdogmus\",\"doi\":\"10.1145/3463274.3463336\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reviewer selection in modern code review is crucial for effective code reviews. Several techniques exist for recommending reviewers appropriate for a given pull request (PR). Most code reviewer recommendation techniques in the literature build and evaluate their models based on datasets collected from real projects using open-source or industrial practices. The techniques invariably presume that these datasets reliably represent the “ground truth.” In the context of a classification problem, ground truth refers to the objectively correct labels of a class used to build models from a dataset or evaluate a model’s performance. In a project dataset used to build a code reviewer recommendation system, the recommended code reviewer picked for a PR is usually assumed to be the best code reviewer for that PR. 
However, in practice, the recommended code reviewer may not be the best possible code reviewer, or even a qualified one. Recent code reviewer recommendation studies suggest that the datasets used tend to suffer from systematic labeling bias, making the ground truth unreliable. Therefore, models and recommendation systems built on such datasets may perform poorly in real practice. In this study, we introduce a novel approach to automatically detect and eliminate systematic labeling bias in code reviewer recommendation systems. The bias that we remove results from selecting reviewers that do not ensure a permanently successful fix for a bug-related PR. To demonstrate the effectiveness of our approach, we evaluated it on two open-source project datasets —HIVE and QT Creator— and with five code reviewer recommendation techniques —Profile-Based, RSTrace, Naive Bayes, k-NN, and Decision Tree. Our debiasing approach appears promising since it improved the Mean Reciprocal Rank (MRR) of the evaluated techniques up to 26% in the datasets used.\",\"PeriodicalId\":328024,\"journal\":{\"name\":\"Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering\",\"volume\":\"24 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3463274.3463336\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3463274.3463336","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 7

Abstract

Selecting the right reviewer is crucial for effective modern code review, and several techniques exist for recommending reviewers appropriate for a given pull request (PR). Most code reviewer recommendation techniques in the literature build and evaluate their models on datasets collected from real projects following open-source or industrial practices, and they invariably presume that these datasets reliably represent the "ground truth." In the context of a classification problem, ground truth refers to the objectively correct class labels used to build models from a dataset or to evaluate a model's performance. In a project dataset used to build a code reviewer recommendation system, the code reviewer picked for a PR is usually assumed to be the best reviewer for that PR. In practice, however, that reviewer may not be the best possible one, or even a qualified one. Recent studies suggest that such datasets tend to suffer from systematic labeling bias, making the ground truth unreliable; models and recommendation systems built on them may therefore perform poorly in real practice. In this study, we introduce a novel approach to automatically detect and eliminate systematic labeling bias in code reviewer recommendation systems. The bias we remove stems from the selection of reviewers who did not ensure a permanently successful fix for a bug-related PR. To demonstrate the effectiveness of the approach, we evaluated it on two open-source project datasets (HIVE and QT Creator) with five code reviewer recommendation techniques (Profile-Based, RSTrace, Naive Bayes, k-NN, and Decision Tree). The debiasing approach appears promising: it improved the Mean Reciprocal Rank (MRR) of the evaluated techniques by up to 26% on the datasets used.
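
The debiasing idea and the evaluation metric both lend themselves to a short illustration. The sketch below is a hypothetical rendering in Python, not the authors' implementation: the PullRequest record, its fix_reopened flag, and the debias filter are illustrative assumptions standing in for the paper's actual bias-detection heuristic, while mean_reciprocal_rank computes the standard MRR metric (the mean over queries of 1/rank of the true label) used in the evaluation.

```python
from dataclasses import dataclass

@dataclass
class PullRequest:
    pr_id: int
    assigned_reviewer: str  # reviewer recorded as chosen in the project history
    fix_reopened: bool      # illustrative flag: the bug this PR fixed was later reopened

def debias(dataset: list[PullRequest]) -> list[PullRequest]:
    """Drop instances with unreliable labels.

    If the assigned reviewer did not ensure a lasting fix, the instance's
    "ground truth" label is suspect, so it is removed before training and
    evaluation (a stand-in for the paper's detection heuristic).
    """
    return [pr for pr in dataset if not pr.fix_reopened]

def mean_reciprocal_rank(ranked_lists: dict[int, list[str]],
                         truth: dict[int, str]) -> float:
    """MRR = (1/|Q|) * sum over queries of 1/rank of the true reviewer,
    counting 0 when the true reviewer is missing from the ranked list."""
    total = 0.0
    for pr_id, ranking in ranked_lists.items():
        if truth[pr_id] in ranking:
            total += 1.0 / (ranking.index(truth[pr_id]) + 1)
    return total / len(ranked_lists)

# Toy usage: PR 2's fix was reopened, so its instance is filtered out.
dataset = [PullRequest(1, "alice", fix_reopened=False),
           PullRequest(2, "bob", fix_reopened=True)]
clean = debias(dataset)  # -> only PR 1 remains
print(mean_reciprocal_rank({1: ["carol", "alice"]}, {1: "alice"}))  # 0.5
```

Under this reading, the reported gains of up to 26% would come from running the same recommenders on the filtered and unfiltered datasets and comparing their MRR scores.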