Patch correctness assessment in automated program repair based on the impact of patches on production and test code

Ali Ghanbari, Andrian Marcus
{"title":"Patch correctness assessment in automated program repair based on the impact of patches on production and test code","authors":"Ali Ghanbari, Andrian Marcus","doi":"10.1145/3533767.3534368","DOIUrl":null,"url":null,"abstract":"Test-based generate-and-validate automated program repair (APR) systems often generate many patches that pass the test suite without fixing the bug. The generated patches must be manually inspected by the developers, so previous research proposed various techniques for automatic correctness assessment of APR-generated patches. Among them, dynamic patch correctness assessment techniques rely on the assumption that, when running the originally passing test cases, the correct patches will not alter the program behavior in a significant way, e.g., removing the code implementing correct functionality of the program. In this paper, we propose and evaluate a novel technique, named Shibboleth, for automatic correctness assessment of the patches generated by test-based generate-and-validate APR systems. Unlike existing works, the impact of the patches is captured along three complementary facets, allowing more effective patch correctness assessment. Specifically, we measure the impact of patches on both production code (via syntactic and semantic similarity) and test code (via code coverage of passing tests) to separate the patches that result in similar programs and that do not delete desired program elements. Shibboleth assesses the correctness of patches via both ranking and classification. We evaluated Shibboleth on 1,871 patches, generated by 29 Java-based APR systems for Defects4J programs. The technique outperforms state-of-the-art ranking and classification techniques. Specifically, in our ranking data set, in 43% (66%) of the cases, Shibboleth ranks the correct patch in top-1 (top-2) positions, and in classification mode applied on our classification data set, it achieves an accuracy and F1-score of 0.887 and 0.852, respectively.","PeriodicalId":412271,"journal":{"name":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3533767.3534368","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Test-based generate-and-validate automated program repair (APR) systems often generate many patches that pass the test suite without fixing the bug. The generated patches must be manually inspected by the developers, so previous research proposed various techniques for automatic correctness assessment of APR-generated patches. Among them, dynamic patch correctness assessment techniques rely on the assumption that, when running the originally passing test cases, the correct patches will not alter the program behavior in a significant way, e.g., removing the code implementing correct functionality of the program. In this paper, we propose and evaluate a novel technique, named Shibboleth, for automatic correctness assessment of the patches generated by test-based generate-and-validate APR systems. Unlike existing works, the impact of the patches is captured along three complementary facets, allowing more effective patch correctness assessment. Specifically, we measure the impact of patches on both production code (via syntactic and semantic similarity) and test code (via code coverage of passing tests) to separate the patches that result in similar programs and that do not delete desired program elements. Shibboleth assesses the correctness of patches via both ranking and classification. We evaluated Shibboleth on 1,871 patches, generated by 29 Java-based APR systems for Defects4J programs. The technique outperforms state-of-the-art ranking and classification techniques. Specifically, in our ranking data set, in 43% (66%) of the cases, Shibboleth ranks the correct patch in top-1 (top-2) positions, and in classification mode applied on our classification data set, it achieves an accuracy and F1-score of 0.887 and 0.852, respectively.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于补丁对生产和测试代码的影响,自动程序修复中的补丁正确性评估
基于测试的生成并验证自动程序修复(APR)系统通常生成许多补丁,这些补丁通过了测试套件,但没有修复错误。生成的补丁必须由开发人员手工检查,因此以前的研究提出了各种技术来自动评估apr生成的补丁的正确性。其中,动态补丁正确性评估技术依赖于这样一个假设,即在运行最初通过的测试用例时,正确的补丁不会显著改变程序行为,例如,删除实现程序正确功能的代码。在本文中,我们提出并评估了一种名为Shibboleth的新技术,用于对基于测试的生成和验证APR系统生成的补丁进行自动正确性评估。与现有的工作不同,补丁的影响是沿着三个互补的方面捕获的,允许更有效的补丁正确性评估。具体来说,我们测量补丁对生产代码(通过语法和语义相似性)和测试代码(通过通过测试的代码覆盖率)的影响,以分离导致类似程序和不删除所需程序元素的补丁。Shibboleth通过排序和分类来评估补丁的正确性。我们在1871个补丁上对Shibboleth进行了评估,这些补丁由29个基于java的APR系统为缺陷4j程序生成。该技术优于最先进的排名和分类技术。具体来说,在我们的排序数据集中,在43%(66%)的案例中,Shibboleth将正确patch排在top-1 (top-2)的位置,在我们的分类数据集中应用的分类模式下,它的准确率和f1得分分别达到了0.887和0.852。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
One step further: evaluating interpreters using metamorphic testing Faster mutation analysis with MeMu Test mimicry to assess the exploitability of library vulnerabilities A large-scale study of usability criteria addressed by static analysis tools NCScope: hardware-assisted analyzer for native code in Android apps
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1