Impact of Defect Instances for Successful Deep Learning-based Automatic Program Repair

Misoo Kim, Youngkyoung Kim, Jinseok Heo, Hohyeon Jeong, Sungoh Kim, Eunseok Lee
{"title":"Impact of Defect Instances for Successful Deep Learning-based Automatic Program Repair","authors":"Misoo Kim, Youngkyoung Kim, Jinseok Heo, Hohyeon Jeong, Sungoh Kim, Eunseok Lee","doi":"10.1109/ICSME55016.2022.00051","DOIUrl":null,"url":null,"abstract":"Deep learning-based automatic program repair (DL-APR) returns a patch code when given a defect code. Recent studies on DL-APR techniques have focused on the training phase to generate more accurate patches; however, a trained model cannot always generate an accurate patch for every new defect code, as the training dataset does not completely represent the new defects to be input in the future. DL-APR researchers should study a method to elicit the best performance on new inputs from the trained and deployed model. A new defect instance (i.e., defect codes and their context codes) is one of the crucial input data that determine the accuracy of the DL-APR, which can be changed and improved. We improve the quality of new input defect instances by focusing on the presence of noise tokens which compromise the defect instances’ quality, thus impairing the accuracy of generated patches. This paper shows that 1) there are noise tokens which prevent correct patch generation (inference) in a new defect instance, and 2) it is necessary to mask these noise tokens to avoid their usage in inferencing patch codes. In order to validate these two assertions, we use a state-of-the-art DL-APR technique and a genetic algorithm to generate near-optimal defect instances which maximize the patch generation accuracy (i.e., the BLEU score) of 4,573 defect instances. Based on optimization results, we found that 1) noise tokens impair patch generation accuracy in approximately 49% of instances, and 2) if these tokens are precluded from inference by masking them, we can improve patch generation accuracy by 88%. The results suggest that future work is required to automatically remove noise tokens from new defect instances so that the trained patch generator generates better patches.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSME55016.2022.00051","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Deep learning-based automatic program repair (DL-APR) returns patch code when given defect code. Recent studies on DL-APR techniques have focused on the training phase to generate more accurate patches; however, a trained model cannot always generate an accurate patch for every new defect code, because the training dataset does not completely represent the new defects that will be input in the future. DL-APR researchers should therefore study how to elicit the best performance on new inputs from a trained and deployed model. A new defect instance (i.e., the defect code and its context code) is one of the crucial inputs that determine the accuracy of DL-APR, and it can be changed and improved. We improve the quality of new input defect instances by focusing on the presence of noise tokens that compromise the defect instances' quality and thus impair the accuracy of the generated patches. This paper shows that 1) a new defect instance contains noise tokens that prevent correct patch generation (inference), and 2) these noise tokens must be masked so that they are not used when inferring patch code. To validate these two assertions, we use a state-of-the-art DL-APR technique and a genetic algorithm to generate near-optimal defect instances that maximize the patch generation accuracy (i.e., the BLEU score) over 4,573 defect instances. Based on the optimization results, we found that 1) noise tokens impair patch generation accuracy in approximately 49% of instances, and 2) if these tokens are precluded from inference by masking them, patch generation accuracy improves by 88%. The results suggest that future work is needed to automatically remove noise tokens from new defect instances so that the trained patch generator produces better patches.
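The search procedure described above can be pictured as a genetic algorithm over binary token masks: each individual decides which tokens of the defect instance are kept and which are masked, and its fitness is how close the model's patch for the masked instance comes to the ground-truth patch. The sketch below is illustrative only and is not the authors' implementation; `generate_patch` is a hypothetical stand-in for the trained DL-APR model, and the unigram-overlap `score` stands in for the BLEU metric used in the paper.

```python
import random

MASK = "<mask>"

def generate_patch(tokens):
    # Hypothetical stand-in for the trained DL-APR model's inference step.
    return [t for t in tokens if t != MASK]

def score(candidate, reference):
    # Toy unigram-overlap score; a stand-in for BLEU.
    if not candidate:
        return 0.0
    ref = set(reference)
    return sum(t in ref for t in candidate) / len(candidate)

def fitness(mask_bits, tokens, reference_patch):
    # Keep a token where the bit is 1, mask it where the bit is 0.
    masked = [t if keep else MASK for t, keep in zip(tokens, mask_bits)]
    return score(generate_patch(masked), reference_patch)

def crossover(a, b):
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rate=0.05):
    return [1 - b if random.random() < rate else b for b in bits]

def optimize_mask(tokens, reference_patch, pop_size=30, generations=50):
    # Evolve binary masks toward the one whose masked instance yields the best patch.
    population = [[random.randint(0, 1) for _ in tokens] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population,
                        key=lambda m: fitness(m, tokens, reference_patch),
                        reverse=True)
        elite = ranked[: pop_size // 2]
        children = [mutate(crossover(random.choice(elite), random.choice(elite)))
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    return max(population, key=lambda m: fitness(m, tokens, reference_patch))

if __name__ == "__main__":
    buggy = "if ( x = 0 ) return y ;".split()    # hypothetical defect instance
    fixed = "if ( x == 0 ) return y ;".split()   # hypothetical ground-truth patch
    best = optimize_mask(buggy, fixed)
    print("best mask  :", best)
    print("kept tokens:", [t for t, keep in zip(buggy, best) if keep])
```

In the paper's setting, the near-optimal masks found this way are then inspected to see which tokens were dropped; tokens that a mask must hide to reach the best BLEU score are the "noise tokens" the abstract refers to.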