2022 IEEE/ACM International Workshop on Automated Program Repair (APR): Latest Papers

Scaling Genetic Improvement and Automated Program Repair
Pub Date : 2022-05-01 DOI: 10.1145/3524459.3527353
M. Harman
This paper outlines techniques and research directions for scaling genetic improvement and automated program repair, highlighting possible directions for future work and open challenges.
Citations: 6
Framing Program Repair as Code Completion
Pub Date : 2022-05-01 DOI: 10.1145/3524459.3527347
Francisco Ribeiro, Rui Abreu, João Saraiva
Many techniques have contributed to the advancement of automated program repair, such as generate-and-validate approaches, constraint-based solvers, and even neural machine translation. Simultaneously, artificial intelligence has allowed the creation of general-purpose pre-trained models that support several downstream tasks. In this paper, we describe a technique that takes advantage of a generative model, CodeGPT, to automatically repair buggy programs by making use of its code completion capabilities. We also elaborate on where to perform code completion in a buggy line and how we circumvent the open-ended nature of code generation to appropriately fit the new code in the original program. Furthermore, we validate our approach on the ManySStuBs4J dataset containing real-world open-source projects and show that our tool is able to fix 1739 programs out of 6415, a 27% repair rate. The repaired programs range from single-line changes to multi-line modifications. In fact, our technique is able to fix programs which were missing relatively complex expressions prior to being analyzed. In the end, we present case studies that showcase different scenarios our technique was able to handle.
Citations: 3
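The idea of framing repair as completion ("Framing Program Repair as Code Completion") can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of whitespace-delimited cut points, the `complete` callback (a stand-in for a model such as CodeGPT), and the rule of keeping only the first generated line are all assumptions made for illustration.

```python
from typing import Callable, List


def candidate_lines(buggy_line: str, complete: Callable[[str], str]) -> List[str]:
    """Build replacement candidates by completing each prefix of the buggy line.

    `complete` is a hypothetical callback that returns a model's continuation
    for the given prefix.
    """
    indent = buggy_line[: len(buggy_line) - len(buggy_line.lstrip())]
    tokens = buggy_line.split()
    candidates: List[str] = []
    for cut in range(len(tokens) + 1):
        prefix = indent + " ".join(tokens[:cut])
        if cut:
            prefix += " "
        continuation = complete(prefix)
        # Bound the open-ended generation: keep only text up to the first newline.
        first_line = continuation.split("\n", 1)[0]
        if not first_line.strip():
            continue  # the model produced nothing usable for this prefix
        candidate = (prefix + first_line).rstrip()
        if candidate != buggy_line.rstrip() and candidate not in candidates:
            candidates.append(candidate)
    return candidates


def apply_candidate(program_lines: List[str], line_no: int, candidate: str) -> List[str]:
    """Return a copy of the program with one line replaced by the candidate."""
    patched = list(program_lines)
    patched[line_no] = candidate
    return patched
```

Each candidate would still need to be validated, for example against a test suite or a known fix; the sketch only covers where completion is attempted and how the generated text is trimmed to fit a single line.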
Language Models Can Prioritize Patches for Practical Program Patching
Pub Date : 2022-05-01 DOI: 10.1145/3524459.3527343
Sungmin Kang, S. Yoo
The field of Automated Program Repair (APR) has seen significant growth in the past decade. As the field progressed, the number of templates used by APR tools has grown substantially to increase the number of patches included within the domain each tool finds fixable, thus increasing their fixing capability. However, this heightened potential was not free: new techniques paid for it by using greater computational resources and time to look over an enlarged repair space. In this paper, we look to curtail this trend by using language models (LMs) to provide guidance about whether a generated patch is natural. By prioritizing patches that generate natural code, which has been demonstrated in prior work to be related to correctness, we can reduce the number of patches that must be inspected to find the first correct patch. We evaluate this prioritization scheme over five APR tools, and find that we can reduce the number of patches that must be inspected in up to 70% of bugs and reduce the total number of patches inspected by up to two-thirds, paving the way for lower-cost program repair.
Citations: 4
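Naturalness-based prioritization, as described in "Language Models Can Prioritize Patches for Practical Program Patching", can be sketched as sorting candidate patches by their cross-entropy under a language model, lowest first. The toy add-one-smoothed bigram model below is only a stand-in assumed for illustration; the paper's actual language models, tokenization, and scoring are not given in the abstract.

```python
import math
import re
from collections import Counter


def tokenize(code):
    """Split code into identifiers, numbers, and single punctuation characters."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


class BigramModel:
    """Toy add-one-smoothed bigram model over code tokens (illustrative only)."""

    def __init__(self, corpus_lines):
        self.contexts = Counter()
        self.bigrams = Counter()
        vocab = set()
        for line in corpus_lines:
            toks = ["<s>"] + tokenize(line)
            vocab.update(toks)
            self.contexts.update(toks[:-1])
            self.bigrams.update(zip(toks, toks[1:]))
        self.vocab_size = len(vocab) + 1  # one extra slot for unseen tokens

    def cross_entropy(self, code):
        """Average negative log2 probability per token; lower means more natural."""
        toks = ["<s>"] + tokenize(code)
        pairs = list(zip(toks, toks[1:]))
        if not pairs:
            return float("inf")
        total = 0.0
        for prev, cur in pairs:
            p = (self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + self.vocab_size)
            total -= math.log2(p)
        return total / len(pairs)


def prioritize(patches, model):
    """Order candidate patch texts from most to least natural."""
    return sorted(patches, key=model.cross_entropy)
```

With a model trained on code resembling the project, a developer or validation step would inspect patches in the returned order, so a correct patch that reads like ordinary code is reached sooner.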
What Can Program Repair Learn From Code Review?
Pub Date : 2022-05-01 DOI: 10.1145/3524459.3527352
Madeline Endres, Pemma Reiter, S. Forrest, Westley Weimer
Over the past fifteen years, research on automated program repair has matured, and transitions to industry have begun. However, an impediment to wider adoption is concern over automatically generated patch correctness. A review of 250 program repair research papers suggests that this concern can be addressed by adapting practices from modern code review, such as multiple anonymized reviews and checklists with well-defined terminology, to better evaluate the correctness and acceptability of plausible patches. In this paper, we argue that adopting such practices from modern code review for automated program repair research can increase developer trust, paving the way for wider industrial deployments.
Citations: 0
Be Realistic: Automated Program Repair is a Combination of Undecidable Problems
Pub Date : 2022-05-01 DOI: 10.1145/3524459.3527346
Amirfarhad Nilizadeh, Gary T. Leavens
Automated program repair (APR) tools have promising results, but what are APR's limits? The answer could help researchers design tool trade-offs and manage user expectations. Since APR is undecidable, as are two of its typical phases, tools must use conservative approximations. Such approximations can help APR tools be better understood and can lead to a theory of sound APR.
Citations: 5
Towards JavaScript program repair with Generative Pre-trained Transformer (GPT-2)
Pub Date : 2022-05-01 DOI: 10.1145/3524459.3527350
Márk Lajkó, Viktor Csuvik, László Vidács
The goal of Automated Program Repair (APR) is to find a fix to software bugs without human intervention. The so-called Generate and Validate (G&V) approach has been deemed the most popular method in the last few years: the APR tool creates a patch, which is then validated against an oracle. Natural Language Processing (NLP) has attracted great interest in recent years, with new pre-trained models shattering records on tasks ranging from sentiment analysis to question answering. These deep learning models usually inspire the APR community as well. Such approaches usually require a large dataset on which the model can be trained (or fine-tuned) and evaluated. The criterion for accepting a patch depends on the underlying dataset, but usually the generated patch should be exactly the same as the one created by a human developer. As NLP models become more and more capable of forming sentences, and of combining sentences into coherent paragraphs, APR tools are also getting better and better at generating syntactically and semantically correct source code. As the Generative Pre-trained Transformer (GPT) model is now available to everyone thanks to the NLP and AI research community, it can be fine-tuned to specific tasks (not necessarily on natural language). In this work we use the GPT-2 model to generate source code; to the best of our knowledge, the GPT-2 model has not been used for Automated Program Repair so far. The model is fine-tuned for a specific task: it has been taught to fix JavaScript bugs automatically. To do so, we trained the model on 16863 JS code snippets, where it could learn the nature of the observed programming language. In our experiments we observed that the GPT-2 model was able to learn how to write syntactically correct source code on almost every attempt, although it failed to learn good bug-fixes in some cases. Nonetheless it was able to generate the correct fixes in most of the cases, resulting in an overall accuracy up to 17.25%.
Citations: 10
Can OpenAI's Codex Fix Bugs?: An evaluation on QuixBugs
Pub Date : 2022-05-01 DOI: 10.1145/3524459.3527351
Julian Aron Prenner, Hlib Babii, R. Robbes
OpenAI's Codex, a GPT-3-like model trained on a large code corpus, has made headlines in and outside of academia. Given a short user-provided description, it is capable of synthesizing code snippets that are syntactically and semantically valid in most cases. In this work, we want to investigate whether Codex is able to localize and fix bugs, two important tasks in automated program repair. Our initial evaluation uses the multi-language QuixBugs benchmark (40 bugs in both Python and Java). We find that, despite not being trained for APR, Codex is surprisingly effective, and competitive with recent state-of-the-art techniques. Our results also show that Codex is more successful at repairing Python than Java, fixing 50% more bugs in Python.
Citations: 51
Enhancing Spectrum Based Fault localization Via Emphasizing Its Formulas With Importance Weight
Pub Date : 2022-05-01 DOI: 10.1145/3524459.3527349
Q. Sarhan
Spectrum-Based Fault Localization (SBFL) computes suspicion scores, using risk evaluation formulas, for program elements (e.g., statements, methods, or classes) by counting how often each element is executed or not executed by passing versus failing test cases. The elements are then ranked from most suspicious to least suspicious based on their scores. The elements with the highest scores are thought to be the most faulty. The final ranking list of program elements helps testers during the debugging process when attempting to locate the source of a bug in the program under test. In this paper, we present an approach that gives more importance to program elements that are executed by more failed test cases compared to other elements. In essence, we are emphasizing the failing test cases factor because there are comparably far fewer failing tests than passing ones. We multiply each element's suspicion score obtained by an SBFL formula by this importance weight, which is the ratio of covering failing tests over all failing tests. The proposed approach can be applied to SBFL formulas without modifying their structures. The experimental results of our study show that our approach achieved a better performance in terms of average ranking compared to the underlying SBFL formulas. It also improved the Top-N categories and increased the number of cases in which the faulty method became the top-ranked element.
Citations: 1
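The importance weight described in "Enhancing Spectrum Based Fault localization Via Emphasizing Its Formulas With Importance Weight" can be written out as a short sketch: the score from an unmodified SBFL formula is multiplied by the fraction of failing tests that cover the element. Ochiai is assumed here as the underlying formula because the abstract does not name one, and the coverage data structure and function names are hypothetical.

```python
import math


def ochiai(ef, ep, nf):
    """Ochiai suspiciousness: ef / sqrt((ef + nf) * (ef + ep))."""
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0


def weighted_scores(coverage, outcomes, formula=ochiai):
    """Multiply each element's SBFL score by its importance weight.

    coverage: {element: set of test ids that execute it}
    outcomes: {test id: True if the test passed, False if it failed}
    """
    failing = {t for t, passed in outcomes.items() if not passed}
    passing = set(outcomes) - failing
    scores = {}
    for element, tests in coverage.items():
        ef = len(tests & failing)   # failing tests that cover the element
        ep = len(tests & passing)   # passing tests that cover the element
        nf = len(failing) - ef      # failing tests that do not cover it
        # Importance weight: covering failing tests over all failing tests.
        weight = ef / len(failing) if failing else 0.0
        scores[element] = formula(ef, ep, nf) * weight
    return scores


def rank(scores):
    """Elements ordered from most to least suspicious."""
    return sorted(scores, key=lambda e: scores[e], reverse=True)
```

Because the weight is applied outside the formula, any other risk evaluation formula with the same inputs can be passed as `formula` without changing its structure. In a small case where two elements tie under plain Ochiai, the element covered by more failing tests moves ahead after weighting.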
Some Automatically Generated Patches are More Likely to be Correct than Others: An Analysis of Defects4J Patch Features
Pub Date : 2022-05-01 DOI: 10.1145/3524459.3527348
G. Bennett, T. Hall, David Bowes
Defects4J is a popular dataset against which many Java Automatic Program Repair (APR) tools benchmark their performance. However, recent evidence suggests that some APR tools overfit to Defects4J, producing plausible patches which are incorrect. What we do not currently know is whether there is any commonality in the features of these plausible patches that turn out not to be correct. We compare the features of Defects4J's human-written patches in terms of those correctly patched by existing APR tools and those incorrectly patched. We found that 48.4% of Defects4J v1.5 defects have been automatically patched by existing APR tools: 28.9% of all defects have been correctly patched, leaving 19.5% incorrectly patched. We found that patches of defects that added a method call, added a variable, or wrapped existing code with new code, such as a try/catch block, were significantly associated with incorrect patches. Editing only a single line was significantly associated with correct patches. Our results suggest that current tools are weak at generating multi-line patches and synthesising new code, especially when wrapping existing code. Our results highlight potential future areas of development for new APR approaches, such as developing a tool that effectively repairs defects that require a try/catch block. Our replication package is available online at: https://github.com/IncorrectDefects/ReplicationPackage.
Citations: 1
Revisiting Object Similarity-based Patch Ranking in Automated Program Repair: An Extensive Study
Pub Date : 2022-05-01 DOI: 10.1145/3524459.3527354
Ali Ghanbari
Test-based generate-and-validate automated program repair (APR) systems often generate plausible patches that pass the test suite without fixing the bug. So far, several approaches for automatic assessment of APR-generated patches have been proposed. Among them, dynamic patch correctness assessment relies on comparing run-time information obtained from the program before and after patching. Object similarity-based dynamic patch ranking approaches, specifically, capture system state snapshots after the impact point of patches and express behavior differences in terms of object graph similarities. Dynamic approaches rely on the assumption that, when running the originally passing test cases, the correct patches will not alter the program behavior in a significant way, but such patches will significantly change program behavior for the failing test cases. This paper presents the results of an extensive empirical study on two object similarity-based approaches, i.e., ObjSim and CIP, to rank 1,290 APR-generated patches used in previous APR research. We found that although ObjSim outperforms CIP in terms of the number of patches ranked in the top-1 position, it still does not offer an improvement over random baseline ranking, which represents the setting with no automatic patch correctness assessment in place. This observation warrants further research on the validity of the assumptions underlying these two techniques and the techniques based on similar assumptions.
Citations: 1
Journal
2022 IEEE/ACM International Workshop on Automated Program Repair (APR)