2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE): Latest Papers, Page 8

A Differential Testing Approach for Evaluating Abstract Syntax Tree Mapping Algorithms

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date: 2021-02-27 DOI: 10.1109/ICSE43902.2021.00108

Yuanrui Fan, Xin Xia, David Lo, A. Hassan, Yuan Wang, Shanping Li

Abstract syntax tree (AST) mapping algorithms are widely used to analyze changes in source code. Despite the foundational role of AST mapping algorithms, little effort has been made to evaluate their accuracy, i.e., the extent to which an algorithm captures the evolution of code. We observe that a program element often has only one best-mapped program element. Based on this observation, we propose a hierarchical approach that automatically compares the similarity of the statements and tokens mapped by different algorithms. From this comparison, we determine whether each of the compared algorithms generates inaccurate mappings for a statement or its tokens. We invite 12 external experts to judge, for 200 statements, whether three commonly used AST mapping algorithms generate accurate mappings for each statement and its tokens. Based on the experts' feedback, our approach achieves a precision of 0.98–1.00 and a recall of 0.65–0.75. Furthermore, we conduct a large-scale study on a dataset of ten Java projects containing a total of 263,165 file revisions. Our approach determines that GumTree, MTDiff, and IJM generate inaccurate mappings for 20%–29%, 25%–36%, and 21%–30% of the file revisions, respectively. Our experimental results show that state-of-the-art AST mapping algorithms still need improvement.
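The core idea (one best-mapped counterpart per element, so disagreeing algorithms reveal inaccurate mappings) can be illustrated with a minimal statement-level sketch. It is not the authors' implementation: it assumes each algorithm's output is already reduced to a dictionary from old-revision statement ids to mapped new-revision statement text, uses `difflib.SequenceMatcher` as a stand-in similarity measure, and omits the token-level half of the hierarchical comparison. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Set


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; an assumed stand-in for the paper's measure."""
    return SequenceMatcher(None, a, b).ratio()


def flag_inaccurate(
    src_stmts: Dict[str, str],
    mappings: Dict[str, Dict[str, Optional[str]]],
) -> Dict[str, Set[str]]:
    """Statement-level differential check across several mapping algorithms.

    src_stmts: statement id -> statement text in the old revision.
    mappings:  algorithm name -> (statement id -> mapped text in the new revision,
               or None if the algorithm left the statement unmapped).
    Returns statement id -> algorithms whose mapping is less similar than the best one.
    """
    flagged: Dict[str, Set[str]] = {}
    for sid, src in src_stmts.items():
        scores = {
            algo: similarity(src, m[sid]) if m.get(sid) is not None else 0.0
            for algo, m in mappings.items()
        }
        best = max(scores.values())
        # Assumption: an element has one best-mapped counterpart, so any algorithm
        # whose choice is less similar than the best choice is flagged as inaccurate.
        losers = {algo for algo, s in scores.items() if s < best}
        if losers:
            flagged[sid] = losers
    return flagged
```

For example, if hypothetical algorithm `algoA` maps `int x = a + b;` to an identical statement while `algoB` maps it to `return y;`, only `algoB` is flagged; when all algorithms agree, nothing is flagged.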

Citations: 8

Unrealizable Cores for Reactive Systems Specifications

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date: 2021-02-27 DOI: 10.1109/ICSE43902.2021.00016

S. Maoz, Rafi Shalom

One of the main challenges of reactive synthesis, an automated procedure for obtaining a correct-by-construction reactive system, is dealing with unrealizable specifications. In the context of GR(1), an expressive assume-guarantee fragment of LTL that enables efficient synthesis, one means of dealing with unrealizability is to compute an unrealizable core, which can be viewed as a fault-localization approach. Existing solutions, however, are computationally costly, are limited to computing a single core, and do not correctly support specifications with constructs beyond pure GR(1) elements. In this work we address these limitations. First, we present QuickCore, a novel algorithm that accelerates unrealizable core computations by relying on the monotonicity of unrealizability, on incremental computation, and on additional properties of GR(1) specifications. Second, we present Punch, a novel algorithm to efficiently compute all unrealizable cores of a specification. Finally, we present means to correctly handle specifications that include higher-level constructs beyond pure GR(1) elements. We implemented our ideas on top of Spectra, an open-source language and synthesis environment. Our evaluation over benchmarks from the literature shows that QuickCore is in most cases faster than previous algorithms, and that its relative advantage grows with scale. Moreover, we found that most specifications include more than one core, and that Punch finds all the cores significantly faster than a competing naive algorithm.
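To show why monotonicity helps, here is a minimal sketch of generic deletion-based core minimization, not QuickCore itself (which additionally uses incremental computation and GR(1)-specific properties). It assumes the core is computed over a list of guarantees and that `is_unrealizable` is a hypothetical oracle, for instance a wrapper around a GR(1) realizability check; no real Spectra API is implied.

```python
from typing import Callable, List, Sequence


def minimal_core(
    guarantees: Sequence[str],
    is_unrealizable: Callable[[List[str]], bool],
) -> List[str]:
    """Deletion-based minimization of an unrealizable set of guarantees.

    `is_unrealizable` is a hypothetical oracle. Monotonicity (every superset of an
    unrealizable set is unrealizable) means one pass of removals yields a core
    from which no single element can be dropped, i.e. a subset-minimal core.
    """
    core = list(guarantees)
    if not is_unrealizable(core):
        raise ValueError("specification is realizable; no core exists")
    i = 0
    while i < len(core):
        candidate = core[:i] + core[i + 1:]
        if is_unrealizable(candidate):
            core = candidate  # element i is not needed for unrealizability
        else:
            i += 1  # element i is necessary; keep it and move on
    return core
```

With a toy oracle that reports unrealizability exactly when both `g1` and `g3` are present, `minimal_core(["g1", "g2", "g3", "g4"], oracle)` returns `["g1", "g3"]` after one oracle call per element.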

Citations: 6

Extracting Concise Bug-Fixing Patches from Human-Written Patches in Version Control Systems

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date: 2021-02-27 DOI: 10.1109/ICSE43902.2021.00069

Yanjie Jiang, Hui Liu, Nan Niu, Lu Zhang, Yamin Hu

High-quality, large-scale repositories of real bugs and their concise patches, collected from real-world applications, are critical for research in the software engineering community. In such a repository, each real bug is explicitly associated with its fix. On the one hand, the real bugs and their fixes may inspire novel approaches for finding, locating, and repairing software bugs; on the other hand, they are indispensable for rigorous and meaningful evaluation of approaches for software testing, fault localization, and program repair. To this end, a number of such repositories, e.g., Defects4J, have been proposed. However, such repositories are rather small because their construction involves expensive human intervention. Although bug-fixing commits and their associated test cases can be retrieved from version control systems automatically, existing approaches cannot yet automatically extract concise bug-fixing patches from bug-fixing commits, because such commits often involve bug-irrelevant changes. In this paper, we propose an automatic approach, called BugBuilder, for extracting complete and concise bug-fixing patches from human-written patches in version control systems. It excludes refactorings by detecting the refactorings involved in a bug-fixing commit and reapplying them to the faulty version. It then enumerates all subsets of the remaining changes and validates them against test cases. If none of the subsets has the potential to be a complete bug-fixing patch, the remaining changes as a whole are taken as a complete and concise bug-fixing patch. Evaluation results on 809 real bug-fixing commits in Defects4J suggest that BugBuilder successfully generated complete and concise bug-fixing patches for 40% of the bug-fixing commits, and that its precision (99%) was even higher than that of human experts.
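The subset-enumeration step can be sketched as follows, assuming refactorings have already been removed and the remaining edits are given as a list of opaque change identifiers (e.g. hunks). `passes_tests` is a hypothetical oracle that applies a subset to the faulty version and runs the bug-exposing tests; requiring the whole set to pass first is also an assumption. The abstract does not say what BugBuilder does when some proper subset already passes, so this sketch simply returns `None` in that case. Note the cost is exponential in the number of changes.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Optional, Sequence


def extract_concise_patch(
    changes: Sequence[str],
    passes_tests: Callable[[FrozenSet[str]], bool],
) -> Optional[FrozenSet[str]]:
    """Accept the remaining changes as a concise patch only if no proper subset fixes the bug.

    changes: non-refactoring edits left in a bug-fixing commit (hypothetical ids).
    passes_tests: hypothetical oracle that applies a subset of changes to the faulty
        version, runs the bug-exposing tests, and reports whether they pass.
    Returns the whole change set if it is complete and concise, else None.
    """
    whole = frozenset(changes)
    if not passes_tests(whole):  # assumed: the whole set must fix the bug
        return None
    # Enumerate every non-empty proper subset, smallest first.
    for size in range(1, len(whole)):
        for subset in combinations(changes, size):
            if passes_tests(frozenset(subset)):
                return None  # a smaller subset suffices; the rest is bug-irrelevant
    return whole
```

With an oracle that passes whenever both `fix_a` and `fix_b` are applied, `["fix_a", "fix_b"]` is accepted, while `["fix_a", "fix_b", "log_c"]` is rejected because the two-element subset already fixes the bug.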

Citations: 21

A Context-Based Automated Approach for Method Name Consistency Checking and Suggestion

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date: 2021-02-27 DOI: 10.1109/ICSE43902.2021.00060

Yi Li, Shaohua Wang, T. Nguyen

Misleading method names in software projects can confuse developers, which may lead to software defects and hurt code understandability. In this paper, we present DeepName, a context-based, deep learning approach to detect method name inconsistencies and suggest a proper name for a method. The key departure point is the philosophy of "Show Me Your Friends, I'll Tell You Who You Are". Unlike state-of-the-art approaches, in addition to the method's body, we also consider the interactions of the method under study with other methods, including its callers and callees, and its sibling methods in the same enclosing class. The sub-token sequences of the program entity names in these contexts are extracted and fed into an RNN-based encoder-decoder to produce representations of the current method. We modify that RNN model to integrate the copy mechanism and a newly developed component, called the non-copy mechanism, to emphasize the possibility that a certain sub-token should not be copied to follow the current sub-token in the method name being generated. We conducted several experiments to evaluate DeepName on large datasets with more than 14M methods. For consistency checking, DeepName improves over the state-of-the-art approach by a relative 2.1%, 19.6%, and 11.9% in recall, precision, and F-score, respectively. For name suggestion, DeepName yields relative improvements over state-of-the-art approaches in precision (1.8%–30.5%), recall (8.8%–46.1%), and F-score (5.2%–38.2%). To assess DeepName's usefulness, we detected inconsistent methods and suggested new method names in active projects. Of 50 pull requests, 12 were merged into the main branch. In total, in 30 of the 50 cases, team members agreed that our suggested method names are more meaningful than the current names.
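The context-extraction step (sub-tokens of the method's own name plus the names of its callers, callees, and siblings) can be sketched as below. The abstract does not give DeepName's exact input format, so the camelCase/snake_case splitting rule, the lowercasing, the group order, and the `<sep>` separator token are all assumptions; the RNN encoder-decoder and the copy/non-copy mechanisms are not shown.

```python
import re
from typing import List

# Assumed splitting rule: acronym runs, capitalized or lowercase words, digit runs.
_SUBTOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def subtokens(name: str) -> List[str]:
    """Split an identifier such as getHTTPResponse or parse_json2 into lowercase sub-tokens."""
    return [t.lower() for t in _SUBTOKEN.findall(name)]


def context_sequence(
    method_name: str,
    callers: List[str],
    callees: List[str],
    siblings: List[str],
    sep: str = "<sep>",  # hypothetical separator token between context groups
) -> List[str]:
    """Flatten the method's name and its context names into one sub-token sequence."""
    seq = subtokens(method_name)
    for group in (callers, callees, siblings):
        seq.append(sep)
        for name in group:
            seq.extend(subtokens(name))
    return seq
```

For instance, `context_sequence("readFile", ["loadConfig"], ["openStream"], ["writeFile"])` yields `read file <sep> load config <sep> open stream <sep> write file`, a sequence that an encoder-decoder could consume.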

Citations: 20

Bounded Exhaustive Search of Alloy Specification Repairs

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date: 2021-02-27 DOI: 10.1109/ICSE43902.2021.00105

Simón Gutiérrez Brida, Germán Regis, Guolong Zheng, H. Bagheri, Thanhvu Nguyen, Nazareno Aguirre, M. Frias

The rising popularity of declarative languages and their hard-to-debug nature have motivated the need for applicable, automated repair techniques for such languages. However, despite significant advances in program repair for imperative languages, there is a dearth of repair techniques for declarative languages. This paper presents BeAFix, an automated repair technique for faulty models written in Alloy, a declarative language based on first-order relational logic. BeAFix is backed by a novel strategy for bounded exhaustive, yet scalable, exploration of the spaces of fix candidates and a formally rigorous, sound pruning of such spaces. Moreover, unlike the state of the art in automated Alloy repair, which relies on the availability of unit tests, BeAFix does not require tests and can work with assertions, which are naturally used in formal declarative languages. Our experience using BeAFix to repair thousands of real-world faulty models, collected by other researchers, corroborates its ability to effectively generate correct repairs and to outperform the state of the art.
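A bounded exhaustive search over fix candidates can be illustrated with a toy breadth-first sketch. Everything concrete here is an assumption: a "model" is a tuple of operator tokens, mutations replace one token with an alternative from a hypothetical table, and `holds` stands in for checking the model's assertions (for Alloy this would be the Alloy Analyzer, which is not called here). The only pruning is skipping already-visited candidates, which is far simpler than BeAFix's formally sound pruning.

```python
from collections import deque
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Model = Tuple[str, ...]  # toy stand-in for an Alloy model: a sequence of operator tokens


def neighbours(model: Model, alternatives: Dict[str, List[str]]) -> Iterator[Model]:
    """All models obtained by replacing one token with an alternative (hypothetical mutation table)."""
    for i, tok in enumerate(model):
        for alt in alternatives.get(tok, []):
            yield model[:i] + (alt,) + model[i + 1:]


def bounded_repair(
    model: Model,
    alternatives: Dict[str, List[str]],
    holds: Callable[[Model], bool],  # hypothetical assertion checker
    max_depth: int,
) -> Optional[Model]:
    """Breadth-first search over mutation sequences of length <= max_depth.

    Returns a candidate with the fewest mutations that satisfies `holds`, or None.
    Duplicate candidates are pruned via the `seen` set.
    """
    frontier = deque([(model, 0)])
    seen = {model}
    while frontier:
        cand, depth = frontier.popleft()
        if holds(cand):
            return cand
        if depth == max_depth:
            continue  # bound reached: do not mutate further
        for nxt in neighbours(cand, alternatives):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None
```

Breadth-first order means the first fix found needs the fewest mutations; raising `max_depth` enlarges the explored space exponentially, which is why sound pruning matters in practice.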

Citations: 13

Fault Localization with Code Coverage Representation Learning

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date: 2021-02-27 DOI: 10.1109/ICSE43902.2021.00067

Yi Li, Shaohua Wang, T. Nguyen

In this paper, we propose DeepRL4FL, a deep learning fault localization (FL) approach that locates buggy code at the statement and method levels by treating FL as an image pattern recognition problem. DeepRL4FL does so via novel code coverage representation learning (RL) and data dependency RL for program statements. These two types of RL on the dynamic information in a code coverage matrix are combined with code representation learning on the static information of the usual suspicious source code. This combination is inspired by crime scene investigation, in which investigators analyze the crime scene (failed test cases and statements) and related persons (statements with dependencies), and at the same time examine the usual suspects who have committed a similar crime in the past (similar buggy code in the training data). For code coverage information, DeepRL4FL first orders the test cases and marks error-exhibiting code statements, expecting a model to recognize the patterns that discriminate between faulty and non-faulty statements/methods. For dependencies among statements, the suspiciousness of a statement is assessed by taking into account its data dependencies on other statements in execution and data flows, in addition to the statement itself. Finally, the vector representations of the code coverage matrix, the data dependencies among statements, and the source code are combined and used as the input of a Convolutional Neural Network classifier to detect buggy statements/methods. Our empirical evaluation shows that DeepRL4FL improves the top-1 results over state-of-the-art statement-level FL baselines by 173.1% to 491.7%. It also improves the top-1 results over existing method-level FL baselines by 15.0% to 206.3%.
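The "coverage matrix as an image" idea can be made concrete with a small sketch that orders tests so failing ones come first and builds a tests-by-statements grid a CNN could consume. The cell encoding (2 for a statement covered by a failing test, 1 for one covered by a passing test, 0 otherwise) is an assumed way of marking error-exhibiting statements, not necessarily DeepRL4FL's; the data-dependency and source-code representations and the CNN itself are omitted.

```python
from typing import Dict, List, Set, Tuple


def coverage_matrix(
    coverage: Dict[str, Set[int]],
    passed: Dict[str, bool],
    n_statements: int,
) -> Tuple[List[str], List[List[int]]]:
    """Build a tests x statements grid with failing tests in the top rows.

    coverage: test name -> indices of statements the test executes.
    passed:   test name -> True if the test passed.
    """
    # False (failed) sorts before True (passed); ties are broken by test name.
    tests = sorted(coverage, key=lambda t: (passed[t], t))
    matrix: List[List[int]] = []
    for t in tests:
        # Assumed encoding: 2 = covered by a failing test, 1 = covered by a passing one, 0 = not covered.
        mark = 1 if passed[t] else 2
        matrix.append([mark if s in coverage[t] else 0 for s in range(n_statements)])
    return tests, matrix
```

Grouping failing tests together gives the grid a consistent layout across programs, which is what lets a pattern recognizer compare "images" of different bugs.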

Citations: 55

EvoSpex: An Evolutionary Algorithm for Learning Postconditions

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date: 2021-02-26 DOI: 10.1109/ICSE43902.2021.00112

F. Molina, Pablo Ponzio, Nazareno Aguirre, M. Frias

Software reliability is a primary concern in the construction of software, and thus a fundamental component in the definition of software quality. Analyzing software reliability requires a specification of the intended behavior of the software under analysis, and at the source code level, such specifications typically take the form of assertions. Unfortunately, software often lacks such specifications, or provides them only for scenario-specific behaviors, as assertions accompanying tests. This seriously diminishes the analyzability of software with respect to its reliability. In this paper, we tackle this problem by proposing a technique that, given a Java method, automatically produces a specification of the method's current behavior in the form of postcondition assertions. The mechanism generates executions of the method under analysis to obtain valid pre/post state pairs, mutates these pairs to obtain (allegedly) invalid ones, and then uses a genetic algorithm to produce an assertion that is satisfied by the valid pre/post pairs while leaving out the invalid ones. The technique, which targets in particular methods of reference-based class implementations, is assessed on a benchmark of open-source Java projects, showing that our genetic algorithm is able to generate postconditions that are stronger and more accurate than those generated by related automated approaches, as evaluated by an automated oracle assessment tool. Moreover, our technique can also infer an important part of manually written rich postconditions in verified classes, and reproduce contracts for methods whose class implementations were automatically synthesized from specifications.
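The evolutionary step can be illustrated with a deliberately tiny genetic algorithm. It assumes states are dictionaries of integers, the candidate postcondition is a conjunction drawn from a fixed list of hypothetical atoms, and the fitness counts valid pairs accepted plus invalid pairs rejected, minus a small parsimony penalty (our assumption, to break ties toward shorter assertions). EvoSpex's actual assertion language, operators, and fitness are richer than this.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

State = Dict[str, int]
Pair = Tuple[State, State]  # (pre-state, post-state)
Atom = Callable[[State, State], bool]
Individual = Tuple[bool, ...]  # which atoms appear in the conjunction


def satisfies(ind: Individual, atoms: Sequence[Atom], pair: Pair) -> bool:
    pre, post = pair
    return all(atom(pre, post) for use, atom in zip(ind, atoms) if use)


def fitness(ind: Individual, atoms: Sequence[Atom], valid: List[Pair], invalid: List[Pair]) -> float:
    accepted = sum(satisfies(ind, atoms, p) for p in valid)
    rejected = sum(not satisfies(ind, atoms, p) for p in invalid)
    return accepted + rejected - 0.01 * sum(ind)  # assumed parsimony penalty


def evolve(atoms, valid, invalid, pop_size=20, generations=30, seed=0) -> Individual:
    rng = random.Random(seed)
    n = len(atoms)

    def score(ind: Individual) -> float:
        return fitness(ind, atoms, valid, invalid)

    pop = [tuple(rng.random() < 0.5 for _ in range(n)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = pop[:2]  # elitism: the two best survive unchanged
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=score)  # tournament selection
            b = max(rng.sample(pop, 3), key=score)
            child = tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))  # uniform crossover
            child = tuple((not g) if rng.random() < 1.0 / n else g for g in child)  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=score)
```

For a hypothetical `push`-like method, with atoms `size' == size + 1`, `size' > 0`, and `size' == size`, valid pairs `(0, 1)` and `(2, 3)`, and mutated invalid pairs `(0, 0)` and `(2, 5)`, the only optimum under this fitness is the single atom `size' == size + 1`.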

软件可靠性是软件构建中的主要关注点，因此也是软件质量定义中的基本组成部分。分析软件可靠性需要对被分析软件的预期行为进行规范，并且在源代码级别，此类规范通常采用断言的形式。不幸的是，软件很多时候缺乏这样的规范，或者只为场景特定的行为提供规范，比如伴随测试的断言。这个问题严重削弱了软件在可靠性方面的可分析性。在本文中，我们通过提出一种技术来解决这个问题，该技术可以在给定Java方法的情况下，以后置条件断言的形式自动生成方法当前行为的规范。该机制基于生成所分析的方法的执行，以获得有效的前/后状态对，改变这些对以获得(据称)无效的状态对，然后使用遗传算法生成由有效的前/后状态对满足的断言，同时忽略无效的状态对。该技术以基于引用的类实现的特定方法为目标，在开源Java项目的基准上进行了评估，结果表明，我们的遗传算法能够生成比相关自动化方法生成的后置条件更强、更准确，正如自动化oracle评估工具所评估的那样。此外，我们的技术还能够推断出经过验证的类中手工编写的丰富后验条件的重要部分，并为其类实现从规范中自动合成的方法再现契约。

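The generate, mutate, and evolve loop in the EvoSpex abstract can be illustrated with a deliberately tiny Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation. The method under analysis (`inc`), the pool of candidate predicates (`ATOMS`), the fitness weights, and the GA operators (truncation selection, uniform crossover, single-atom flip mutation) are all hypothetical. EvoSpex itself targets methods of reference-based Java classes, so this sketch mirrors only the overall shape of the search.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Tuple

State = Dict[str, int]
Pair = Tuple[State, State]                        # (pre-state, post-state)
Atom = Tuple[str, Callable[[State, State], bool]]
Assertion = FrozenSet[int]                        # conjunction of atom indices; empty = "true"


def inc(x: int) -> int:
    """Hypothetical method under analysis."""
    return x + 1


def valid_pairs(inputs: List[int]) -> List[Pair]:
    # Run the method to record observed (pre, post) state pairs.
    return [({"x": x}, {"r": inc(x)}) for x in inputs]


def mutate_pairs(valid: List[Pair], deltas=(-2, -1, 1, 2)) -> List[Pair]:
    # Perturb each post-state; because inc is deterministic, every mutant is invalid.
    return [(pre, {"r": post["r"] + d}) for pre, post in valid for d in deltas]


# Hypothetical pool of atomic predicates over (pre, post).
ATOMS: List[Atom] = [
    ("r == x + 1", lambda pre, post: post["r"] == pre["x"] + 1),
    ("r > x", lambda pre, post: post["r"] > pre["x"]),
    ("r >= 0", lambda pre, post: post["r"] >= 0),
    ("r == x", lambda pre, post: post["r"] == pre["x"]),
    ("r < x", lambda pre, post: post["r"] < pre["x"]),
    ("r != 0", lambda pre, post: post["r"] != 0),
]


def holds(a: Assertion, pair: Pair) -> bool:
    pre, post = pair
    return all(ATOMS[i][1](pre, post) for i in a)


def fitness(a: Assertion, valid: List[Pair], invalid: List[Pair]) -> float:
    accepted = sum(holds(a, p) for p in valid) / len(valid)
    rejected = sum(not holds(a, p) for p in invalid) / len(invalid)
    return accepted + rejected - 0.01 * len(a)    # small penalty favours shorter assertions


def evolve(valid, invalid, pop_size=20, generations=30, seed=0):
    # pop_size // 4 must be at least 2 so two parents can be sampled.
    rng = random.Random(seed)
    n = len(ATOMS)

    def fit(a: Assertion) -> float:
        return fitness(a, valid, invalid)

    pop = [frozenset(i for i in range(n) if rng.random() < 0.3) for _ in range(pop_size)]
    history = []                                  # best fitness per generation
    for _ in range(generations):
        pop.sort(key=fit, reverse=True)
        history.append(fit(pop[0]))
        elite = pop[: pop_size // 4]              # truncation selection; elites survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = frozenset(i for i in a | b if rng.random() < 0.5)   # uniform crossover
            if rng.random() < 0.5:
                child = child ^ frozenset({rng.randrange(n)})           # flip one atom in or out
            children.append(child)
        pop = elite + children
    return max(pop, key=fit), history


if __name__ == "__main__":
    valid = valid_pairs(list(range(-5, 6)))
    invalid = mutate_pairs(valid)
    best, _ = evolve(valid, invalid)
    print(" && ".join(ATOMS[i][0] for i in sorted(best)) or "true")
```

Because elites are copied unchanged, the best fitness never decreases from one generation to the next. In this toy pool, `r == x + 1` is the only single predicate that accepts every valid pair and rejects every mutant, so a well-converged run should settle on it.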

Citation count: 13

CURE: Code-Aware Neural Machine Translation for Automatic Program Repair

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date : 2021-02-26 DOI: 10.1109/ICSE43902.2021.00107

Nan Jiang, Thibaud Lutellier, Lin Tan

Automatic program repair (APR) is crucial to improve software reliability. Recently, neural machine translation (NMT) techniques have been used to automatically fix software bugs. While promising, these approaches have two major limitations. Their search space often does not contain the correct fix, and their search strategy ignores software knowledge such as strict code syntax. Due to these limitations, existing NMT-based techniques underperform the best template-based approaches. We propose CURE, a new NMT-based APR technique with three major novelties. First, CURE pre-trains a programming language (PL) model on a large software codebase to learn developer-like source code before the APR task. Second, CURE designs a new code-aware search strategy that finds more correct fixes by focusing on searching for compilable patches and patches that are close in length to the buggy code. Finally, CURE uses a subword tokenization technique to generate a smaller search space that contains more correct fixes. Our evaluation on two widely-used benchmarks shows that CURE correctly fixes 57 Defects4J bugs and 26 QuixBugs bugs, outperforming all existing APR techniques on both benchmarks.
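CURE's two search-time preferences, compilable patches and patches close in length to the buggy code, can be approximated by reranking a list of candidate patches after the fact. This is a simplified illustration under stated assumptions, not CURE's algorithm. CURE builds these preferences into its search, whereas this sketch filters and rescores a finished candidate list. The `Candidate` scores, the length-penalty weight `alpha`, and the use of Python's built-in `compile()` as a stand-in for compiling Java are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    code: str
    log_prob: float   # score from a hypothetical NMT model


def compiles(code: str) -> bool:
    # Stand-in for a real compiler check (CURE targets Java): Python's built-in compile().
    try:
        compile(code, "<patch>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def code_aware_score(c: Candidate, buggy_len: int, alpha: float) -> float:
    # Penalize patches whose token count strays from the buggy line's token count.
    gap = abs(len(c.code.split()) - buggy_len)
    return c.log_prob - alpha * gap


def rerank(cands: List[Candidate], buggy: str, alpha: float = 0.5) -> List[Candidate]:
    buggy_len = len(buggy.split())
    kept = [c for c in cands if compiles(c.code)]            # drop uncompilable patches
    return sorted(kept, key=lambda c: code_aware_score(c, buggy_len, alpha), reverse=True)


if __name__ == "__main__":
    buggy = "if x > 0: y = x"
    cands = [
        Candidate("if x >= 0: y = x", -1.2),
        Candidate("if x >= 0 y = x", -0.5),                  # missing colon
        Candidate("if x >= 0: y = x + 0 * x - 0", -1.0),     # much longer
    ]
    for c in rerank(cands, buggy):
        print(c.log_prob, c.code)
```

With `alpha=0.5`, the uncompilable candidate is discarded despite having the best model score, and the same-length patch (score -1.2) outranks the longer one (-1.0 - 0.5 * 6 = -4.0). With `alpha=0.0`, the longer patch would win on model score alone.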

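For CURE's subword tokenization, byte-pair encoding (BPE) is one common scheme. It is used here purely as an illustration, because this listing does not say which variant CURE uses. The sketch learns merge rules from a toy corpus of identifiers and then splits an unseen identifier into known subwords. This is how a subword vocabulary stays small while still covering rare names. The corpus and the merge count are made up.

```python
from collections import Counter
from typing import List, Tuple

Pair = Tuple[str, str]


def merge_pair(symbols: Tuple[str, ...], pair: Pair) -> Tuple[str, ...]:
    # Replace every adjacent occurrence of `pair` with its concatenation.
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus: List[str], num_merges: int) -> List[Pair]:
    words = Counter(tuple(w) for w in corpus)        # each word starts as single characters
    merges: List[Pair] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=lambda p: (pairs[p], p))   # most frequent; ties broken deterministically
        merges.append(best)
        words = Counter({merge_pair(s, best): f for s, f in words.items()})
    return merges


def tokenize(word: str, merges: List[Pair]) -> List[str]:
    symbols = tuple(word)
    for pair in merges:                               # apply merges in learned order
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = ["getName", "getValue", "setName", "setValue", "getName"]
    merges = learn_bpe(corpus, 10)
    print(tokenize("getCount", merges))               # an identifier never seen in training
```

In this toy corpus, `e` followed by `t` is the most frequent adjacent pair (five occurrences), so it is the first merge learned. As a result, `getCount` is split into fewer symbols than it has characters, even though it never appears in the corpus.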

Citation count: 168

Distribution-Aware Testing of Neural Networks Using Generative Models

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date : 2021-02-26 DOI: 10.1109/ICSE43902.2021.00032

Swaroopa Dola, Matthew B. Dwyer, M. Soffa

The reliability of software that has a Deep Neural Network (DNN) as a component is urgently important today, given the increasing number of critical applications being deployed with DNNs. The need for reliability raises a need for rigorous testing of the safety and trustworthiness of these systems. In the last few years, a number of research efforts have focused on testing DNNs. However, the test generation techniques proposed so far lack a check to determine whether the test inputs they generate are valid, and thus they produce invalid inputs. To illustrate this situation, we explored three recent DNN testing techniques. Using input validation based on deep generative models, we show that all three techniques generate a significant number of invalid test inputs. We further analyzed the test coverage achieved by the test inputs generated by these DNN testing techniques and showed how invalid test inputs can falsely inflate test coverage metrics. To avoid including invalid inputs in testing, we propose a technique that incorporates the valid input space of the DNN model under test into the test generation process. Our technique uses a deep generative model-based algorithm to generate only valid inputs. The results of our empirical studies show that our technique is effective in eliminating invalid tests and in increasing the number of valid test inputs generated.
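The two observations in this abstract, that a generative model can flag invalid test inputs and that invalid inputs can inflate coverage, can be reproduced in miniature. The sketch below is a toy built on assumptions. A diagonal Gaussian density stands in for the deep generative model, the 5% log-likelihood threshold is arbitrary, and a four-neuron ReLU layer with a simple neuron-coverage metric stands in for the DNN under test. Unlike the paper's technique, which generates inputs within the valid input space, it only filters inputs after they have been generated.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


class DiagonalGaussian:
    """Stand-in density model over inputs; NOT a deep generative model."""

    def __init__(self, data: List[Vec]):
        n, d = len(data), len(data[0])
        self.mu = [sum(x[j] for x in data) / n for j in range(d)]
        self.var = [max(sum((x[j] - self.mu[j]) ** 2 for x in data) / n, 1e-6) for j in range(d)]

    def log_prob(self, x: Vec) -> float:
        return sum(-0.5 * (math.log(2 * math.pi * v) + (xj - m) ** 2 / v)
                   for xj, m, v in zip(x, self.mu, self.var))


def fit_threshold(model: DiagonalGaussian, train: List[Vec], quantile: float = 0.05) -> float:
    # Inputs scoring below this quantile of training log-likelihoods count as invalid.
    scores = sorted(model.log_prob(x) for x in train)
    return scores[int(quantile * (len(scores) - 1))]


def is_valid(model: DiagonalGaussian, x: Vec, threshold: float) -> bool:
    return model.log_prob(x) >= threshold


# Hypothetical one-layer ReLU network standing in for the DNN under test.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]


def activations(x: Vec) -> List[float]:
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]


def neuron_coverage(inputs: List[Vec], t: float = 2.0) -> float:
    # Fraction of neurons whose activation exceeds t for at least one input.
    covered = {k for x in inputs for k, a in enumerate(activations(x)) if a > t}
    return len(covered) / len(W)


if __name__ == "__main__":
    grid = (-1.0, -0.5, 0.0, 0.5, 1.0)
    train = [[a, b] for a in grid for b in grid]
    model = DiagonalGaussian(train)
    thr = fit_threshold(model, train)
    tests = [[0.5, 0.5], [0.0, -0.5], [5.0, 5.0], [-4.0, -4.0]]   # e.g. from a test generator
    valid = [x for x in tests if is_valid(model, x, thr)]
    print("all tests:", neuron_coverage(tests), "valid only:", neuron_coverage(valid))
```

On these toy inputs, the two points far from the training distribution are the only ones that push any neuron above the threshold. Coverage therefore drops from 1.0 over all tests to 0.0 over the valid tests alone, so the apparent coverage came entirely from invalid inputs.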

Citation count: 35

On the Naming of Methods: A Survey of Professional Developers

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date : 2021-02-26 DOI: 10.1109/ICSE43902.2021.00061

Reem S. Alsuhaibani, Christian D. Newman, M. J. Decker, M. Collard, Jonathan I. Maletic

This paper describes the results of a large survey (1,100+ responses) of professional software developers concerning standards for naming source code methods. The various standards for source code method names are derived from, and supported by, the software engineering literature. The goal of the survey is to determine whether there is a general consensus among developers that the standards are accepted and used in practice. Additionally, the paper examines how factors such as years of experience and programming language knowledge relate to the survey responses. The survey results show that participants strongly agree on the importance of the various standards and on how they apply to names, and that years of experience and programming language have almost no effect on their responses. The results imply that the given standards are both valid and, to a large degree, complete. The work provides a foundation for automated method name assessment during development and code reviews.

Citation count: 16