Fault Localization with Code Coverage Representation Learning

Yi Li, Shaohua Wang, T. Nguyen
{"title":"Fault Localization with Code Coverage Representation Learning","authors":"Yi Li, Shaohua Wang, T. Nguyen","doi":"10.1109/ICSE43902.2021.00067","DOIUrl":null,"url":null,"abstract":"In this paper, we propose DeepRL4FL, a deep learning fault localization (FL) approach that locates the buggy code at the statement and method levels by treating FL as an image pattern recognition problem. DeepRL4FL does so via novel code coverage representation learning (RL) and data dependencies RL for program statements. Those two types of RL on the dynamic information in a code coverage matrix are also combined with the code representation learning on the static information of the usual suspicious source code. This combination is inspired by crime scene investigation in which investigators analyze the crime scene (failed test cases and statements) and related persons (statements with dependencies), and at the same time, examine the usual suspects who have committed a similar crime in the past (similar buggy code in the training data). For the code coverage information, DeepRL4FL first orders the test cases and marks error-exhibiting code statements, expecting that a model can recognize the patterns discriminating between faulty and non-faulty statements/methods. For dependencies among statements, the suspiciousness of a statement is seen taking into account the data dependencies to other statements in execution and data flows, in addition to the statement by itself. Finally, the vector representations for code coverage matrix, data dependencies among statements, and source code are combined and used as the input of a classifier built from a Convolution Neural Network to detect buggy statements/methods. Our empirical evaluation shows that DeepRL4FL improves the top-1 results over the state-of-the-art statement-level FL baselines from 173.1% to 491.7%. It also improves the top-1 results over the existing method-level FL baselines from 15.0% to 206.3%.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"230 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"55","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE43902.2021.00067","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 55

Abstract

In this paper, we propose DeepRL4FL, a deep learning fault localization (FL) approach that locates buggy code at the statement and method levels by treating FL as an image pattern recognition problem. DeepRL4FL does so via novel code coverage representation learning (RL) and data dependency RL for program statements. These two types of RL on the dynamic information in a code coverage matrix are combined with code representation learning on the static information of the usual suspicious source code. This combination is inspired by crime scene investigation, in which investigators analyze the crime scene (failed test cases and statements) and related persons (statements with dependencies), and at the same time examine the usual suspects who have committed a similar crime in the past (similar buggy code in the training data). For the code coverage information, DeepRL4FL first orders the test cases and marks error-exhibiting code statements, expecting that a model can recognize the patterns that discriminate between faulty and non-faulty statements/methods. For dependencies among statements, the suspiciousness of a statement is assessed by taking into account not only the statement itself but also its data dependencies on other statements in execution and data flows. Finally, the vector representations of the code coverage matrix, the data dependencies among statements, and the source code are combined and used as the input of a classifier built from a Convolutional Neural Network to detect buggy statements/methods. Our empirical evaluation shows that DeepRL4FL improves the top-1 results over the state-of-the-art statement-level FL baselines by 173.1% to 491.7%. It also improves the top-1 results over the existing method-level FL baselines by 15.0% to 206.3%.
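To make the described pipeline concrete, the sketch below (not the authors' implementation; all names such as build_coverage_matrix and CoverageCNN are hypothetical) shows the core idea in PyTorch: build a statements-by-tests coverage matrix with failing tests ordered first, then fuse a convolutional encoding of each statement's coverage row with placeholder dependency and code embeddings to score each statement as buggy or clean.

```python
# Minimal sketch of the coverage-matrix + CNN classification idea.
# Assumptions: PyTorch is available; dependency and code embeddings are
# provided by some upstream encoders (here, random placeholders).

import torch
import torch.nn as nn


def build_coverage_matrix(covered, failed):
    """Build a statements x tests matrix, ordering failing tests first.

    covered[t][s] == 1 if test t executes statement s; failed[t] marks failing tests.
    """
    order = sorted(range(len(covered)), key=lambda t: not failed[t])  # failing tests first
    matrix = [[covered[t][s] for t in order] for s in range(len(covered[0]))]
    return torch.tensor(matrix, dtype=torch.float32)


class CoverageCNN(nn.Module):
    """Convolutional classifier over fused per-statement representations."""

    def __init__(self, dep_dim, code_dim, hidden=32):
        super().__init__()
        # 1-D convolution over the ordered test dimension of each coverage row.
        self.cov_conv = nn.Sequential(
            nn.Conv1d(1, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        # Fuse coverage, dependency, and code vectors into a clean/buggy score.
        self.classifier = nn.Sequential(
            nn.Linear(hidden + dep_dim + code_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, cov_rows, dep_vecs, code_vecs):
        # cov_rows: (num_statements, num_tests); dep_vecs/code_vecs: (num_statements, dim)
        cov_feat = self.cov_conv(cov_rows.unsqueeze(1)).squeeze(-1)
        fused = torch.cat([cov_feat, dep_vecs, code_vecs], dim=-1)
        return self.classifier(fused)  # per-statement logits: [clean, buggy]


if __name__ == "__main__":
    covered = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]   # toy data: 3 tests x 3 statements
    failed = [False, True, False]
    cov = build_coverage_matrix(covered, failed)   # shape: (3 statements, 3 tests)
    dep = torch.randn(3, 4)                        # placeholder dependency embeddings
    code = torch.randn(3, 8)                       # placeholder code embeddings
    model = CoverageCNN(dep_dim=4, code_dim=8)
    print(model(cov, dep, code).shape)             # torch.Size([3, 2])
```

In a real setting the dependency and code vectors would come from the representation learning components described above, and the logits would be ranked to produce a top-k list of suspicious statements; this toy version only illustrates how the three inputs are combined for the CNN classifier.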