Fault Localization with Code Coverage Representation Learning

Yi Li, Shaohua Wang, T. Nguyen
{"title":"Fault Localization with Code Coverage Representation Learning","authors":"Yi Li, Shaohua Wang, T. Nguyen","doi":"10.1109/ICSE43902.2021.00067","DOIUrl":null,"url":null,"abstract":"In this paper, we propose DeepRL4FL, a deep learning fault localization (FL) approach that locates the buggy code at the statement and method levels by treating FL as an image pattern recognition problem. DeepRL4FL does so via novel code coverage representation learning (RL) and data dependencies RL for program statements. Those two types of RL on the dynamic information in a code coverage matrix are also combined with the code representation learning on the static information of the usual suspicious source code. This combination is inspired by crime scene investigation in which investigators analyze the crime scene (failed test cases and statements) and related persons (statements with dependencies), and at the same time, examine the usual suspects who have committed a similar crime in the past (similar buggy code in the training data). For the code coverage information, DeepRL4FL first orders the test cases and marks error-exhibiting code statements, expecting that a model can recognize the patterns discriminating between faulty and non-faulty statements/methods. For dependencies among statements, the suspiciousness of a statement is seen taking into account the data dependencies to other statements in execution and data flows, in addition to the statement by itself. Finally, the vector representations for code coverage matrix, data dependencies among statements, and source code are combined and used as the input of a classifier built from a Convolution Neural Network to detect buggy statements/methods. Our empirical evaluation shows that DeepRL4FL improves the top-1 results over the state-of-the-art statement-level FL baselines from 173.1% to 491.7%. It also improves the top-1 results over the existing method-level FL baselines from 15.0% to 206.3%.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"230 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"55","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE43902.2021.00067","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 55

Abstract

In this paper, we propose DeepRL4FL, a deep learning fault localization (FL) approach that locates buggy code at the statement and method levels by treating FL as an image pattern recognition problem. DeepRL4FL does so via novel code coverage representation learning (RL) and data dependency RL for program statements. These two types of RL on the dynamic information in a code coverage matrix are combined with code representation learning on the static information of the usual suspicious source code. This combination is inspired by crime scene investigation, in which investigators analyze the crime scene (failed test cases and statements) and related persons (statements with dependencies), and at the same time examine the usual suspects who have committed a similar crime in the past (similar buggy code in the training data). For the code coverage information, DeepRL4FL first orders the test cases and marks error-exhibiting code statements, expecting that a model can recognize the patterns that discriminate between faulty and non-faulty statements/methods. For dependencies among statements, the suspiciousness of a statement is assessed by taking into account not only the statement itself but also its data dependencies on other statements in execution and data flows. Finally, the vector representations of the code coverage matrix, the data dependencies among statements, and the source code are combined and used as the input of a classifier built from a Convolutional Neural Network to detect buggy statements/methods. Our empirical evaluation shows that DeepRL4FL improves the top-1 results over the state-of-the-art statement-level FL baselines by 173.1% to 491.7%. It also improves the top-1 results over the existing method-level FL baselines by 15.0% to 206.3%.
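To make the described pipeline concrete, the sketch below (not the authors' implementation; all names such as build_coverage_matrix and CoverageCNN are hypothetical) shows the core idea in PyTorch: build a statements-by-tests coverage matrix with failing tests ordered first, then fuse a convolutional encoding of each statement's coverage row with placeholder dependency and code embeddings to score each statement as buggy or clean.

```python
# Minimal sketch of the coverage-matrix + CNN classification idea.
# Assumptions: PyTorch is available; dependency and code embeddings are
# provided by some upstream encoders (here, random placeholders).

import torch
import torch.nn as nn


def build_coverage_matrix(covered, failed):
    """Build a statements x tests matrix, ordering failing tests first.

    covered[t][s] == 1 if test t executes statement s; failed[t] marks failing tests.
    """
    order = sorted(range(len(covered)), key=lambda t: not failed[t])  # failing tests first
    matrix = [[covered[t][s] for t in order] for s in range(len(covered[0]))]
    return torch.tensor(matrix, dtype=torch.float32)


class CoverageCNN(nn.Module):
    """Convolutional classifier over fused per-statement representations."""

    def __init__(self, dep_dim, code_dim, hidden=32):
        super().__init__()
        # 1-D convolution over the ordered test dimension of each coverage row.
        self.cov_conv = nn.Sequential(
            nn.Conv1d(1, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        # Fuse coverage, dependency, and code vectors into a clean/buggy score.
        self.classifier = nn.Sequential(
            nn.Linear(hidden + dep_dim + code_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, cov_rows, dep_vecs, code_vecs):
        # cov_rows: (num_statements, num_tests); dep_vecs/code_vecs: (num_statements, dim)
        cov_feat = self.cov_conv(cov_rows.unsqueeze(1)).squeeze(-1)
        fused = torch.cat([cov_feat, dep_vecs, code_vecs], dim=-1)
        return self.classifier(fused)  # per-statement logits: [clean, buggy]


if __name__ == "__main__":
    covered = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]   # toy data: 3 tests x 3 statements
    failed = [False, True, False]
    cov = build_coverage_matrix(covered, failed)   # shape: (3 statements, 3 tests)
    dep = torch.randn(3, 4)                        # placeholder dependency embeddings
    code = torch.randn(3, 8)                       # placeholder code embeddings
    model = CoverageCNN(dep_dim=4, code_dim=8)
    print(model(cov, dep, code).shape)             # torch.Size([3, 2])
```

In a real setting the dependency and code vectors would come from the representation learning components described above, and the logits would be ranked to produce a top-k list of suspicious statements; this toy version only illustrates how the three inputs are combined for the CNN classifier.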