LineFlowDP: A Deep Learning-Based Two-Phase Approach for Line-Level Defect Prediction

Empirical Software Engineering (IF 3.5; Zone 2, Computer Science; JCR Q1, Computer Science, Software Engineering) Pub Date: 2024-02-23 DOI: 10.1007/s10664-023-10439-z

Fengyu Yang, Fa Zhong, Guangdong Zeng, Peng Xiao, Wei Zheng



Abstract

Software defect prediction plays a key role in guiding resource allocation for software testing. However, previous defect prediction studies have two limitations: (1) the granularity of prediction is still coarse, so high-risk code statements cannot be located accurately; (2) in fine-grained defect prediction, a single line of code carries limited semantic and structural information, too little to distinguish lines semantically. To address these problems, we propose LineFlowDP, a two-phase line-level defect prediction method based on deep learning. We first extract the program dependency graph (PDG) of each source file. The lines of code corresponding to the PDG nodes are extended semantically with data flow and control flow information and embedded as nodes, and the model is then trained using a relational graph convolutional network. Finally, the graph interpreter GNNExplainer and a social network analysis method are used to rank the lines of code in a defective file by risk. On 32 datasets from 9 projects, the experimental results show that LineFlowDP is 13%-404% more cost-effective than four state-of-the-art line-level defect prediction methods. Ablation experiments also verify the effectiveness of the flow information extension and of the code line risk ranking method.
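To make the two ideas in the abstract concrete, the following is a minimal sketch on a toy PDG, not the authors' implementation. It assumes that "flow extension" means appending the text of a line's data-flow and control-flow predecessors to that line, and that the risk ranking can be approximated by a degree centrality weighted with per-edge importance scores. Both are assumptions: the abstract does not specify the extension format, the social network measure, or how it is combined with GNNExplainer. The names `extend_with_flow`, `rank_lines`, and `edge_importance` are hypothetical, and the importance values are made up to stand in for an explainer's edge mask.

```python
from collections import defaultdict

# Toy program dependency graph (PDG): line number -> source text.
# All names and data here are illustrative, not taken from the paper.
lines = {
    1: "int total = 0;",
    2: "for (int i = 0; i < n; i++) {",
    3: "total += values[i];",
    4: "return total / n;",
}

# Typed, directed PDG edges: (source line, target line, kind).
edges = [
    (1, 3, "data"),     # total defined on line 1, used on line 3
    (2, 3, "control"),  # line 3 executes under the loop on line 2
    (1, 4, "data"),     # total defined on line 1, used on line 4
    (3, 4, "data"),     # total updated on line 3, used on line 4
]


def extend_with_flow(lines, edges):
    """Append the text of each line's data/control predecessors to it."""
    preds = defaultdict(list)
    for src, dst, kind in edges:
        preds[dst].append((kind, src))
    extended = {}
    for lineno, text in lines.items():
        context = [f"<{kind}> {lines[src]}" for kind, src in sorted(preds[lineno])]
        extended[lineno] = " ".join([text] + context)
    return extended


def rank_lines(lines, edges, edge_importance):
    """Rank lines by importance-weighted degree centrality, highest first."""
    score = {lineno: 0.0 for lineno in lines}
    for (src, dst, _kind), weight in zip(edges, edge_importance):
        score[src] += weight
        score[dst] += weight
    # Ties are broken by line number so the ordering is deterministic.
    return sorted(score, key=lambda lineno: (-score[lineno], lineno))


extended = extend_with_flow(lines, edges)
# Hypothetical per-edge importance, standing in for an explainer's edge mask.
edge_importance = [0.2, 0.1, 0.9, 0.7]
ranking = rank_lines(lines, edges, edge_importance)
```

With these made-up weights, `ranking` is `[4, 1, 3, 2]`: line 4 touches the two most important edges, so it would be inspected first. In the method described above, the extended text would be embedded and passed to the relational graph convolutional network rather than used directly.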




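Source journal

Empirical Software Engineering (Engineering & Technology: Computer Science, Software Engineering)

CiteScore: 8.50

Self-citation rate: 12.20%

Articles published: 169

Review time: >12 weeks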

Journal description: Empirical Software Engineering provides a forum for applied software engineering research with a strong empirical component, and a venue for publishing empirical results relevant to both researchers and practitioners. Empirical studies presented here usually involve the collection and analysis of data and experience that can be used to characterize, evaluate and reveal relationships between software development deliverables, practices, and technologies. Over time, it is expected that such empirical results will form a body of knowledge leading to widely accepted and well-formed theories. The journal also offers industrial experience reports detailing the application of software technologies - processes, methods, or tools - and their effectiveness in industrial settings. Empirical Software Engineering promotes the publication of industry-relevant research, to address the significant gap between research and practice.

Recent articles from this journal

The effect of data complexity on classifier performance
Reinforcement learning for online testing of autonomous driving systems: a replication and extension study
An empirical study on developers’ shared conversations with ChatGPT in GitHub pull requests and issues
Quality issues in machine learning software systems
An empirical study of token-based micro commits