Line-Level Defect Prediction by Capturing Code Contexts With Graph Convolutional Networks
Authors: Shouyu Yin; Shikai Guo; Hui Li; Chenchen Li; Rong Chen; Xiaochen Li; He Jiang
Journal: IEEE Transactions on Software Engineering, vol. 51, no. 1, pp. 172-191 (impact factor 6.5; JCR Q1, Computer Science, Software Engineering)
Published: 2024-11-20
DOI: 10.1109/TSE.2024.3503723
URL: https://ieeexplore.ieee.org/document/10759072/
Software defect prediction refers to the systematic analysis and review of software using various approaches and tools to identify potential defects or errors. It helps developers identify defects quickly and allocate development resources more effectively, thus enhancing software quality and reliability. Previous defect prediction approaches still face two main limitations: 1) a lack of contextual semantic information and 2) ignoring the joint reasoning between different granularities of defect prediction. In response to these challenges, we propose LineDef, a line-level defect prediction approach that captures code contexts with graph convolutional networks. Specifically, LineDef comprises three components: the token embedding component, the graph extraction component, and the multi-granularity defect prediction component. The token embedding component maps each token to a vector to obtain a high-dimensional semantic feature representation of the token. Subsequently, the graph extraction component uses a sliding window to extract line-level and token-level graphs, addressing the challenge of capturing contextual semantic relationships in the code. Finally, the multi-granularity defect prediction component leverages graph convolutional layers and attention mechanisms to produce prediction labels and risk scores, thereby achieving file-level and line-level defect prediction. Experimental studies on 32 datasets across 9 different software projects show that LineDef improves balanced accuracy by 15.61% to 45.20% compared to state-of-the-art file-level defect prediction approaches, and improves cost-effectiveness by 15.32% to 278% compared to state-of-the-art line-level defect prediction approaches. These results demonstrate that the LineDef approach can extract more comprehensive information from lines of code for defect prediction.
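To make the sliding-window graph extraction and graph-convolution steps concrete, here is a minimal sketch in plain Python. It is not LineDef's implementation: the abstract does not specify the window size, edge weighting, or layer design, so the function names (window_edges, normalized_adjacency, gcn_layer), the default window of 3, and the standard symmetric normalization with self-loops are all illustrative assumptions.

```python
import math


def window_edges(n_nodes, window=3):
    """Link every pair of nodes (tokens or lines) that share a sliding window.

    Assumption: undirected, unweighted co-occurrence edges.
    """
    edges = set()
    for start in range(max(1, n_nodes - window + 1)):
        stop = min(start + window, n_nodes)
        for i in range(start, stop):
            for j in range(i + 1, stop):
                edges.add((i, j))
    return edges


def normalized_adjacency(n_nodes, edges):
    """Build D^-1/2 (A + I) D^-1/2 as a dense list-of-lists matrix."""
    a = [[1.0 if i == j else 0.0 for j in range(n_nodes)] for i in range(n_nodes)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [
        [a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n_nodes)]
        for i in range(n_nodes)
    ]


def matmul(x, y):
    """Dense matrix product for list-of-lists matrices."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def gcn_layer(a_hat, features, weights):
    """One graph-convolution step: ReLU(A_hat @ X @ W)."""
    hidden = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


# Five tokens from a hypothetical line of code, each with a toy 1-d embedding.
tokens = ["int", "x", "=", "y", ";"]
edges = window_edges(len(tokens), window=3)
a_hat = normalized_adjacency(len(tokens), edges)
embeddings = [[float(i)] for i in range(len(tokens))]
context_aware = gcn_layer(a_hat, embeddings, weights=[[1.0, -1.0]])
```

After one layer, each node's vector mixes in the embeddings of its window neighbours, which is the sense in which a graph convolution injects surrounding context into a token or line representation. A full model in the spirit of the abstract would stack such layers and add attention to score lines and files.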
About the journal:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.