Authors: Mona Nashaat, James Miller
Journal: Journal of Systems and Software, Volume 220, Article 112266 (published 2024-10-21)
DOI: 10.1016/j.jss.2024.112266
URL: https://www.sciencedirect.com/science/article/pii/S0164121224003108
Journal impact factor: 3.7; JCR Q1 (Computer Science, Software Engineering)
Citations: 0
Refining software defect prediction through attentive neural models for code understanding
Identifying defects through manual software testing is a resource-intensive task in software development. To alleviate this, software defect prediction identifies code segments likely to contain faults using data-driven methods. Traditional techniques rely on static code metrics, which often fail to reflect the deeper syntactic and semantic features of the code. This paper introduces a novel framework that utilizes transformer-based networks with attention mechanisms to predict software defects. The framework encodes input vectors to develop meaningful representations of software modules. A bidirectional transformer encoder is employed to model programming languages, followed by fine-tuning with labeled data to detect defects. The performance of the framework is assessed through experiments across various software projects and compared against baseline techniques. Additionally, statistical hypothesis testing and an ablation study are performed to assess the impact of different parameter choices. The empirical findings indicate that the proposed approach can increase classification accuracy by an average of 15.93% and improve the F1 score by up to 44.26% compared to traditional methods.
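The abstract reports its results as gains in classification accuracy and F1 score. The sketch below shows how these two metrics are computed for a binary defect classifier. It assumes that label 1 marks a defective module and 0 a clean one. The helper names and the toy labels are illustrative only and are not taken from the paper.

The toy data illustrates a general property of imbalanced defect datasets. A classifier can keep the same accuracy while its F1 score on the defective class changes sharply.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, assuming 1 = defective, 0 = clean."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1  # defective module correctly flagged
        elif t == 0 and p == 1:
            fp += 1  # clean module wrongly flagged (false alarm)
        elif t == 1 and p == 0:
            fn += 1  # defective module missed
        else:
            tn += 1  # clean module correctly left unflagged
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of modules whose label was predicted correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall on the defective class (0.0 if undefined)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up labels: 10 modules, only 2 of them defective.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    always_clean = [0] * 10                      # never predicts a defect
    some_hits = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]   # one hit, one miss, one false alarm
    print(accuracy(y_true, always_clean), f1_score(y_true, always_clean))  # 0.8 0.0
    print(accuracy(y_true, some_hits), f1_score(y_true, some_hits))        # 0.8 0.5
```

Both predictors reach 80% accuracy on the toy labels. Only the second one finds any defective module, and only that predictor gets a non-zero F1.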
About the journal:
The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes
The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.