{"title":"研究了深度主动学习在软件缺陷预测中的有效性","authors":"Farid Feyzi, Arman Daneshdoost","doi":"10.1080/1206212X.2023.2252117","DOIUrl":null,"url":null,"abstract":"Accurate prediction of defective software modules is of great importance for prioritizing quality assurance efforts, reasonably allocating testing resources, reducing costs and improving software quality. Several studies have used machine learning to predict software defects. However, complex structures and imbalanced class distributions in software defect data make learning an effective defect prediction model challenging. In this article, two deep learning-based defect prediction models using static code metrics are proposed. In order to enhance the learning process and improve the performance of the proposed models, pool-based active learning is employed. In this regard, the possibility of using active learning to mitigate the need for a large amount of labeled data in the process of building deep learning models is investigated. To deal with imbalanced distribution of software modules between defective and non-defective classes, Near-Miss under-sampling and KNN, with different number of neighbors, are used. The reason for choosing them is their good performance in binary classification problems. Experiments are performed on two well-known, publicly available datasets, GitHub Bug Dataset and public Unified Bug Dataset for java projects. The evaluation results reveal the effectiveness of our proposed models in comparison to the traditional machine learning algorithms. In the conducted investigations on the Unified Bug Dataset, at the file level, the value of F-measure and AUC criteria have improved by 13 and 11 percent, respectively and at the class level, the values have improved by 14 and 11 percent, respectively.","PeriodicalId":39673,"journal":{"name":"International Journal of Computers and Applications","volume":"16 1","pages":"534 - 552"},"PeriodicalIF":0.0000,"publicationDate":"2023-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Studying the effectiveness of deep active learning in software defect prediction\",\"authors\":\"Farid Feyzi, Arman Daneshdoost\",\"doi\":\"10.1080/1206212X.2023.2252117\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Accurate prediction of defective software modules is of great importance for prioritizing quality assurance efforts, reasonably allocating testing resources, reducing costs and improving software quality. Several studies have used machine learning to predict software defects. However, complex structures and imbalanced class distributions in software defect data make learning an effective defect prediction model challenging. In this article, two deep learning-based defect prediction models using static code metrics are proposed. In order to enhance the learning process and improve the performance of the proposed models, pool-based active learning is employed. In this regard, the possibility of using active learning to mitigate the need for a large amount of labeled data in the process of building deep learning models is investigated. To deal with imbalanced distribution of software modules between defective and non-defective classes, Near-Miss under-sampling and KNN, with different number of neighbors, are used. The reason for choosing them is their good performance in binary classification problems. Experiments are performed on two well-known, publicly available datasets, GitHub Bug Dataset and public Unified Bug Dataset for java projects. The evaluation results reveal the effectiveness of our proposed models in comparison to the traditional machine learning algorithms. In the conducted investigations on the Unified Bug Dataset, at the file level, the value of F-measure and AUC criteria have improved by 13 and 11 percent, respectively and at the class level, the values have improved by 14 and 11 percent, respectively.\",\"PeriodicalId\":39673,\"journal\":{\"name\":\"International Journal of Computers and Applications\",\"volume\":\"16 1\",\"pages\":\"534 - 552\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-08-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Computers and Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1080/1206212X.2023.2252117\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"Computer Science\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Computers and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/1206212X.2023.2252117","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Computer Science","Score":null,"Total":0}
Studying the effectiveness of deep active learning in software defect prediction
Accurate prediction of defective software modules is of great importance for prioritizing quality assurance efforts, reasonably allocating testing resources, reducing costs and improving software quality. Several studies have used machine learning to predict software defects. However, complex structures and imbalanced class distributions in software defect data make learning an effective defect prediction model challenging. In this article, two deep learning-based defect prediction models using static code metrics are proposed. In order to enhance the learning process and improve the performance of the proposed models, pool-based active learning is employed. In this regard, the possibility of using active learning to mitigate the need for a large amount of labeled data in the process of building deep learning models is investigated. To deal with imbalanced distribution of software modules between defective and non-defective classes, Near-Miss under-sampling and KNN, with different number of neighbors, are used. The reason for choosing them is their good performance in binary classification problems. Experiments are performed on two well-known, publicly available datasets, GitHub Bug Dataset and public Unified Bug Dataset for java projects. The evaluation results reveal the effectiveness of our proposed models in comparison to the traditional machine learning algorithms. In the conducted investigations on the Unified Bug Dataset, at the file level, the value of F-measure and AUC criteria have improved by 13 and 11 percent, respectively and at the class level, the values have improved by 14 and 11 percent, respectively.
期刊介绍:
The International Journal of Computers and Applications (IJCA) is a unique platform for publishing novel ideas, research outcomes and fundamental advances in all aspects of Computer Science, Computer Engineering, and Computer Applications. This is a peer-reviewed international journal with a vision to provide the academic and industrial community a platform for presenting original research ideas and applications. IJCA welcomes four special types of papers in addition to the regular research papers within its scope: (a) Papers for which all results could be easily reproducible. For such papers, the authors will be asked to upload "instructions for reproduction'''', possibly with the source codes or stable URLs (from where the codes could be downloaded). (b) Papers with negative results. For such papers, the experimental setting and negative results must be presented in detail. Also, why the negative results are important for the research community must be explained clearly. The rationale behind this kind of paper is that this would help researchers choose the correct approaches to solve problems and avoid the (already worked out) failed approaches. (c) Detailed report, case study and literature review articles about innovative software / hardware, new technology, high impact computer applications and future development with sufficient background and subject coverage. (d) Special issue papers focussing on a particular theme with significant importance or papers selected from a relevant conference with sufficient improvement and new material to differentiate from the papers published in a conference proceedings.