Nonlinear Geometric Framework for Software Defect Prediction

International Journal of Decision Support System Technology (IF 0.7, Q4, COMPUTER SCIENCE, INFORMATION SYSTEMS). Pub Date: 2020-07-01. Pages: 85-100. DOI: 10.4018/ijdsst.2020070105

Misha Kakkar, Sarika Jain, Abhay Bansal, P. Grover

Citations: 0

Abstract

People rely on software in every walk of life, so high software quality is essential. Software defect prediction (SDP) models use historical data to identify defect-prone modules, which in turn helps improve software quality. The historical data describes modules, files, or classes, each labeled as buggy or clean. Because buggy artifacts are fewer than clean ones, the historical data is imbalanced, and this uneven distribution makes it difficult for classification algorithms to build highly effective SDP models. The objective of this study is to propose a new nonlinear geometric framework, based on SMOTE and ensemble learning, to improve the performance of SDP models. The proposed framework, called SMEnsemble, combines the traditional SMOTE algorithm with a novel ensemble Support Vector Machine (SVM). SMOTE handles the class imbalance problem by generating synthetic instances of the minority class. Ensemble learning generates multiple classification models, from which the best-performing SDP model is selected. The experiments use datasets from three different software repositories that contain both open-source and proprietary projects. The results show that SMEnsemble identifies the minority class, i.e. buggy artifacts, better than traditional methods. It also outperforms SMOTUNED, the latest state-of-the-art SDP model. Compared with traditional methods, the proposed model is capable of handling imbalanced classes. In addition, by carefully selecting the number of ensembles, high performance can be achieved in less time.
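The abstract does not describe how SMEnsemble is implemented, so the following is only a minimal sketch of the general SMOTE idea it mentions: each synthetic minority instance is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. The function name `smote`, the parameters `n_synthetic` and `k`, and the toy feature values are assumptions made for illustration, and the sketch picks base samples at random rather than looping over every minority sample. The ensemble SVM step is not shown.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority samples.

    Illustrative sketch only; not the authors' implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    # A sample cannot have more neighbours than there are other samples.
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Interpolate: gap = 0 gives the base, gap near 1 gives the neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical): 4 buggy modules against 10 clean ones.
buggy = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 1.7]]
clean_count = 10
new_points = smote(buggy, n_synthetic=clean_count - len(buggy), k=2)
balanced_buggy = buggy + new_points
print(len(balanced_buggy))
```

After oversampling, the buggy class has as many instances as the clean class, and a classifier such as an SVM can be trained on the balanced set. Because every synthetic point is an interpolation between two real buggy samples, it stays within the region those samples already occupy.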

Source journal

International Journal of Decision Support System Technology (COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore

2.20

Self-citation rate

18.20%

Publication count