Predicting Fault-Prone Software Modules with Rank Sum Classification

J. Cahill, J. Hogan, Richard N. Thomas
2013 22nd Australian Software Engineering Conference (ASWEC), published 2013-06-04. DOI: 10.1109/ASWEC.2013.33
Citations: 23

Abstract

The detection and correction of defects remains among the most time-consuming and expensive aspects of software development. Extensive automated testing and code inspections may mitigate their effect, but some code fragments are necessarily more likely to be faulty than others, and automated identification of fault-prone modules helps to focus testing and inspections, thus limiting wasted effort and potentially improving detection rates. However, software metrics data are often extremely noisy, with enormous imbalances between the sizes of the positive and negative classes. In this work, we present a new approach to predictive modelling of fault proneness in software modules, introducing a new feature representation to overcome some of these issues. This rank sum representation offers improved, or at worst comparable, performance relative to earlier approaches on standard data sets, and readily allows the user to choose an appropriate trade-off between precision and recall, optimising inspection effort to suit different testing environments. The method is evaluated using the NASA Metrics Data Program (MDP) data sets, and performance is compared with existing studies based on Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers, and with our own comprehensive evaluation of these methods.
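The abstract does not spell out how the rank sum representation is computed, so the following Python sketch is only an intuition aid under one simple, assumed reading. Each metric is ranked across all modules, a module's ranks are summed into a single score, and modules scoring at or above a chosen threshold are flagged as fault-prone. The function names (`average_ranks`, `rank_sum_scores`, `precision_recall`), the assumption that larger metric values indicate higher fault risk, and the toy data are all hypothetical. None of them are taken from the paper or from the NASA MDP data sets.

```python
"""Illustrative sketch only (assumed reading, NOT the paper's exact method):
rank each metric across modules, sum ranks per module, then threshold."""
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ascending ranks, averaging ranks over tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_scores(modules: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical score: sum of a module's per-metric ranks.
    Assumes larger metric values mean higher fault risk."""
    scores = [0.0] * len(modules)
    for m in range(len(modules[0])):
        column = [row[m] for row in modules]
        for idx, r in enumerate(average_ranks(column)):
            scores[idx] += r
    return scores


def precision_recall(scores: Sequence[float], labels: Sequence[bool],
                     threshold: float) -> Tuple[float, float]:
    """Flag modules with score >= threshold; compare against true labels."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data (hypothetical): [lines of code, cyclomatic complexity] per module.
    modules = [[10, 1], [200, 15], [50, 4], [400, 30], [30, 2]]
    labels = [False, True, False, True, False]  # True = module had a fault
    scores = rank_sum_scores(modules)
    print(scores)  # → [2.0, 8.0, 6.0, 10.0, 4.0]
    for t in (5, 10):
        print(t, precision_recall(scores, labels, t))
```

In this toy example, raising the threshold from 5 to 10 flags fewer modules: precision rises from 2/3 to 1.0 while recall falls from 1.0 to 0.5. This illustrates the kind of adjustable precision/recall trade-off the abstract describes, although the paper's own mechanism may differ.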