A Comparative Study on the Stability of Software Metric Selection Techniques

Huanjing Wang, T. Khoshgoftaar, Randall Wald, Amri Napolitano
Published in: 2012 11th International Conference on Machine Learning and Applications
Publication date: 2012-12-12
DOI: 10.1109/ICMLA.2012.142 (https://doi.org/10.1109/ICMLA.2012.142)
Citations: 4

Abstract

In large software projects, software quality prediction is an important aspect of the development cycle because it helps focus quality assurance efforts on the modules most likely to contain faults. To perform software quality prediction, various software metrics are collected during the software development cycle, and models are built using these metrics. However, not all features (metrics) make the same contribution to the class attribute (e.g., faulty/not faulty). Thus, selecting a subset of metrics that are relevant to the class attribute is a critical step. As many feature selection algorithms exist, it is important to find ones which will produce consistent results even as the underlying data is changed; this quality of producing consistent results is referred to as "stability." In this paper, we investigate the stability of seven feature selection techniques in the context of software quality classification. We compare four approaches for varying the underlying data to evaluate stability: the traditional approach of generating many subsamples of the original data and comparing the features selected from each; an earlier approach developed by our research group, which compares the features selected from subsamples of the data with those selected from the original; and two newly proposed approaches based on pairs of subsamples that are specifically designed to have the same number of instances and a specified level of overlap. One of these new approaches compares the features selected within each pair, while the other compares the generated subsamples with the original dataset. The empirical validation is carried out on sixteen software metrics datasets. Our results show that ReliefF is the most stable feature selection technique. Results also show that the level of overlap, degree of perturbation, and feature subset size affect the stability of feature selection methods. Finally, we find that all four approaches to evaluating stability produce similar results in terms of which feature selection techniques are best under different circumstances.
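
The abstract does not say how the selected feature subsets are compared or how the subsamples are drawn, so the following is only a minimal sketch under stated assumptions. It uses Jaccard similarity as the agreement measure (an assumption, not necessarily the paper's measure) and a simple class-mean-difference ranker as a stand-in for the seven techniques studied. All function names and parameters here are hypothetical. It illustrates the traditional approach (many subsamples, compared with each other) and the pair-based approach (two equal-sized subsamples with a specified overlap).

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Agreement of two feature subsets: size of intersection over size of union.
    Assumption: the paper may use a different similarity measure."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


def top_k_features(rows, labels, k):
    """Stand-in ranker (not one of the paper's techniques): score each feature
    by the absolute difference of its class means and keep the k best indices."""
    scores = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        if not pos or not neg:
            scores.append(0.0)  # one class missing: feature cannot be scored
        else:
            scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def select_on(rows, labels, idx, k):
    """Run the ranker on the instances named by idx."""
    return top_k_features([rows[i] for i in idx], [labels[i] for i in idx], k)


def overlapping_pair(n, size, overlap, rng):
    """Draw two index lists of equal size that share round(overlap * size) indices."""
    shared = round(overlap * size)
    needed = shared + 2 * (size - shared)
    if needed > n:
        raise ValueError("dataset too small for this size and overlap")
    picked = rng.sample(range(n), needed)  # distinct indices
    common, rest = picked[:shared], picked[shared:]
    half = size - shared
    return common + rest[:half], common + rest[half:]


def subsample_stability(rows, labels, k, fraction, repeats, rng):
    """Traditional approach: mean pairwise agreement across many subsamples.
    Requires repeats >= 2 so that at least one pair exists."""
    n = len(rows)
    size = max(1, int(fraction * n))
    subsets = [select_on(rows, labels, rng.sample(range(n), size), k)
               for _ in range(repeats)]
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def pair_stability(rows, labels, k, size, overlap, repeats, rng):
    """Pair-based approach: mean agreement within overlap-controlled pairs."""
    total = 0.0
    for _ in range(repeats):
        ia, ib = overlapping_pair(len(rows), size, overlap, rng)
        total += jaccard(select_on(rows, labels, ia, k),
                         select_on(rows, labels, ib, k))
    return total / repeats
```

A possible usage, on synthetic data invented purely for illustration: build `rows` as lists of numeric metric values and `labels` as 0/1 fault indicators, then call `subsample_stability(rows, labels, k=3, fraction=0.8, repeats=10, rng=random.Random(0))` or `pair_stability(rows, labels, k=3, size=40, overlap=0.5, repeats=10, rng=random.Random(0))`. Both return a value between 0 and 1, with higher values meaning the selected subsets agree more often. Comparing against the features selected from the full dataset, as in the other two approaches, would replace one side of each comparison with `top_k_features(rows, labels, k)`.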