A Mutual Information-Based Hybrid Feature Selection Method for Software Cost Estimation Using Feature Clustering

Qin Liu, Shihai Shi, Hongming Zhu, Jiakai Xiao
DOI: 10.1109/COMPSAC.2014.99
Published in: 2014 IEEE 38th Annual Computer Software and Applications Conference (COMPSAC 2014), 2014-07-21
Citations: 10

Abstract

Feature selection methods aim to obtain an optimal feature subset from the original features so as to give the most accurate prediction. So far, supervised and unsupervised feature selection methods have been discussed and developed separately; however, for some data sets the two can be combined into a hybrid feature selection method. In this paper, we propose a mutual information-based (MI-based) hybrid feature selection method using feature clustering. In the unsupervised learning stage, the original features are grouped into several clusters by agglomerative hierarchical clustering, based on the similarity of the features to each other. Then, in the supervised learning stage, the feature in each cluster that maximizes the similarity with the response feature (which represents the class label) is selected as that cluster's representative; these representative features compose the feature subset. Our contributions are 1) the newly proposed feature selection method and 2) the application of feature clustering to software cost estimation. The proposed method employs a wrapper approach, so it can evaluate the prediction performance of each candidate feature subset to determine the optimal one. Experimental results in software cost estimation demonstrate that the proposed method outperforms the supervised feature selection methods INMIFS and mRMRFS by at least 11.5% and 14.8% on the ISBSG R8 and Desharnais data sets, respectively, in terms of the PRED(0.25) value.