A Mutual Information-Based Hybrid Feature Selection Method for Software Cost Estimation Using Feature Clustering

Qin Liu, Shihai Shi, Hongming Zhu, Jiakai Xiao
DOI: 10.1109/COMPSAC.2014.99
Published in: 2014 IEEE 38th Annual Computer Software and Applications Conference (COMPSAC 2014), 2014-07-21
Citations: 10

Abstract

Feature selection methods aim to obtain an optimal feature subset from the original features so as to give the most accurate prediction. So far, supervised and unsupervised feature selection methods have been discussed and developed separately; however, for some data sets the two can be combined into a hybrid feature selection method. In this paper, we propose a mutual information-based (MI-based) hybrid feature selection method using feature clustering. In the unsupervised learning stage, the original features are grouped into several clusters by agglomerative hierarchical clustering, based on the similarity of the features to each other. Then, in the supervised learning stage, the feature in each cluster that maximizes the similarity with the response feature (which represents the class label) is selected as that cluster's representative; these representative features compose the feature subset. Our contributions are 1) the newly proposed feature selection method and 2) the application of feature clustering to software cost estimation. The proposed method employs a wrapper approach, so it can evaluate the prediction performance of each candidate feature subset to determine the optimal one. Experimental results in software cost estimation demonstrate that the proposed method outperforms the supervised feature selection methods INMIFS and mRMRFS by at least 11.5% and 14.8% on the ISBSG R8 and Desharnais data sets, respectively, in terms of the PRED(0.25) value.