Feature Selection Using Neighborhood based Entropy

J. Univers. Comput. Sci. Pub Date: 2022-11-28. DOI: 10.3897/jucs.79905

Fatemeh Farnaghi-Zadeh, Mohsen Rahmani, Maryam Amiri

Volume 82 1, pages 1169-1192. Publication type: Journal Article. Platform: Semanticscholar.

Citations: 0

Abstract

Feature selection is an important preprocessing step for pattern recognition and machine learning. Its goal is to determine an optimal subset of relevant features out of a large number of features. The neighborhood discrimination index (NDI) is one of the newest and most efficient measures of the distinguishing ability of a feature subset. NDI is computed from a neighborhood radius (E). Because E has a significant impact on NDI, selecting an appropriate value of E for each data set can be challenging and very time-consuming. This paper proposes a new approach based on targEt PointS To computE neIghborhood relatioNs (EPSTEIN). First, all data points are sorted in descending order of their density. Then, the highest-density points are selected as target points, as many as there are classes. To determine the neighborhood relations, circles centered on the target points are drawn, and the points inside or on the circles are considered neighbors. Next, the significance of each feature is computed and a greedy algorithm selects appropriate features. The performance of the proposed approach is compared with both the most common and the newest feature selection methods. The experimental results show that EPSTEIN could select more efficient feature subsets and improve the prediction accuracy of classifiers compared with other state-of-the-art methods such as Correlation-based Feature Selection (CFS), Fast Correlation-Based Filter (FCBF), Heuristic Algorithm Based on Neighborhood Discrimination Index (HANDI), Ranking Based Feature Inclusion for Optimal Feature Subset (KNFI), Ranking Based Feature Elimination (KNFE) and Principal Component Analysis and Information Gain (PCA-IG).
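The pipeline outlined in the abstract (density ordering, target points, circular neighborhoods, greedy selection) might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how density is estimated, how the circle radius is obtained, or how feature significance is scored. Here density is assumed to be the inverse mean distance to the `k` nearest neighbours, `radius` is a hypothetical parameter, points inside the same circle are assumed to be mutual neighbours, and the score is a stand-in loosely modelled on a conditional discrimination index. All function names are invented for this example.

```python
import math

def dist(a, b, feats):
    """Euclidean distance restricted to the feature indices in `feats`."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in feats))

def target_points(X, feats, n_classes, k=2):
    """Indices of the n_classes densest points.
    ASSUMPTION: density = 1 / mean distance to the k nearest neighbours."""
    def density(i):
        d = sorted(dist(X[i], X[j], feats) for j in range(len(X)) if j != i)
        return 1.0 / (sum(d[:k]) / k + 1e-12)
    order = sorted(range(len(X)), key=density, reverse=True)  # descending density
    return order[:n_classes]

def neighborhood_relation(X, feats, targets, radius):
    """Set of ordered pairs (i, j) treated as neighbours.
    ASSUMPTION: points inside or on the same circle are mutual neighbours,
    and every point is its own neighbour."""
    rel = {(i, i) for i in range(len(X))}
    for t in targets:
        inside = [i for i in range(len(X)) if dist(X[i], X[t], feats) <= radius]
        rel.update((i, j) for i in inside for j in inside)
    return rel

def score(X, y, feats, radius):
    """Stand-in significance (lower is better, 0 means every neighbour pair
    shares a class label): log(|R| / |R restricted to same-class pairs|)."""
    targets = target_points(X, feats, n_classes=len(set(y)))
    rel = neighborhood_relation(X, feats, targets, radius)
    same = sum(1 for i, j in rel if y[i] == y[j])
    return math.log(len(rel) / same)

def greedy_select(X, y, radius, eps=1e-12):
    """Forward selection: add the feature that lowers the score the most,
    and stop as soon as no candidate improves it."""
    remaining = list(range(len(X[0])))
    selected, best = [], math.inf
    while remaining:
        cand = min(remaining, key=lambda f: score(X, y, selected + [f], radius))
        s = score(X, y, selected + [cand], radius)
        if best - s <= eps:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = s
    return selected

# Toy data: feature 0 separates the two classes, feature 1 does not.
X = [[0.0, 0.5], [0.1, 0.9], [0.2, 0.1],
     [1.0, 0.4], [1.1, 0.8], [1.2, 0.0]]
y = [0, 0, 0, 1, 1, 1]
print(greedy_select(X, y, radius=0.3))
```

On this toy data the sketch keeps only feature 0, because the circles drawn around the two densest points in that feature contain points of a single class each. The explicit `radius` argument is a simplification of the example; the abstract presents EPSTEIN precisely as a way to avoid hand-tuning E, so the paper itself should be consulted for how the circles are actually sized.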
