Feature Selection Using Neighborhood based Entropy

Fatemeh Farnaghi-Zadeh, Mohsen Rahmani, Maryam Amiri
J. Univers. Comput. Sci., volume 82 1, pages 1169-1192. Published 2022-11-28. Journal Article.
DOI: 10.3897/jucs.79905 (https://doi.org/10.3897/jucs.79905)
Citations: 0

Abstract

Feature selection is an important preprocessing step for pattern recognition and machine learning. Its goal is to determine an optimal subset of relevant features out of a large number of features. The neighborhood discrimination index (NDI) is one of the newest and most efficient measures of the distinguishing ability of a feature subset. NDI is computed from a neighborhood radius (E). Because E strongly affects NDI, selecting an appropriate value of E for each data set can be challenging and very time-consuming. This paper proposes a new approach based on targEt PointS To computE neIghborhood relatioNs (EPSTEIN). First, all data points are sorted in descending order of their density. Then the highest-density points are selected as target points, as many as there are classes. To determine the neighborhood relations, circles centered on the target points are drawn, and the points inside or on a circle are considered neighbors. Next, the significance of each feature is computed and a greedy algorithm selects appropriate features. The proposed approach is compared with both the most common and the newest feature selection methods. The experimental results show that EPSTEIN can select more efficient feature subsets and improve the prediction accuracy of classifiers compared with other state-of-the-art methods such as Correlation-based Feature Selection (CFS), Fast Correlation-Based Filter (FCBF), Heuristic Algorithm Based on Neighborhood Discrimination Index (HANDI), Ranking Based Feature Inclusion for Optimal Feature Subset (KNFI), Ranking Based Feature Elimination (KNFE) and Principal Component Analysis and Information Gain (PCA-IG).
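The steps named in the abstract (density ordering, target points, circles, feature significance, greedy selection) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the abstract does not say how density, the circle radius, or feature significance are defined, so the sketch assumes density is the inverse mean distance to the `k` nearest neighbours, the radius is the mean distance from a target point to all points, and significance is the drop in the conditional entropy of the class labels given the neighborhood pattern. All function names are hypothetical.

```python
import numpy as np

def target_points(X, n_targets, k=5):
    """Indices of the n_targets densest points.
    Assumption: density = 1 / mean distance to the k nearest neighbours."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    k = min(k, len(X) - 1)
    knn = np.sort(D, axis=1)[:, 1:k + 1]          # column 0 is the distance to self
    density = 1.0 / (knn.mean(axis=1) + 1e-12)
    order = np.argsort(-density, kind="stable")   # descending order of density
    return order[:n_targets]

def neighborhood_codes(X, targets):
    """Encode, per sample, which target circles it lies inside or on.
    Assumption: each radius is the mean distance from the target to all points."""
    D = np.sqrt(((X[:, None, :] - X[targets][None, :, :]) ** 2).sum(axis=-1))
    flags = D <= D.mean(axis=0)                   # shape (n_samples, n_targets)
    return flags.astype(np.int64) @ (2 ** np.arange(flags.shape[1]))

def conditional_entropy(codes, y):
    """H(y | neighborhood pattern) in bits; 0 means every pattern is class-pure."""
    h = 0.0
    for c in np.unique(codes):
        labels = y[codes == c]
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        h += len(labels) / len(y) * -(p * np.log2(p)).sum()
    return h

def score(X, y, features, n_targets):
    """Entropy of the labels given the neighborhoods built on a feature subset."""
    if not features:                              # empty subset: plain label entropy
        return conditional_entropy(np.zeros(len(y), dtype=np.int64), y)
    Xs = X[:, features]
    return conditional_entropy(neighborhood_codes(Xs, target_points(Xs, n_targets)), y)

def greedy_select(X, y, tol=1e-9):
    """Forward selection: keep adding the feature that lowers the entropy most."""
    n_classes = len(np.unique(y))                 # one target point per class
    selected, best = [], score(X, y, [], n_classes)
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = {f: score(X, y, selected + [f], n_classes) for f in remaining}
        f_best = min(scores, key=scores.get)
        if best - scores[f_best] <= tol:          # no further improvement: stop
            break
        selected.append(f_best)
        remaining.remove(f_best)
        best = scores[f_best]
    return selected

# Toy data: feature 0 separates the two classes, feature 1 carries no class information.
X = np.array([[0.0, 0], [0.1, 5], [0.2, 1], [0.05, 4], [0.15, 2],
              [10.0, 0], [10.1, 5], [10.2, 1], [10.05, 4], [10.15, 2]], dtype=float)
y = np.array([0] * 5 + [1] * 5)
print(greedy_select(X, y))  # [0]
```

In this sketch the target points are recomputed for every candidate subset, so no radius has to be tuned by hand, which matches the motivation given in the abstract. Whether the paper fixes the target points once or recomputes them per subset is not stated there.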