Unsupervised feature selection using sparse manifold learning: Auto-encoder approach

Information Processing & Management (Impact Factor 6.9; Zone 1, Management; JCR Q1, Computer Science, Information Systems). Published: 2025-01-01 (online 2024-10-18). DOI: 10.1016/j.ipm.2024.103923

Amir Moslemi, Mina Jamshidi

Information Processing & Management, Volume 62, Issue 1, Article 103923. https://www.sciencedirect.com/science/article/pii/S0306457324002826

Citations: 0

Abstract

Feature selection techniques are widely used as a preprocessing step before training machine learning algorithms, to mitigate the curse of dimensionality, overfitting, and high computation time. Projection-based methods are frequently employed for feature selection; they rely on extracting linear relationships among features and, notably, do not capture nonlinear information among features. Auto-encoder-based techniques have recently gained traction for feature selection, but they focus primarily on the encoding phase, because the selected features are derived from that phase. The subtle point is that an auto-encoder's ability to obtain the most discriminative features is significantly affected by the decoding phase. To address these challenges, we propose a novel auto-encoder-based feature selection method that not only extracts nonlinear information among features but also regularizes the decoding phase to improve the algorithm's performance. Specifically, we define a new auto-encoder model that keeps the topological information of the reconstructed data close to that of the input data. The geometric structure of the input data is preserved in the projected space using a Laplacian graph, and the geometry of the projected space is preserved in the reconstructed space through an additional term in the optimization problem (an abstract Laplacian graph of the reconstructed data). We show experimentally that keeping the abstract Laplacian graph of the reconstructed data close to the Laplacian graph of the input data affects feature selection performance. We also present an effective approach to solving the resulting optimization problem. Since the approach is mainly intended for clustering, we conducted experiments on ten benchmark datasets and evaluated the proposed method using clustering accuracy and normalized mutual information (NMI). Our method considerably outperformed recent state-of-the-art techniques in terms of both NMI and accuracy.
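The abstract does not state the objective function, so the NumPy sketch below illustrates one plausible form of such a model under explicit assumptions: a single-hidden-layer auto-encoder with encoder H = sigmoid(XW1 + b1) and decoder X_rec = H W2 + b2; a symmetric k-nearest-neighbour heat-kernel graph Laplacian L_X = D - S; and the loss ||X - X_rec||_F^2 / n + alpha * Tr(H^T L_X H) / n + beta * ||L_(X_rec) - L_X||_F^2 / n^2 + gamma * ||W1||_2,1. The third term is a hypothetical stand-in for the paper's "abstract Laplacian graph" of the reconstruction, not the authors' formulation. The function names (`graph_laplacian`, `objective`, `select_features`) and the parameters alpha, beta, gamma, k and sigma are illustrative. The sketch only evaluates the loss and ranks features by the norms of the rows of W1; it does not train the network.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all pairs of rows of X."""
    sq = np.sum(X**2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)


def graph_laplacian(X, k=5, sigma=1.0):
    """Unnormalized Laplacian L = D - S of a symmetric k-NN graph with
    heat-kernel edge weights (a common construction, assumed here)."""
    n = X.shape[0]
    d2 = pairwise_sq_dists(X)
    masked = d2.copy()
    np.fill_diagonal(masked, np.inf)  # a point is not its own neighbour
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(masked[i])[:k]
        S[i, nbrs] = np.exp(-d2[i, nbrs] / (2.0 * sigma**2))
    S = np.maximum(S, S.T)  # keep an edge if either endpoint selected it
    return np.diag(S.sum(axis=1)) - S


def sigmoid(Z):
    # element-wise logistic activation for the hidden layer
    return 1.0 / (1.0 + np.exp(-Z))


def objective(X, W1, b1, W2, b2, alpha=1.0, beta=1.0, gamma=0.1, k=5, sigma=1.0):
    """Evaluate a hypothetical Laplacian-regularized sparse auto-encoder loss."""
    n = X.shape[0]
    L_X = graph_laplacian(X, k, sigma)  # in training, compute this once
    H = sigmoid(X @ W1 + b1)            # encoder: projected space
    X_rec = H @ W2 + b2                 # decoder: reconstructed space
    terms = {
        # reconstruction error
        "rec": np.sum((X - X_rec) ** 2) / n,
        # input geometry preserved in the projected space: Tr(H^T L_X H)
        "manifold": np.trace(H.T @ L_X @ H) / n,
        # reconstruction's graph Laplacian kept close to the input's (hypothetical)
        "graph_match": np.sum((graph_laplacian(X_rec, k, sigma) - L_X) ** 2) / n**2,
        # l2,1 norm: sum of row norms of W1, one row per input feature
        "l21": np.sum(np.linalg.norm(W1, axis=1)),
    }
    terms["total"] = (terms["rec"] + alpha * terms["manifold"]
                      + beta * terms["graph_match"] + gamma * terms["l21"])
    return terms


def select_features(W1, num):
    """Indices of the `num` input features with the largest encoder-row norms."""
    scores = np.linalg.norm(W1, axis=1)
    return np.argsort(scores)[::-1][:num]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 10))        # toy data: 40 samples, 10 features
    d, m = X.shape[1], 4                 # hidden width m is arbitrary
    W1, b1 = 0.1 * rng.normal(size=(d, m)), np.zeros(m)
    W2, b2 = 0.1 * rng.normal(size=(m, d)), np.zeros(d)
    print(objective(X, W1, b1, W2, b2)["total"])
    print(select_features(W1, 3))
```

Ranking by row norms follows from the l2,1 penalty: it pushes entire rows of W1 toward zero, and a zero row means the corresponding input feature no longer influences the hidden layer. In practice the weights would be fitted by minimizing `total` with a gradient-based optimizer, and L_X would be computed once rather than on every call.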

Source journal

Information Processing & Management (Engineering and Technology: Computer Science, Information Systems)

CiteScore: 17.00
Self-citation rate: 11.60%
Articles published: 276
Review time: 39 days

Journal description: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.