{"title":"Unsupervised feature selection using sparse manifold learning: Auto-encoder approach","authors":"Amir Moslemi , Mina Jamshidi","doi":"10.1016/j.ipm.2024.103923","DOIUrl":null,"url":null,"abstract":"<div><div>Feature selection techniques are widely being used as a preprocessing step to train machine learning algorithms to circumvent the curse of dimensionality, overfitting, and computation time challenges. Projection-based methods are frequently employed in feature selection, leveraging the extraction of linear relationships among features. The absence of nonlinear information extraction among features is notable in this context. While auto-encoder based techniques have recently gained traction for feature selection, their focus remains primarily on the encoding phase, as it is through this phase that the selected features are derived. The subtle point is that the performance of auto-encoder to obtain the most discriminative features is significantly affected by decoding phase. To address these challenges, in this paper, we proposed a novel feature selection based on auto-encoder to not only extracting nonlinear information among features but also decoding phase is regularized as well to enhance the performance of algorithm. In this study, we defined a new model of auto-encoder to preserve the topological information of reconstructed close to input data. To geometric structure of input data is preserved in projected space using Laplacian graph, and geometrical projected space is preserved in reconstructed space using a suitable term (abstract Laplacian graph of reconstructed data) in optimization problem. Preserving abstract Laplacian graph of reconstructed data close to Laplacian graph of input data affects the performance of feature selection and we experimentally showed this. Therefore, we show an effective approach to solve the objective of the corresponding problem. Since this approach can be mainly used for clustering aims, we conducted experiments on ten benchmark datasets and assessed our propped method based on clustering accuracy and normalized mutual information (NMI) metric. Our method obtained considerable superiority over recent state-of-the-art techniques in terms of NMI and accuracy.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 1","pages":"Article 103923"},"PeriodicalIF":7.4000,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Processing & Management","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0306457324002826","RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Feature selection techniques are widely being used as a preprocessing step to train machine learning algorithms to circumvent the curse of dimensionality, overfitting, and computation time challenges. Projection-based methods are frequently employed in feature selection, leveraging the extraction of linear relationships among features. The absence of nonlinear information extraction among features is notable in this context. While auto-encoder based techniques have recently gained traction for feature selection, their focus remains primarily on the encoding phase, as it is through this phase that the selected features are derived. The subtle point is that the performance of auto-encoder to obtain the most discriminative features is significantly affected by decoding phase. To address these challenges, in this paper, we proposed a novel feature selection based on auto-encoder to not only extracting nonlinear information among features but also decoding phase is regularized as well to enhance the performance of algorithm. In this study, we defined a new model of auto-encoder to preserve the topological information of reconstructed close to input data. To geometric structure of input data is preserved in projected space using Laplacian graph, and geometrical projected space is preserved in reconstructed space using a suitable term (abstract Laplacian graph of reconstructed data) in optimization problem. Preserving abstract Laplacian graph of reconstructed data close to Laplacian graph of input data affects the performance of feature selection and we experimentally showed this. Therefore, we show an effective approach to solve the objective of the corresponding problem. Since this approach can be mainly used for clustering aims, we conducted experiments on ten benchmark datasets and assessed our propped method based on clustering accuracy and normalized mutual information (NMI) metric. Our method obtained considerable superiority over recent state-of-the-art techniques in terms of NMI and accuracy.
期刊介绍:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.