Title: Exploring structural components in autoencoder-based data clustering
Authors: Sujoy Chatterjee, Suvra Jyoti Choudhury
DOI: 10.1016/j.engappai.2024.109562
Journal: Engineering Applications of Artificial Intelligence, Volume 140, Article 109562 (JCR Q1, Automation & Control Systems)
Publication date: 2024-11-25
URL: https://www.sciencedirect.com/science/article/pii/S0952197624017202
Citations: 0
Abstract
Clustering is a fundamental machine-learning task that has received extensive attention in the literature. The foundational tenet of traditional clustering approaches is that data are first mapped to vectorized features through various representation-learning techniques. As data grow more intricate, conventional clustering methods can no longer manage the resulting high-dimensional data. Numerous representation-learning strategies using deep architectures have been presented over the years, particularly deep unsupervised learning, owing to its superiority over conventional approaches. In most existing research, especially autoencoder-based approaches, only the pairwise distance information of points in the original data space is retained in the latent space. However, it is important to combine this with additional preserved factors, such as the variance in the original data and the independent components in the latent space. In addition, the model's stability under noisy conditions is crucial. This paper presents a unique method for clustering data that combines an autoencoder (AE), principal component analysis (PCA), and independent component analysis (ICA) to capture a relevant latent-space representation. As a further aid in lowering the dimensionality to improve clustering effectiveness, two dimensionality-reduction algorithms are employed, namely PCA and t-distributed stochastic neighbor embedding (t-SNE). The proposed technique produces more precise and reliable clustering by utilizing the advantages of both approaches. To compare the efficiency of the proposed methods with conventional clustering methods and stand-alone autoencoders, we conduct comprehensive experiments on 13 real-life datasets. The outcomes demonstrate the approach's intriguing potential for addressing complicated clustering problems and, importantly, its effectiveness even under noisy conditions.
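The abstract does not give implementation details, but the overall pipeline it describes can be sketched with standard scikit-learn components. The sketch below is only an illustrative approximation under stated assumptions: the trained autoencoder latent space is omitted, with PCA standing in for the variance-preserving stage and FastICA for the independent-component stage; the concatenated features are then reduced to two dimensions with t-SNE and clustered with k-means. All component counts and the digits dataset are arbitrary choices for illustration, not values from the paper.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, FastICA
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

# Toy stand-in data (the paper uses 13 real-life datasets).
X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale pixel values to [0, 1]

# Stage 1 (stand-in for the AE latent space): PCA keeps the
# high-variance directions; FastICA extracts independent components.
Z_var = PCA(n_components=30, random_state=0).fit_transform(X)
Z_ind = FastICA(n_components=10, random_state=0, max_iter=1000).fit_transform(X)
Z = np.hstack([Z_var, Z_ind])  # combined representation

# Stage 2: reduce the combined features to 2-D with t-SNE,
# then cluster in that low-dimensional space.
Z2 = TSNE(n_components=2, init="pca", random_state=0).fit_transform(Z)
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(Z2)

# External evaluation against the known labels.
print("NMI:", round(normalized_mutual_info_score(y, labels), 3))
```

This is only a sketch of the general idea (variance- and independence-preserving features followed by 2-D reduction before clustering); the actual method trains an autoencoder jointly with these preservation terms, which is not reproduced here.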
Journal description:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, with remarkable advancements across machine-learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI applied to real-world engineering problems, validated on publicly available datasets to ensure the replicability of research outcomes.