iLDA: A new dimensional reduction method for non-Gaussian and small sample size datasets
Usman Sudibyo, Supriadi Rustad, Pulung Nurtantio Andono, Ahmad Zainul Fanani, Catur Supriyanto
DOI: 10.1016/j.eij.2024.100533
Egyptian Informatics Journal (Q1, Computer Science, Artificial Intelligence), published 2024-09-01
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S1110866524000963
Citations: 0
Abstract
High-dimensional non-Gaussian data is widespread in the real world, appearing in face recognition, facial expression analysis, document recognition, and text processing. As a dimensionality reduction technique, linear discriminant analysis (LDA) performs poorly on non-Gaussian data and fails on high-dimensional data when the number of features exceeds the number of instances, commonly referred to as the small sample size (SSS) problem. We propose a new dimensionality reduction method called iterative LDA (iLDA), which applies LDA repeatedly, gradually extracting features until the best separability is reached. The proposed method produces better vector projections than LDA for both Gaussian and non-Gaussian data and avoids the singularity problem in high-dimensional data. Repeated application of LDA does not incur excessive computational cost from eigenvector calculation, since the eigenvectors are computed from small matrices. The experimental results show performance improvement on 8 of 10 small-dimensional datasets, with the largest gain on the ULC dataset (0.753 to 0.861). On image datasets, accuracy improved across the board: the Chest CT-Scan dataset showed the greatest improvement (0.6044 to 0.8384), followed by Georgia Tech (0.8883 to 0.9481).
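The abstract's core idea of applying LDA iteratively to small feature blocks, so that eigenvectors come from small matrices and the p > n singularity is avoided, can be sketched as follows. This is an illustrative reading of the abstract only, not the authors' exact algorithm: the block partitioning, the convergence criterion, and the `block_size` parameter are all assumptions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def iterative_lda(X, y, block_size=5, max_rounds=10):
    """Hedged sketch of an iterative-LDA scheme: repeatedly run LDA on
    small feature blocks and concatenate the per-block projections,
    shrinking the dimensionality each round until it stops decreasing.
    Not the paper's exact method; block_size and the stopping rule are
    illustrative assumptions."""
    n_classes = len(np.unique(y))
    Z = X
    for _ in range(max_rounds):
        # Stop once we have reached LDA's minimal output dimension.
        if Z.shape[1] <= n_classes - 1:
            break
        blocks = []
        for start in range(0, Z.shape[1], block_size):
            block = Z[:, start:start + block_size]
            # LDA on a small block keeps the scatter matrices tiny,
            # sidestepping the singular-matrix failure that full-width
            # LDA hits when features outnumber samples.
            lda = LinearDiscriminantAnalysis()
            blocks.append(lda.fit_transform(block, y))
        Z_new = np.hstack(blocks)
        if Z_new.shape[1] >= Z.shape[1]:
            break  # no further reduction possible
        Z = Z_new
    return Z
```

Each round maps every block of at most `block_size` features to at most `n_classes - 1` discriminant components, so the dimensionality shrinks geometrically until a single LDA pass suffices.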
Journal introduction:
The Egyptian Informatics Journal is published by the Faculty of Computers and Artificial Intelligence, Cairo University. It provides a forum for state-of-the-art research and development in computing, including computer science, information technology, information systems, operations research, and decision support. Innovative, previously unpublished work in the subjects covered by the Journal is welcome, whether from academic, research, or commercial sources.