A Biselection Method Based on Consistent Matrix for Large-Scale Datasets
Authors: Jinsheng Quan; Fengcai Qiao; Tian Yang; Shuo Shen; Yuhua Qian
Journal: IEEE Transactions on Fuzzy Systems, vol. 33, no. 6, pp. 1992-2005 (published 2025-03-05; impact factor 11.9; JCR Q1, Computer Science, Artificial Intelligence)
DOI: 10.1109/TFUZZ.2025.3543893
URL: https://ieeexplore.ieee.org/document/10910216/
Biselection (joint feature and sample selection) improves the efficiency and accuracy of machine learning models on large-scale data. Fuzzy rough sets, a mathematical model of uncertainty known for its interpretability, are widely used in machine learning, particularly for feature selection. The consistent matrix has significantly improved the computational efficiency and scalability of feature selection, but like most fuzzy rough set-based methods it addresses only feature selection and seldom incorporates sample selection. This feature-centric approach can limit classification performance, particularly on noisy, large-scale datasets where both features and samples require judicious selection. To overcome these limitations, this article explores the integration of sample selection with feature selection. First, we introduce a $\beta$-consistent granulation method to generate more accurate and concise fuzzy information granules. In addition, a novel membership function is employed to distinguish noise samples and irrelevant features simultaneously. Building on these, we propose a biselection algorithm with lower computational complexity that selects high-quality features and samples. Numerical experiments demonstrate that, compared to eleven representative algorithms, the proposed method achieves an average accuracy improvement of 9.66% and a 933-fold increase in efficiency.
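To make the idea of scoring features and samples with one membership function more concrete, the sketch below uses the classical fuzzy rough lower approximation. It is not the paper's $\beta$-consistent granulation, its consistent matrix, or its novel membership function, none of which are specified in the abstract. The function names, the similarity measure (minimum over features of 1 - |difference|), and the toy data are all assumptions chosen for illustration.

```python
# Illustrative only: classical fuzzy rough lower approximation, NOT the
# beta-consistent granulation or membership function proposed in the paper.
# All names and the toy data below are hypothetical.

def fuzzy_similarity(X, features):
    """Pairwise similarity R[i][j] = min over chosen features of 1 - |x_if - x_jf|.

    Assumes every feature value is already scaled to [0, 1].
    An empty feature list makes all samples fully similar (1.0).
    """
    n = len(X)
    return [
        [
            min((1.0 - abs(X[i][f] - X[j][f]) for f in features), default=1.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


def lower_membership(X, y, features):
    """Membership of each sample in the fuzzy lower approximation of its own class.

    With crisp labels this is the minimum, over samples with a different
    label, of 1 - R[i][j]. A value near 0 means a sample of another class is
    almost indistinguishable from sample i on the chosen features.
    """
    R = fuzzy_similarity(X, features)
    n = len(X)
    scores = []
    for i in range(n):
        others = [1.0 - R[i][j] for j in range(n) if y[j] != y[i]]
        scores.append(min(others) if others else 1.0)
    return scores


def dependency(X, y, features):
    """Mean lower-approximation membership, a common feature-subset quality score."""
    scores = lower_membership(X, y, features)
    return sum(scores) / len(scores)


# Toy data: feature 0 separates the classes, feature 1 is constant,
# and the last sample carries a label that conflicts with its neighbours.
X = [[0.0, 0.5], [0.1, 0.5], [0.9, 0.5], [1.0, 0.5], [0.95, 0.5]]
y = [0, 0, 1, 1, 0]

if __name__ == "__main__":
    print("dependency on feature 0:", dependency(X, y, [0]))
    print("dependency on feature 1:", dependency(X, y, [1]))
    print("per-sample membership:", lower_membership(X, y, [0]))
```

On this toy data the constant feature yields zero dependency, and the conflicting last sample pulls down its own membership as well as that of its differently labelled neighbours. A feature-only method would stop at ranking features, whereas a biselection method could also use the per-sample scores to flag candidates for removal. The pairwise similarity matrix costs quadratic time and memory in the number of samples, which is the kind of cost that efficiency-oriented methods such as the one in the abstract aim to reduce.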
About the journal:
The IEEE Transactions on Fuzzy Systems is a scholarly journal that focuses on the theory, design, and application of fuzzy systems. It aims to publish high-quality technical papers that contribute significant technical knowledge and exploratory developments in the field of fuzzy systems. The journal particularly emphasizes engineering systems and scientific applications. In addition to research articles, the Transactions also includes a letters section featuring current information, comments, and rebuttals related to published papers.