Incremental reduction of imbalanced distributed mixed data based on k-nearest neighbor rough set

International Journal of Approximate Reasoning (IF 3.0; CAS Zone 3, Computer Science; JCR Q2, Computer Science, Artificial Intelligence). Pub Date: 2024-09-01. Epub Date: 2024-06-04. DOI: 10.1016/j.ijar.2024.109218

Weihua Xu, Changchun Liu

Volume 172, Article 109218.

Citations: 0

Abstract

Incremental feature selection methods have attracted considerable research attention as a way to make feature selection more efficient on dynamic datasets. However, little work has addressed incremental feature selection for imbalanced mixed-type data. Moreover, the widely used neighborhood rough set algorithm classifies imbalanced data inefficiently and performs poorly on mixed samples. Motivated by these two challenges, this study proposes an incremental feature reduction algorithm based on k-nearest neighbors and mutual information. First, we extend the neighborhood rough set model with the concept of k-nearest neighbors, improving its ability to handle samples of varying density. Next, drawing on information entropy theory, we combine neighborhood mutual information with the maximum-relevance minimum-redundancy (mRMR) criterion to construct a novel feature importance evaluation function, which serves as the evaluation metric for feature selection. Finally, we design an incremental feature selection algorithm based on this static algorithm. Experiments on twelve public datasets evaluate the robustness of the proposed feature metric and the performance of the incremental algorithm. The results confirm the robustness of the proposed metric and show that the incremental algorithm performs feature reduction effectively and efficiently as imbalanced mixed data are updated.
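The abstract only outlines the method, so the following is a rough, hedged illustration of the static ingredients it names: k-nearest-neighbor granules on mixed data, neighborhood mutual information, and a greedy mRMR-style search. It is not the authors' algorithm. The mixed distance (absolute difference for numeric features pre-scaled to [0, 1], 0/1 mismatch for categorical ones), the granule definition (a sample plus its k nearest other samples), the neighborhood entropy formula, the intersection-based joint granule, and all helper names (`knn_granules`, `mutual_info`, `mrmr_select`, etc.) are assumptions made for illustration. The incremental update step is not sketched.

```python
import math
from typing import List, Sequence, Set, Union

Value = Union[float, str]  # a cell is numeric (float) or categorical (str)


def feature_distance(a: Value, b: Value, numeric: bool) -> float:
    # Assumed mixed metric: |a - b| for numeric features (pre-scaled to [0, 1]),
    # 0/1 mismatch for categorical features.
    if numeric:
        return abs(float(a) - float(b))
    return 0.0 if a == b else 1.0


def knn_granules(rows: Sequence[Sequence[Value]], feats: Sequence[int],
                 numeric: Sequence[bool], k: int) -> List[Set[int]]:
    # Hypothetical k-NN granule: sample i plus its k nearest other samples under
    # the Euclidean combination of per-feature distances on `feats`.
    granules = []
    for i, xi in enumerate(rows):
        dists = sorted(
            (math.sqrt(sum(feature_distance(xi[f], xj[f], numeric[f]) ** 2 for f in feats)), j)
            for j, xj in enumerate(rows) if j != i
        )
        granules.append({i} | {j for _, j in dists[:k]})
    return granules


def class_granules(labels: Sequence[object]) -> List[Set[int]]:
    # Decision granule of sample i: all samples sharing its class label.
    return [{j for j, y in enumerate(labels) if y == yi} for yi in labels]


def entropy(granules: List[Set[int]]) -> float:
    # Assumed neighborhood entropy: H = -(1/n) * sum_i log(|g_i| / n).
    n = len(granules)
    return -sum(math.log(len(g) / n) for g in granules) / n


def mutual_info(g1: List[Set[int]], g2: List[Set[int]]) -> float:
    # NMI(A; B) = H(A) + H(B) - H(A, B); the joint granule is assumed to be
    # g1[i] & g2[i], which is never empty because both sets contain i.
    joint = [a & b for a, b in zip(g1, g2)]
    return entropy(g1) + entropy(g2) - entropy(joint)


def mrmr_select(rows, labels, numeric, k, max_features):
    # Greedy mRMR-style search: score(f) = NMI(f; D) minus the mean NMI
    # between f and the features already selected.
    dec = class_granules(labels)
    single = {f: knn_granules(rows, [f], numeric, k) for f in range(len(numeric))}
    relevance = {f: mutual_info(g, dec) for f, g in single.items()}
    selected: List[int] = []
    remaining = sorted(single)

    def score(f: int) -> float:
        if not selected:
            return relevance[f]
        redundancy = sum(mutual_info(single[f], single[s]) for s in selected)
        return relevance[f] - redundancy / len(selected)

    while remaining and len(selected) < max_features:
        best = max(remaining, key=score)  # ties go to the lowest feature index
        if selected and score(best) <= 0:
            break  # no remaining feature adds relevance beyond its redundancy
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy mixed data: column 0 separates the classes, column 1 is categorical
# noise, column 2 duplicates column 0.
rows = [[0.0, "a", 0.0], [0.1, "b", 0.1], [0.2, "a", 0.2],
        [0.8, "b", 0.8], [0.9, "a", 0.9], [1.0, "b", 1.0]]
labels = [0, 0, 0, 1, 1, 1]
print(mrmr_select(rows, labels, [True, False, True], k=2, max_features=3))  # [0]
```

On this toy input, column 0 is picked first; the duplicate column 2 is then rejected as fully redundant and the noisy column 1 as adding no net relevance, so the search stops with a single feature. Using per-sample k-NN granules rather than a fixed-radius neighborhood means every granule has the same size regardless of local density, which is one plausible reading of the density argument in the abstract.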


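Source journal

International Journal of Approximate Reasoning (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 6.90

Self-citation rate: 12.80%

Articles published: 170

Review time: 67 days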

Journal description: The International Journal of Approximate Reasoning is intended to serve as a forum for the treatment of imprecision and uncertainty in Artificial and Computational Intelligence, covering both the foundations of uncertainty theories and the design of intelligent systems for scientific and engineering applications. It publishes high-quality research papers describing theoretical developments or innovative applications, as well as review articles on topics of general interest. Relevant topics include, but are not limited to, probabilistic reasoning and Bayesian networks, imprecise probabilities, random sets, belief functions (Dempster-Shafer theory), possibility theory, fuzzy sets, rough sets, decision theory, non-additive measures and integrals, qualitative reasoning about uncertainty, comparative probability orderings, game-theoretic probability, default reasoning, nonstandard logics, argumentation systems, inconsistency-tolerant reasoning, elicitation techniques, philosophical foundations and psychological models of uncertain reasoning. Domains of application for uncertain reasoning systems include risk analysis and assessment, information retrieval and database design, information fusion, machine learning, data and web mining, computer vision, image and signal processing, intelligent data analysis, statistics, multi-agent systems, etc.

Latest articles in this journal

Submodular neighborhood covering reduction for tri-partition classification
MACO-SMOTE: A multi-adaptive center oversampling technique for imbalanced data
Feature selection with a lexicographic social ranking method
HALO: Hardness-aware bilevel-inspired contrastive graph clustering
Measuring the distance between single random inputs and OWA operators