Incremental reduction of imbalanced distributed mixed data based on k-nearest neighbor rough set

IF 3.2 · CAS Zone 3 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence) · International Journal of Approximate Reasoning · Pub Date: 2024-06-04 · DOI: 10.1016/j.ijar.2024.109218
Weihua Xu, Changchun Liu
International Journal of Approximate Reasoning, Volume 172, Article 109218 · Journal Article · https://www.sciencedirect.com/science/article/pii/S0888613X24001051
Citations: 0

Abstract

Incremental feature selection methods have attracted considerable research attention because they improve the efficiency of feature selection on dynamic datasets. However, little work has addressed incremental feature selection for imbalanced mixed-type data. Furthermore, the widely used neighborhood rough set algorithm exhibits low classification efficiency on imbalanced data distributions and performs poorly in classifying mixed samples. Motivated by these two challenges, we investigate an incremental feature reduction algorithm based on k-nearest neighbors and mutual information. First, we enhance the neighborhood rough set model by incorporating the concept of k-nearest neighbors, thereby improving its ability to handle samples with varying densities. Next, we apply information entropy theory and combine neighborhood mutual information with the maximum relevance minimum redundancy criterion to construct a novel feature importance evaluation function, which serves as the evaluation metric for feature selection. Finally, we design an incremental feature selection algorithm based on this static algorithm. Experiments on twelve public datasets evaluate the robustness of the proposed feature metrics and the performance of the incremental feature selection algorithm. The results validate the robustness of the proposed metrics and demonstrate that the incremental algorithm is effective and efficient in feature reduction when imbalanced mixed data are updated.
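
As an illustration of the maximum relevance minimum redundancy (mRMR) criterion named in the abstract, the following is a minimal Python sketch, not the authors' algorithm. It assumes every feature is already discrete and substitutes plain plug-in Shannon mutual information for the paper's neighborhood mutual information; the k-nearest-neighbor rough set model and the incremental update step are not reproduced here. The function names `mutual_information` and `mrmr_select` and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in Shannon mutual information I(X;Y), in bits, of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x,y) * log2( p(x,y) / (p(x) p(y)) ) over observed value pairs.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, labels, n_select):
    """Greedy mRMR selection (assumed difference form of the criterion).

    features: dict mapping feature name -> list of discrete values
    labels:   list of class labels, same length as each feature column
    Returns the names of the selected features in selection order.
    """
    # Relevance: mutual information between each feature and the class labels.
    relevance = {
        name: mutual_information(col, labels) for name, col in features.items()
    }
    selected = []
    remaining = list(features)

    def score(name):
        if not selected:
            return relevance[name]
        # Redundancy: mean mutual information with already selected features.
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < n_select:
        best = max(remaining, key=score)  # ties go to the earliest feature
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): "a_copy" duplicates "a"; "b" is independent of "a".
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
labels = [x + y for x, y in zip(a, b)]  # three classes: 0, 1, 2
features = {"a": a, "a_copy": list(a), "b": b}

print(mrmr_select(features, labels, 2))  # ['a', 'b']
```

All three toy features are equally relevant to the labels, so the redundancy term decides the second pick: the duplicate column is penalized by its full overlap with `a`, while the independent column `b` is not. A relevance-only ranking could not separate them.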

Source journal
International Journal of Approximate Reasoning (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 6.90
Self-citation rate: 12.80%
Publication volume: 170
Review time: 67 days
Journal description: The International Journal of Approximate Reasoning is intended to serve as a forum for the treatment of imprecision and uncertainty in Artificial and Computational Intelligence, covering both the foundations of uncertainty theories, and the design of intelligent systems for scientific and engineering applications. It publishes high-quality research papers describing theoretical developments or innovative applications, as well as review articles on topics of general interest. Relevant topics include, but are not limited to, probabilistic reasoning and Bayesian networks, imprecise probabilities, random sets, belief functions (Dempster-Shafer theory), possibility theory, fuzzy sets, rough sets, decision theory, non-additive measures and integrals, qualitative reasoning about uncertainty, comparative probability orderings, game-theoretic probability, default reasoning, nonstandard logics, argumentation systems, inconsistency tolerant reasoning, elicitation techniques, philosophical foundations and psychological models of uncertain reasoning. Domains of application for uncertain reasoning systems include risk analysis and assessment, information retrieval and database design, information fusion, machine learning, data and web mining, computer vision, image and signal processing, intelligent data analysis, statistics, multi-agent systems, etc.