Incomplete data classification via positive approximation based rough subspaces ensemble

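Big Data Research, vol. 38, Article 100496. Published 2024-11-14. DOI: 10.1016/j.bdr.2024.100496
Journal metrics: impact factor 3.5; Zone 3 (Computer Science); JCR Q2 (Computer Science, Artificial Intelligence)
Publisher page: https://www.sciencedirect.com/science/article/pii/S2214579624000716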
Yuanting Yan, Meili Yang, Zhong Zheng, Hao Ge, Yiwen Zhang, Yanping Zhang
Citations: 0

Abstract

Ensemble learning is a prevalent way to handle missing values when classifying incomplete data: multiple classifiers are trained on diverse subsets of features. However, current ensemble-based methods overlook the redundancy within these feature subsets. This makes it harder to train robust prediction models, because redundant features can hinder learning of the underlying rules in the data. In this paper, we propose a Reduct-Missing Pattern Fusion (RMPF) method to address this limitation. It combines the strengths of rough set theory with the effectiveness of missing patterns for classifying incomplete data. RMPF first uses a heuristic algorithm to generate a set of positive approximation-based attribute reducts. It then integrates the missing patterns with these reducts through a fusion strategy to minimize data redundancy. Finally, the optimized feature subsets are used to train a group of base classifiers, and a selective prediction procedure produces the ensemble prediction. Experimental results show that our method outperforms the compared state-of-the-art methods in both performance and robustness. Its advantage is especially significant on data with high missing rates.
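The abstract does not detail how RMPF builds its reducts. The sketch below illustrates the rough-set machinery it rests on. It computes a positive region on incomplete data under a tolerance relation, in which a missing value matches anything. It then runs a forward greedy reduct search that skips objects already in the positive region, in the spirit of positive approximation.

This is not the authors' algorithm. The following are all assumptions made for illustration:

* the tolerance relation;
* the `None` missing-value marker;
* the greedy criterion (positive-region size);
* the hypothetical helpers `tolerant`, `positive_region`, and `greedy_reduct`.

The missing-pattern fusion and selective ensemble prediction steps are not modeled.

```python
from typing import Hashable, Iterable, Optional, Sequence

# Hypothetical convention for this sketch: a missing value is stored as None.
MISSING = None
Row = Sequence[Optional[Hashable]]


def tolerant(x: Row, y: Row, attrs: Sequence[int]) -> bool:
    """Assumed tolerance relation: x and y are indistinguishable on `attrs` if,
    for every attribute, the values are equal or either one is missing."""
    return all(x[a] is MISSING or y[a] is MISSING or x[a] == y[a] for a in attrs)


def positive_region(rows: Sequence[Row], labels: Sequence[Hashable],
                    attrs: Sequence[int], candidates: Iterable[int]) -> set[int]:
    """Indices among `candidates` whose tolerance class (taken over all rows)
    contains only objects carrying the same decision label."""
    return {
        i for i in candidates
        if all(labels[j] == labels[i]
               for j in range(len(rows)) if tolerant(rows[i], rows[j], attrs))
    }


def greedy_reduct(rows: Sequence[Row], labels: Sequence[Hashable]) -> list[int]:
    """Hypothetical forward greedy reduct search (not the RMPF procedure).

    Adding attributes can only shrink tolerance classes, so an object in the
    positive region stays there. Such objects are therefore dropped from
    `remaining` and never re-evaluated, which is the positive-approximation idea.
    """
    all_attrs = list(range(len(rows[0])))
    # Stop once the positive region matches the one induced by all attributes.
    target = len(positive_region(rows, labels, all_attrs, range(len(rows))))
    reduct: list[int] = []
    pos: set[int] = set()
    remaining = set(range(len(rows)))
    while len(pos) < target:
        best_attr, best_gain = None, set()
        for a in all_attrs:
            if a in reduct:
                continue
            # Only objects not yet in the positive region need to be checked.
            gain = positive_region(rows, labels, reduct + [a], remaining)
            if best_attr is None or len(gain) > len(best_gain):
                best_attr, best_gain = a, gain
        if best_attr is None:  # every attribute is already in the reduct
            break
        reduct.append(best_attr)
        pos |= best_gain
        remaining -= best_gain
    return reduct


# Toy incomplete table: three condition attributes and a binary decision label.
rows = [
    ("a", "p", "u"),
    ("a", None, "v"),
    ("b", "q", "u"),
    ("b", "q", None),
    (None, "p", "v"),
]
labels = [1, 1, 0, 0, 1]
print(greedy_reduct(rows, labels))  # [0, 1]
```

On this toy table, attributes 0 and 1 already make every object label-consistent, so attribute 2 is never selected. The sketch has no backward elimination step, so on other data the returned set is not guaranteed to be a minimal reduct.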
Source journal: Big Data Research
Subject area: Computer Science (Computer Science Applications)
CiteScore: 8.40
Self-citation rate: 3.00%
Articles published: 0
About the journal: The journal aims to promote and communicate advances in big data research by providing a fast, high-quality forum for researchers, practitioners, and policy makers from the many different communities working on, and with, this topic. The journal accepts papers on foundational aspects of dealing with big data, as well as papers on specific platforms and technologies used to deal with big data. To promote data science and interdisciplinary collaboration between fields, and to showcase the benefits of data-driven research, it also considers papers demonstrating applications of big data in domains as diverse as geoscience, the social web, finance, e-commerce, health care, environment and climate, physics and astronomy, chemistry, life sciences and drug discovery, digital libraries and scientific publications, security, and government. Occasionally, the journal may publish white papers on policies, standards, and best practices.