Graph reduction techniques for instance selection: comparative and empirical study

Artificial Intelligence Review, Vol. 58, No. 2 · IF 10.7 · Zone 2 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence) · Pub Date: 2024-12-20 · DOI: 10.1007/s10462-024-10971-4
Zahiriddin Rustamov, Nazar Zaki, Jaloliddin Rustamov, Ayham Zaitouny, Rafat Damseh

Abstract

The surge in data generation has prompted a shift to big data, and processing and time constraints now challenge the notion that “more data equals better performance.” In this evolving artificial intelligence and machine learning landscape, instance selection (IS) has become crucial for reducing data without compromising model quality. Traditional IS methods, though efficient, often struggle with large, complex datasets in data mining. This study evaluates graph reduction techniques, grounded in graph theory, as a novel approach to instance selection. The objective is to leverage the inherent structure of data represented as graphs to make instance selection more effective. We evaluated 35 graph reduction techniques across 29 classification datasets, assessing them on various metrics, including accuracy, F1 score, reduction rate, and computation time. Graph reduction methods showed significant potential for maintaining data integrity while achieving substantial reductions: top techniques achieved up to 99% reduction while maintaining or improving accuracy. For instance, Multilevel sampling achieved an accuracy effectiveness score of 0.8555 with 99.16% reduction on large datasets, while Leiden sampling showed high effectiveness on smaller datasets (0.8034 accuracy, 97.87% reduction). Computational efficiency varied widely, with reduction times ranging from milliseconds to minutes. This research advances the theory of graph-based instance selection and offers practical application guidelines. Our findings indicate that graph reduction methods effectively preserve data quality and boost processing efficiency on large, complex datasets, with some techniques achieving up to 160-fold speedups in model training at high reduction rates.
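
To make the idea of graph-based instance selection concrete, here is a minimal sketch in plain Python. It is not one of the 35 techniques evaluated in the paper: it substitutes connected components of a distance-threshold graph for community detection methods such as Multilevel or Leiden, and it keeps one representative per class in each component. The function names, the `radius` parameter, and the toy data are all assumptions made for illustration.

```python
from math import dist


def build_threshold_graph(points, radius):
    """Adjacency sets linking points whose Euclidean distance is <= radius."""
    adj = {i: set() for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= radius:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def connected_components(adj):
    """Group node indices into connected components (iterative DFS)."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups


def select_instances(points, labels, radius):
    """Keep one representative per (component, class) pair.

    Assumption: the representative is the member closest to the centroid
    of its class within the component.
    """
    kept = []
    for group in connected_components(build_threshold_graph(points, radius)):
        by_class = {}
        for idx in group:
            by_class.setdefault(labels[idx], []).append(idx)
        for members in by_class.values():
            coords = [points[m] for m in members]
            centroid = tuple(sum(axis) / len(members) for axis in zip(*coords))
            kept.append(min(members, key=lambda m: dist(points[m], centroid)))
    return sorted(kept)


def reduction_rate(n_original, n_kept):
    """Fraction of instances removed, e.g. 0.9916 for a 99.16% reduction."""
    return 1 - n_kept / n_original


# Toy data (hypothetical): two tight clusters, one class each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = [0, 0, 0, 1, 1, 1]

kept = select_instances(points, labels, radius=0.5)
rate = reduction_rate(len(points), len(kept))
print(kept, round(rate, 4))
```

On this toy data the sketch keeps two of six instances, a reduction rate of about 0.67. The reduction rates reported in the abstract come from the paper's own techniques and datasets, not from this simplified stand-in.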

Source journal
Artificial Intelligence Review (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 22.00
Self-citation rate: 3.30%
Articles published: 194
Review time: 5.3 months
Journal description: Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.