A Multi-Split Cross-Strategy for Enhancing Machine Learning Algorithms Prediction Results with Data Generated by Conditional Generative Adversarial Network

Abdelfattah Abassi, Brahim Bakkas, Moustapha El Jai, Ahmed Arid, Hussain Benazza
Journal of Computer Science, published 2024-07-01
DOI: 10.3844/jcssp.2024.700.707 (https://doi.org/10.3844/jcssp.2024.700.707)

Abstract

In this study, we present a Multi-Split Cross-Strategy (MSC-Strategy) designed to leverage synthetic tabular data generated by a Conditional Generative Adversarial Network (CGAN). We investigate whether synthetic data, compared with real-world data, can improve the predictive results of machine learning models. First, we develop a CGAN architecture tailored to generating synthetic tabular data and train it on a comprehensive real-world dataset. Second, we validate the synthetic data generated by the CGAN to ensure its statistical fidelity to the distribution of the real data. Finally, we select a subset of the generated data and apply our strategy to create a new combined training set comprising the real training set and the chosen subset of generated data. To validate our approach, we employ six diverse regression models, including Decision Tree (DT), K-Nearest Neighbors (KNN), Random Forest (RF), XGB Regressor (XGB), and Support Vector Regressor (SVR). Each model is trained and tested with four training sets: real data, generated data, combined data (the real training set plus the generated data), and the set formed by our MSC strategy. Our findings indicate that the training set formed by our MSC strategy yields better predictive performance than real-world data or generated data, highlighting its ability to enhance the predictions of machine learning models using only a subset of the generated data.
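The four-way comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy dataset, the `fake_generator` function (a noise-jitter stand-in for sampling from a trained CGAN), the random `select_subset` rule (a placeholder, since the abstract does not specify how the MSC strategy chooses its subset), and the small k-nearest-neighbors regressor are all hypothetical. Only the protocol is taken from the text: every training-set variant is scored on the same held-out real data.

```python
import random
import statistics


def knn_predict(train, x, k=3):
    """Predict the target for x as the mean target of its k nearest training rows."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
    )[:k]
    return statistics.fmean(y for _, y in nearest)


def rmse(train, test, k=3):
    """Root-mean-square error of the k-NN regressor fitted on `train`, scored on `test`."""
    squared = [(knn_predict(train, x, k) - y) ** 2 for x, y in test]
    return statistics.fmean(squared) ** 0.5


def make_real(n, rng):
    """Toy 'real' data (assumption): y = 2*x0 - x1 plus small Gaussian noise."""
    rows = []
    for _ in range(n):
        x = (rng.uniform(0, 1), rng.uniform(0, 1))
        rows.append((x, 2 * x[0] - x[1] + rng.gauss(0, 0.05)))
    return rows


def fake_generator(real, n, rng, jitter=0.1):
    """Stand-in for CGAN sampling (assumption): jittered copies of real rows. NOT a CGAN."""
    out = []
    for _ in range(n):
        x, y = rng.choice(real)
        out.append((tuple(v + rng.gauss(0, jitter) for v in x), y + rng.gauss(0, jitter)))
    return out


def select_subset(generated, fraction, rng):
    """Placeholder selection rule (assumption): a random fraction of the generated rows."""
    return rng.sample(generated, int(len(generated) * fraction))


rng = random.Random(0)
real = make_real(200, rng)
real_train, real_test = real[:150], real[150:]   # held-out real rows for scoring
generated = fake_generator(real_train, 300, rng)
subset = select_subset(generated, 0.25, rng)

# The four training-set variants named in the abstract.
training_sets = {
    "real": real_train,
    "generated": generated,
    "combined": real_train + generated,
    "real + subset": real_train + subset,
}

# Every variant is evaluated on the same real test set.
scores = {name: rmse(train, real_test) for name, train in training_sets.items()}
for name, score in scores.items():
    print(f"{name:>14}: RMSE = {score:.4f}")
```

Scoring all variants on one fixed real test set keeps the comparison fair: any difference in error comes from the training data rather than from the evaluation split. The numbers this toy produces say nothing about the paper's results.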