Enhancing Small Tabular Clinical Trial Dataset through Hybrid Data Augmentation: Combining SMOTE and WCGAN-GP

IF 2.7 | Zone 3, Physics and Astrophysics | Q2 PHYSICS, ATOMIC, MOLECULAR & CHEMICAL | Atomic Data and Nuclear Data Tables | Pub Date: 2023-08-23 | DOI: 10.3390/data8090135
Winston Wang, Tun-Wen Pai
Citations: 0

Abstract

Enhancing Small Tabular Clinical Trial Dataset through Hybrid Data Augmentation: Combining SMOTE and WCGAN-GP
This study addresses the challenge of training generative adversarial networks (GANs) for data augmentation on small tabular clinical trial datasets, which are difficult to train on because of their limited sample sizes. To overcome this obstacle, a hybrid approach is proposed. The synthetic minority oversampling technique (SMOTE) first augments the original data to a more substantial size, and the enlarged dataset is then used to train a Wasserstein conditional generative adversarial network with gradient penalty (WCGAN-GP), chosen for its state-of-the-art performance and enhanced stability. The ultimate objective of this research is to demonstrate that, with this hybrid approach, the synthetic tabular data generated by the final WCGAN-GP model maintain the structural integrity and statistical representation of the original small dataset. This focus is particularly relevant for clinical trials, where limited data availability due to privacy concerns and restricted access to subject enrollment are common challenges. Despite the limited data, the findings demonstrate that the hybrid approach successfully generates synthetic data that closely preserve the characteristics of the original small dataset. With faithful synthetic data generated in this way, the potential for enhancing data-driven research in drug clinical trials becomes evident. This includes enabling robust analysis of small datasets, supplementing scarce clinical trial data, facilitating the use of such data in machine learning tasks, and even using the model for anomaly detection to improve quality control during clinical trial data collection, all while prioritizing data privacy and implementing strict data protection measures.
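The first stage of the hybrid approach, enlarging a small table by interpolation before any GAN training, can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function name `smote_like_oversample`, the toy data, and the parameter values are hypothetical, and the sketch ignores class labels, whereas SMOTE proper interpolates within a class. It shows only the core idea of drawing each synthetic row on the line segment between a real row and one of its nearest neighbours.

```python
import numpy as np

def smote_like_oversample(X, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a sampled row
    and one of its k nearest neighbours (the core idea behind SMOTE)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two rows to interpolate")
    k = min(k, n - 1)
    # Pairwise Euclidean distances; a row is never its own neighbour.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)   # index of the starting row
    pick = rng.integers(0, k, size=n_new)   # which of its neighbours to use
    partner = neighbours[base, pick]
    gap = rng.random((n_new, 1))            # interpolation weight in [0, 1)
    return X[base] + gap * (X[partner] - X[base])

# Hypothetical toy data: 6 subjects, 2 numeric measurements each.
X_small = np.array([[1.0, 10.0], [1.2, 11.0], [0.9, 9.5],
                    [3.0, 20.0], [3.1, 21.0], [2.9, 19.0]])
X_synth = smote_like_oversample(X_small, n_new=50)
X_augmented = np.vstack([X_small, X_synth])
print(X_augmented.shape)  # (56, 2)
```

In the pipeline the abstract describes, an enlarged table such as `X_augmented` would then serve as the training set for the WCGAN-GP; that second stage is not shown here.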