Challenges and opportunities of generative models on tabular data
Applied Soft Computing. Published 2024-09-07. DOI: 10.1016/j.asoc.2024.112223
Tabular data, organized like tables with rows and columns, is widely used. Existing models for tabular data synthesis often face limitations related to data size or complexity. In contrast, deep generative models, a part of deep learning, demonstrate proficiency in handling large and complex data sets. While these models have shown remarkable success in generating image and audio data, their application in tabular data synthesis is relatively new, lacking a comprehensive comparison with existing methods. To fill this gap, this study aims to systematically evaluate and compare the performance of deep generative models with these existing methods for tabular data synthesis, while also investigating the efficacy of post-processing techniques. We aim to identify strengths and limitations and provide insights for future research and practical applications. Our study showed that the Synthetic Minority Oversampling Technique (SMOTE) and its variants outperform deep generative models, especially for small datasets. However, we observed that an ensemble of deep generative models and post-generation processing performs better on large datasets than SMOTE alone. The results of our study indicate that deep generative models hold promise as a valuable tool for generating tabular data. Nonetheless, further research is warranted to enhance the performance of deep generative models and gain a comprehensive understanding of their limitations.
About the journal:
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. Its focus is to publish the highest-quality research on the application and convergence of fuzzy logic, neural networks, evolutionary computing, rough sets, and similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. Therefore, the web site will continuously be updated with new articles and the publication time will be short.