Evaluation is key: a survey on evaluation measures for synthetic time series

Journal of Big Data | IF 8.6 | Zone 2 (Computer Science) | JCR Q1, COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-05-07 | DOI: 10.1186/s40537-024-00924-7
Michael Stenger, Robert Leppich, Ian Foster, Samuel Kounev, André Bauer

Abstract

Synthetic data generation describes the process of learning the underlying distribution of a given real dataset in a model, which is, in turn, sampled to produce new data objects still adhering to the original distribution. This approach often finds application where circumstances limit the availability or usability of real-world datasets, for instance, in health care due to privacy concerns. While image synthesis has received much attention in the past, time series are key for many practical (e.g., industrial) applications. To date, numerous different generative models and measures to evaluate time series syntheses have been proposed. However, regarding the defining features of high-quality synthetic time series and how to quantify quality, no consensus has yet been reached among researchers. Hence, we propose a comprehensive survey on evaluation measures for time series generation to assist users in evaluating synthetic time series. For one, we provide brief descriptions or - where applicable - precise definitions. Further, we order the measures in a taxonomy and examine applicability and usage. To assist in the selection of the most appropriate measures, we provide a concise guide for fast lookup. Notably, our findings reveal a lack of a universally accepted approach for an evaluation procedure, including the selection of appropriate measures. We believe this situation hinders progress and may even erode evaluation standards to a “do as you like”-approach to synthetic data evaluation. Therefore, this survey is a preliminary step to advance the field of synthetic data evaluation.

Source journal
Journal of Big Data (Computer Science-Information Systems)
CiteScore: 17.80
Self-citation rate: 3.70%
Articles published: 105
Review time: 13 weeks
About the journal: The Journal of Big Data publishes high-quality, scholarly research papers, methodologies, and case studies covering a broad spectrum of topics, from big data analytics to data-intensive computing and all applications of big data research. It addresses challenges facing big data today and in the future, including data capture and storage, search, sharing, analytics, technologies, visualization, architectures, data mining, machine learning, cloud computing, distributed systems, and scalable storage. The journal serves as a seminal source of innovative material for academic researchers and practitioners alike.