基于分层聚合结构的数据划分研究自回归递归网络的精度

IF 4.4 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Big Data and Cognitive Computing Pub Date : 2023-05-18 DOI:10.3390/bdcc7020100

J. Oliveira, Patrícia Ramos

{"title":"基于分层聚合结构的数据划分研究自回归递归网络的精度","authors":"J. Oliveira, Patrícia Ramos","doi":"10.3390/bdcc7020100","DOIUrl":null,"url":null,"abstract":"Global models have been developed to tackle the challenge of forecasting sets of series that are related or share similarities, but they have not been developed for heterogeneous datasets. Various methods of partitioning by relatedness have been introduced to enhance the similarities of sets, resulting in improved forecasting accuracy but often at the cost of a reduced sample size, which could be harmful. To shed light on how the relatedness between series impacts the effectiveness of global models in real-world demand-forecasting problems, we perform an extensive empirical study using the M5 competition dataset. We examine cross-learning scenarios driven by the product hierarchy commonly employed in retail planning to allow global models to capture interdependencies across products and regions more effectively. Our findings show that global models outperform state-of-the-art local benchmarks by a considerable margin, indicating that they are not inherently more limited than local models and can handle unrelated time-series data effectively. The accuracy of data-partitioning approaches increases as the sizes of the data pools and the models’ complexity decrease. However, there is a trade-off between data availability and data relatedness. Smaller data pools lead to increased similarity among time series, making it easier to capture cross-product and cross-region dependencies, but this comes at the cost of a reduced sample, which may not be beneficial. Finally, it is worth noting that the successful implementation of global models for heterogeneous datasets can significantly impact forecasting practice.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":" ","pages":""},"PeriodicalIF":4.4000,"publicationDate":"2023-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Investigating the Accuracy of Autoregressive Recurrent Networks Using Hierarchical Aggregation Structure-Based Data Partitioning\",\"authors\":\"J. Oliveira, Patrícia Ramos\",\"doi\":\"10.3390/bdcc7020100\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Global models have been developed to tackle the challenge of forecasting sets of series that are related or share similarities, but they have not been developed for heterogeneous datasets. Various methods of partitioning by relatedness have been introduced to enhance the similarities of sets, resulting in improved forecasting accuracy but often at the cost of a reduced sample size, which could be harmful. To shed light on how the relatedness between series impacts the effectiveness of global models in real-world demand-forecasting problems, we perform an extensive empirical study using the M5 competition dataset. We examine cross-learning scenarios driven by the product hierarchy commonly employed in retail planning to allow global models to capture interdependencies across products and regions more effectively. Our findings show that global models outperform state-of-the-art local benchmarks by a considerable margin, indicating that they are not inherently more limited than local models and can handle unrelated time-series data effectively. The accuracy of data-partitioning approaches increases as the sizes of the data pools and the models’ complexity decrease. However, there is a trade-off between data availability and data relatedness. Smaller data pools lead to increased similarity among time series, making it easier to capture cross-product and cross-region dependencies, but this comes at the cost of a reduced sample, which may not be beneficial. Finally, it is worth noting that the successful implementation of global models for heterogeneous datasets can significantly impact forecasting practice.\",\"PeriodicalId\":36397,\"journal\":{\"name\":\"Big Data and Cognitive Computing\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":4.4000,\"publicationDate\":\"2023-05-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Big Data and Cognitive Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/bdcc7020100\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Big Data and Cognitive Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/bdcc7020100","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

全球模型已经开发出来，以解决预测相关或具有相似性的系列集的挑战，但它们还没有为异构数据集开发。已经引入了各种通过相关性划分的方法来增强集合的相似性，从而提高预测准确性，但通常以减少样本量为代价，这可能是有害的。为了阐明系列之间的相关性如何影响全球模型在现实世界需求预测问题中的有效性，我们使用M5竞争数据集进行了广泛的实证研究。我们研究了由零售规划中常用的产品层次结构驱动的交叉学习场景，以允许全球模型更有效地捕获产品和地区之间的相互依赖性。我们的研究结果表明，全球模型在相当程度上优于最先进的本地基准，表明它们本身并不比本地模型更受限制，并且可以有效地处理不相关的时间序列数据。随着数据池规模的增大和模型复杂度的降低，数据划分方法的准确性也随之提高。然而，在数据可用性和数据相关性之间存在权衡。较小的数据池导致时间序列之间的相似性增加，从而更容易捕获跨产品和跨区域的依赖关系，但这是以减少样本为代价的，这可能不是有益的。最后，值得注意的是，异构数据集的全局模型的成功实施可以显著影响预测实践。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Investigating the Accuracy of Autoregressive Recurrent Networks Using Hierarchical Aggregation Structure-Based Data Partitioning

Global models have been developed to tackle the challenge of forecasting sets of series that are related or share similarities, but they have not been developed for heterogeneous datasets. Various methods of partitioning by relatedness have been introduced to enhance the similarities of sets, resulting in improved forecasting accuracy but often at the cost of a reduced sample size, which could be harmful. To shed light on how the relatedness between series impacts the effectiveness of global models in real-world demand-forecasting problems, we perform an extensive empirical study using the M5 competition dataset. We examine cross-learning scenarios driven by the product hierarchy commonly employed in retail planning to allow global models to capture interdependencies across products and regions more effectively. Our findings show that global models outperform state-of-the-art local benchmarks by a considerable margin, indicating that they are not inherently more limited than local models and can handle unrelated time-series data effectively. The accuracy of data-partitioning approaches increases as the sizes of the data pools and the models’ complexity decrease. However, there is a trade-off between data availability and data relatedness. Smaller data pools lead to increased similarity among time series, making it easier to capture cross-product and cross-region dependencies, but this comes at the cost of a reduced sample, which may not be beneficial. Finally, it is worth noting that the successful implementation of global models for heterogeneous datasets can significantly impact forecasting practice.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊