{"title":"A Large Comparison of Normalization Methods on Time Series","authors":"Felipe Tomazelli Lima, Vinicius M.A. Souza","doi":"10.1016/j.bdr.2023.100407","DOIUrl":null,"url":null,"abstract":"<div><p>Normalization is a mandatory preprocessing step<span><span><span> in time series problems to guarantee similarity comparisons invariant to unexpected distortions in amplitude and offset. Such distortions are usual for most time series data<span>. A typical example is gait recognition by motion collected on subjects with varying body height and width. To rescale the data for the same range of values, the vast majority of researchers consider z-normalization as the default method for any domain application, data, or task. This choice is made without a searching process as occurs to set the parameters of an algorithm or without any experimental evidence in the literature considering a variety of scenarios to support this decision. To address this gap, we evaluate the impact of different normalization methods on time series data. Our analysis is based on an extensive experimental comparison on classification problems involving 10 normalization methods, 3 state-of-the-art classifiers, and 38 benchmark datasets. We consider the </span></span>classification task<span> due to the simplicity of the experimental settings and well-defined metrics. However, our findings can be extrapolated for other time series mining tasks, such as forecasting or clustering. Based on our results, we suggest to evaluate the maximum absolute scale as an alternative to z-normalization. Besides being time efficient, this alternative shows promising results for similarity-based methods using Euclidean distance. For </span></span>deep learning, mean normalization could be considered.</span></p></div>","PeriodicalId":3,"journal":{"name":"ACS Applied Electronic Materials","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2023-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Electronic Materials","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2214579623000400","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 1
Abstract
Normalization is a mandatory preprocessing step in time series problems to guarantee that similarity comparisons are invariant to unexpected distortions in amplitude and offset. Such distortions are common in most time series data. A typical example is gait recognition from motion data collected from subjects of varying height and body width. To rescale the data to a common range of values, the vast majority of researchers adopt z-normalization as the default method for any application domain, dataset, or task. This choice is usually made without a search process, as is done when setting the parameters of an algorithm, and without experimental evidence in the literature covering a variety of scenarios to support the decision. To address this gap, we evaluate the impact of different normalization methods on time series data. Our analysis is based on an extensive experimental comparison of classification problems involving 10 normalization methods, 3 state-of-the-art classifiers, and 38 benchmark datasets. We consider the classification task due to the simplicity of its experimental settings and its well-defined metrics. However, our findings can be extrapolated to other time series mining tasks, such as forecasting or clustering. Based on our results, we suggest evaluating maximum absolute scaling as an alternative to z-normalization. Besides being time-efficient, this alternative shows promising results for similarity-based methods using Euclidean distance. For deep learning, mean normalization could be considered.
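As a rough illustration of the three normalization methods highlighted in the abstract (not the authors' code), a minimal NumPy sketch might look as follows; the function names are hypothetical and edge cases such as constant series are ignored.

```python
import numpy as np

def z_normalize(x):
    # Z-normalization: subtract the mean and divide by the standard deviation,
    # giving a series with zero mean and unit variance.
    return (x - x.mean()) / x.std()

def max_abs_scale(x):
    # Maximum absolute scaling: divide by the largest absolute value,
    # mapping the series into [-1, 1] without shifting the offset.
    return x / np.abs(x).max()

def mean_normalize(x):
    # Mean normalization: subtract the mean and divide by the range,
    # centering the series around zero within a unit-width interval.
    return (x - x.mean()) / (x.max() - x.min())

# Toy series with amplitude and offset distortion.
x = np.array([10.0, 12.0, 15.0, 11.0, 9.0])
print(z_normalize(x))
print(max_abs_scale(x))
print(mean_normalize(x))
```

Note that maximum absolute scaling is the cheapest of the three, since it requires only a single pass to find the largest absolute value and no re-centering of the series.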