Fast, accurate and explainable time series classification through randomization

IF 4.3 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Data Mining and Knowledge Discovery Pub Date : 2023-10-16 DOI:10.1007/s10618-023-00978-w

Nestor Cabello, Elham Naghizade, Jianzhong Qi, Lars Kulik

{"title":"Fast, accurate and explainable time series classification through randomization","authors":"Nestor Cabello, Elham Naghizade, Jianzhong Qi, Lars Kulik","doi":"10.1007/s10618-023-00978-w","DOIUrl":null,"url":null,"abstract":"Abstract Time series classification (TSC) aims to predict the class label of a given time series, which is critical to a rich set of application areas such as economics and medicine. State-of-the-art TSC methods have mostly focused on classification accuracy, without considering classification speed. However, efficiency is important for big data analysis. Datasets with a large training size or long series challenge the use of the current highly accurate methods, because they are usually computationally expensive. Similarly, classification explainability, which is an important property required by modern big data applications such as appliance modeling and legislation such as the European General Data Protection Regulation , has received little attention. To address these gaps, we propose a novel TSC method – the Randomized-Supervised Time Series Forest (r-STSF). r-STSF is extremely fast and achieves state-of-the-art classification accuracy. It is an efficient interval-based approach that classifies time series according to aggregate values of the discriminatory sub-series (intervals). To achieve state-of-the-art accuracy, r-STSF builds an ensemble of randomized trees using the discriminatory sub-series. It uses four time series representations, nine aggregation functions and a supervised binary-inspired search combined with a feature ranking metric to identify highly discriminatory sub-series. The discriminatory sub-series enable explainable classifications. Experiments on extensive datasets show that r-STSF achieves state-of-the-art accuracy while being orders of magnitude faster than most existing TSC methods and enabling for explanations on the classifier decision.","PeriodicalId":55183,"journal":{"name":"Data Mining and Knowledge Discovery","volume":"28 1","pages":"0"},"PeriodicalIF":4.3000,"publicationDate":"2023-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Mining and Knowledge Discovery","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s10618-023-00978-w","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 1

Abstract

Abstract Time series classification (TSC) aims to predict the class label of a given time series, which is critical to a rich set of application areas such as economics and medicine. State-of-the-art TSC methods have mostly focused on classification accuracy, without considering classification speed. However, efficiency is important for big data analysis. Datasets with a large training size or long series challenge the use of the current highly accurate methods, because they are usually computationally expensive. Similarly, classification explainability, which is an important property required by modern big data applications such as appliance modeling and legislation such as the European General Data Protection Regulation , has received little attention. To address these gaps, we propose a novel TSC method – the Randomized-Supervised Time Series Forest (r-STSF). r-STSF is extremely fast and achieves state-of-the-art classification accuracy. It is an efficient interval-based approach that classifies time series according to aggregate values of the discriminatory sub-series (intervals). To achieve state-of-the-art accuracy, r-STSF builds an ensemble of randomized trees using the discriminatory sub-series. It uses four time series representations, nine aggregation functions and a supervised binary-inspired search combined with a feature ranking metric to identify highly discriminatory sub-series. The discriminatory sub-series enable explainable classifications. Experiments on extensive datasets show that r-STSF achieves state-of-the-art accuracy while being orders of magnitude faster than most existing TSC methods and enabling for explanations on the classifier decision.

Abstract Image

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

通过随机化快速，准确和可解释的时间序列分类

时间序列分类(TSC)的目的是预测给定时间序列的类别标签，这对于经济学和医学等丰富的应用领域至关重要。现有的TSC方法主要关注分类精度，而不考虑分类速度。然而，对于大数据分析来说，效率是很重要的。具有大型训练规模或长序列的数据集挑战当前高精度方法的使用，因为它们通常在计算上昂贵。同样，分类可解释性是现代大数据应用(如设备建模)和立法(如欧洲通用数据保护条例)所要求的重要属性，但却很少受到关注。为了解决这些差距，我们提出了一种新的TSC方法-随机监督时间序列森林(r-STSF)。r-STSF非常快，达到了最先进的分类精度。它是一种有效的基于区间的方法，根据判别子序列(区间)的集合值对时间序列进行分类。为了达到最先进的精度，r-STSF使用歧视性子序列构建随机树的集合。它使用四种时间序列表示、九种聚合函数和一种监督二值搜索，结合特征排序度量来识别高度歧视的子序列。区分子系列使分类可以解释。在大量数据集上的实验表明，r-STSF达到了最先进的精度，同时比大多数现有的TSC方法快几个数量级，并且能够解释分类器的决策。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Data Mining and Knowledge Discovery 工程技术-计算机：人工智能

CiteScore

10.40

自引率

4.20%

发文量

审稿时长

10 months

期刊介绍： Advances in data gathering, storage, and distribution have created a need for computational tools and techniques to aid in data analysis. Data Mining and Knowledge Discovery in Databases (KDD) is a rapidly growing area of research and application that builds on techniques and theories from many fields, including statistics, databases, pattern recognition and learning, data visualization, uncertainty modelling, data warehousing and OLAP, optimization, and high performance computing.