时间序列分类的稳健解释器推荐

IF 2.8 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Data Mining and Knowledge Discovery Pub Date : 2024-06-20 DOI:10.1007/s10618-024-01045-8

Thu Trang Nguyen, Thach Le Nguyen, Georgiana Ifrim

{"title":"时间序列分类的稳健解释器推荐","authors":"Thu Trang Nguyen, Thach Le Nguyen, Georgiana Ifrim","doi":"10.1007/s10618-024-01045-8","DOIUrl":null,"url":null,"abstract":"Time series classification is a task which deals with temporal sequences, a prevalent data type common in domains such as human activity recognition, sports analytics and general sensing. In this area, interest in explanability has been growing as explanation is key to understand the data and the model better. Recently, a great variety of techniques (e.g., LIME, SHAP, CAM) have been proposed and adapted for time series to provide explanation in the form of saliency maps, where the importance of each data point in the time series is quantified with a numerical value. However, the saliency maps can and often disagree, so it is unclear which one to use. This paper provides a novel framework to quantitatively evaluate and rank explanation methods for time series classification. We show how to robustly evaluate the informativeness of a given explanation method (i.e., relevance for the classification task), and how to compare explanations side-by-side. The goal is to recommend the best explainer for a given time series classification dataset. We propose AMEE, a Model-Agnostic Explanation Evaluation framework, for recommending saliency-based explanations for time series classification. In this approach, data perturbation is added to the input time series guided by each explanation. Our results show that perturbing discriminative parts of the time series leads to significant changes in classification accuracy, which can be used to evaluate each explanation. To be robust to different types of perturbations and different types of classifiers, we aggregate the accuracy loss across perturbations and classifiers. This novel approach allows us to recommend the best explainer among a set of different explainers, including random and oracle explainers. We provide a quantitative and qualitative analysis for synthetic datasets, a variety of time-series datasets, as well as a real-world case study with known expert ground truth.","PeriodicalId":55183,"journal":{"name":"Data Mining and Knowledge Discovery","volume":"44 1","pages":""},"PeriodicalIF":2.8000,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Robust explainer recommendation for time series classification\",\"authors\":\"Thu Trang Nguyen, Thach Le Nguyen, Georgiana Ifrim\",\"doi\":\"10.1007/s10618-024-01045-8\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Time series classification is a task which deals with temporal sequences, a prevalent data type common in domains such as human activity recognition, sports analytics and general sensing. In this area, interest in explanability has been growing as explanation is key to understand the data and the model better. Recently, a great variety of techniques (e.g., LIME, SHAP, CAM) have been proposed and adapted for time series to provide explanation in the form of saliency maps, where the importance of each data point in the time series is quantified with a numerical value. However, the saliency maps can and often disagree, so it is unclear which one to use. This paper provides a novel framework to quantitatively evaluate and rank explanation methods for time series classification. We show how to robustly evaluate the informativeness of a given explanation method (i.e., relevance for the classification task), and how to compare explanations side-by-side. The goal is to recommend the best explainer for a given time series classification dataset. We propose AMEE, a Model-Agnostic Explanation Evaluation framework, for recommending saliency-based explanations for time series classification. In this approach, data perturbation is added to the input time series guided by each explanation. Our results show that perturbing discriminative parts of the time series leads to significant changes in classification accuracy, which can be used to evaluate each explanation. To be robust to different types of perturbations and different types of classifiers, we aggregate the accuracy loss across perturbations and classifiers. This novel approach allows us to recommend the best explainer among a set of different explainers, including random and oracle explainers. We provide a quantitative and qualitative analysis for synthetic datasets, a variety of time-series datasets, as well as a real-world case study with known expert ground truth.\",\"PeriodicalId\":55183,\"journal\":{\"name\":\"Data Mining and Knowledge Discovery\",\"volume\":\"44 1\",\"pages\":\"\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2024-06-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Data Mining and Knowledge Discovery\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s10618-024-01045-8\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Mining and Knowledge Discovery","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10618-024-01045-8","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

时间序列分类是一项处理时间序列的任务，是人类活动识别、体育分析和普通传感等领域常见的数据类型。在这一领域，人们对可解释性的兴趣与日俱增，因为解释是更好地理解数据和模型的关键。最近，有许多技术（如 LIME、SHAP、CAM）被提出并应用于时间序列，以显著性地图的形式提供解释。然而，显著性图可能而且经常会出现分歧，因此不清楚应该使用哪一个。本文提供了一个新颖的框架，用于对时间序列分类的解释方法进行量化评估和排序。我们展示了如何稳健地评估给定解释方法的信息量（即与分类任务的相关性），以及如何并排比较解释方法。我们的目标是为给定的时间序列分类数据集推荐最佳解释方法。我们提出了一个模型诊断解释评估框架 AMEE，用于为时间序列分类推荐基于显著性的解释。在这种方法中，数据扰动被添加到每个解释所引导的输入时间序列中。我们的研究结果表明，扰动时间序列的判别部分会导致分类准确率发生显著变化，而分类准确率可用于评估每种解释。为了适应不同类型的扰动和不同类型的分类器，我们汇总了不同扰动和分类器的准确率损失。通过这种新颖的方法，我们可以在一组不同的解释器（包括随机解释器和甲骨文解释器）中推荐最佳解释器。我们对合成数据集、各种时间序列数据集以及已知专家基本真相的真实世界案例研究进行了定量和定性分析。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Robust explainer recommendation for time series classification

Time series classification is a task which deals with temporal sequences, a prevalent data type common in domains such as human activity recognition, sports analytics and general sensing. In this area, interest in explanability has been growing as explanation is key to understand the data and the model better. Recently, a great variety of techniques (e.g., LIME, SHAP, CAM) have been proposed and adapted for time series to provide explanation in the form of saliency maps, where the importance of each data point in the time series is quantified with a numerical value. However, the saliency maps can and often disagree, so it is unclear which one to use. This paper provides a novel framework to quantitatively evaluate and rank explanation methods for time series classification. We show how to robustly evaluate the informativeness of a given explanation method (i.e., relevance for the classification task), and how to compare explanations side-by-side. The goal is to recommend the best explainer for a given time series classification dataset. We propose AMEE, a Model-Agnostic Explanation Evaluation framework, for recommending saliency-based explanations for time series classification. In this approach, data perturbation is added to the input time series guided by each explanation. Our results show that perturbing discriminative parts of the time series leads to significant changes in classification accuracy, which can be used to evaluate each explanation. To be robust to different types of perturbations and different types of classifiers, we aggregate the accuracy loss across perturbations and classifiers. This novel approach allows us to recommend the best explainer among a set of different explainers, including random and oracle explainers. We provide a quantitative and qualitative analysis for synthetic datasets, a variety of time-series datasets, as well as a real-world case study with known expert ground truth.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Data Mining and Knowledge Discovery 工程技术-计算机：人工智能

CiteScore

10.40

自引率

4.20%

发文量

审稿时长

10 months

期刊介绍： Advances in data gathering, storage, and distribution have created a need for computational tools and techniques to aid in data analysis. Data Mining and Knowledge Discovery in Databases (KDD) is a rapidly growing area of research and application that builds on techniques and theories from many fields, including statistics, databases, pattern recognition and learning, data visualization, uncertainty modelling, data warehousing and OLAP, optimization, and high performance computing.

期刊最新文献

Missing value replacement in strings and applications. FRUITS: feature extraction using iterated sums for time series classification Bounding the family-wise error rate in local causal discovery using Rademacher averages Evaluating the disclosure risk of anonymized documents via a machine learning-based re-identification attack Efficient learning with projected histograms