Rts: learning robustly from time series data with noisy label

Frontiers of Computer Science. Pub Date: 2023-12-28. DOI: 10.1007/s11704-023-3200-z. IF 3.4; CAS Region 3 (Computer Science); JCR Q2 (Computer Science, Information Systems)

Zhi Zhou, Yi-Xuan Jin, Yu-Feng Li


Citations: 0

Abstract

Machine learning has made significant progress on static data with large amounts of clean labels. In many real-world applications, however, the data changes over time and clean annotations are hard to obtain at scale, so noisy labels and time series must be handled simultaneously. For example, in product-buyer evaluation, each sample records a user’s daily behavior over time, but the long transaction period makes analysis difficult, and salespeople often annotate the user’s purchase behavior incorrectly. To the best of our knowledge, this setting has not been thoroughly studied, and effective machine learning methods for it are still lacking. In this paper, we present RTS, a systematic approach supported both theoretically and empirically, which consists of two components: Noise-Tolerant Time Series Representation and Purified Oversampling Learning. Specifically, we first reduce the destructive impact of label noise to obtain robust feature representations and potentially clean samples. A novel learning method based on the purified data and time series oversampling is then adopted to train an unbiased model. Theoretical analysis proves that our proposal can improve the quality of the noisy data set. Empirical experiments on diverse tasks, including a real-world house-buyer evaluation task and various benchmark tasks, clearly demonstrate that our new algorithm robustly outperforms many competitive methods.
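The abstract gives no algorithmic detail, so the following is only a toy sketch of the general "purify, then oversample" pattern it describes, not the authors' RTS method. It assumes two stand-in techniques that do not come from the paper: a nearest-centroid agreement filter in place of the noise-tolerant representation, and Gaussian jitter in place of the time series oversampling. All function names, the toy data, and the `sigma` parameter are hypothetical.

```python
import random
from collections import Counter


def mean_series(series_list):
    """Pointwise mean of equal-length series."""
    n = len(series_list)
    return [sum(vals) / n for vals in zip(*series_list)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def purify(X, y):
    """Return indices of samples whose (possibly noisy) label matches
    the label of the nearest class centroid. Illustrative stand-in only."""
    centroids = {
        c: mean_series([x for x, lab in zip(X, y) if lab == c])
        for c in set(y)
    }
    keep = []
    for i, (x, lab) in enumerate(zip(X, y)):
        nearest = min(centroids, key=lambda c: sq_dist(x, centroids[c]))
        if nearest == lab:
            keep.append(i)
    return keep


def oversample(X, y, rng, sigma=0.05):
    """Balance classes by appending jittered copies of minority-class
    series. Jitter is an assumed augmentation, not taken from the paper."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for c, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == c]
        for _ in range(target - n):
            base = rng.choice(pool)
            X_out.append([v + rng.gauss(0.0, sigma) for v in base])
            y_out.append(c)
    return X_out, y_out


# Toy data: class 0 stays near 0.0, class 1 stays near 1.0.
# The last sample looks like class 0 but carries a flipped label of 1.
X = [
    [0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.0, 0.0, 0.1], [0.1, 0.1, 0.0],
    [1.0, 0.9, 1.0], [0.9, 1.0, 1.0],
    [0.05, 0.0, 0.05],
]
y = [0, 0, 0, 0, 1, 1, 1]

keep = purify(X, y)                      # drops the mislabeled sample
X_clean = [X[i] for i in keep]
y_clean = [y[i] for i in keep]
X_bal, y_bal = oversample(X_clean, y_clean, random.Random(0))
```

On this toy set the filter discards the flipped sample, which leaves class 1 with two series against four for class 0, and the oversampling step then restores the balance. A real pipeline would replace both stand-ins with learned representations and a more careful augmentation.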


Source journal

Frontiers of Computer Science (Computer Science, Information Systems; Computer Science, Software Engineering)

CiteScore: 8.60

Self-citation rate: 2.40%

Articles published: 799

Review time: 6-12 weeks

Journal description: Frontiers of Computer Science aims to provide a forum for the publication of peer-reviewed papers to promote rapid communication and exchange between computer scientists. The journal publishes research papers and review articles on a wide range of topics, including architecture, software, artificial intelligence, theoretical computer science, networks and communication, information systems, multimedia and graphics, information security, and interdisciplinary research. The journal especially encourages papers from newly emerging and multidisciplinary areas, papers reflecting international trends in research and development, and papers on special topics reporting progress made by Chinese computer scientists.

Latest articles from this journal

A comprehensive survey of federated transfer learning: challenges, methods and applications

DMFVAE: miRNA-disease associations prediction based on deep matrix factorization method with variational autoencoder

Graph foundation model

ABLkit: a Python toolkit for abductive learning

SEOE: an option graph based semantically embedding method for prenatal depression detection