Advancing anomaly detection in computational workflows with active learning

Future Generation Computer Systems-The International Journal of Escience. Impact factor: 6.2. Zone 2 (Computer Science). JCR Q1 (COMPUTER SCIENCE, THEORY & METHODS). Pub Date: 2024-12-04. DOI: 10.1016/j.future.2024.107608
Krishnan Raghavan, George Papadimitriou, Hongwei Jin, Anirban Mandal, Mariam Kiran, Prasanna Balaprakash, Ewa Deelman
Volume 166, Article 107608. Publication type: Journal Article. https://www.sciencedirect.com/science/article/pii/S0167739X24005727

Abstract

A computational workflow, or simply a workflow, consists of tasks executed in a defined order to carry out a specific computational campaign. Computational workflows are commonly employed in science domains such as physics, chemistry, and genomics to complete large-scale experiments in distributed and heterogeneous computing environments. However, running computations at such a large scale makes workflow applications prone to failures and performance degradation, which can slow down or stall execution and ultimately lead to workflow failure. Learning how these workflows behave under normal and anomalous conditions can help identify the causes of degraded performance and subsequently trigger appropriate actions to resolve them. Such learning is challenging, however, because training accurate and reliable models requires a large volume of high-quality historical data. Generating such datasets takes considerable time and effort and requires substantial resources to be devoted to data generation for training purposes. Active learning is a promising approach to this problem: data is generated only as the machine learning model requires it, which can potentially reduce the training data needed to derive accurate models. In this work, we present an active learning approach supported by an experimental framework, Poseidon-X, that uses a modern workflow management system and two cloud testbeds. We evaluate our approach using three computational workflows. For one workflow, we run an end-to-end live active learning experiment; for the other two, we evaluate our active learning algorithms using pre-captured data traces provided by the Flow-Bench benchmark. Our findings indicate that active learning not only saves resources but also improves the accuracy of anomaly detection.
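The idea that the model decides which data gets generated next can be illustrated with a generic pool-based active learning loop. The sketch below is not the authors' algorithm and does not use Poseidon-X or Flow-Bench: the synthetic features (`runtime_s`, `cpu_util`), their distributions, the logistic-regression detector, and the uncertainty-sampling query rule are all assumptions chosen for illustration. Revealing a label from the pool stands in for running a workflow and recording whether it was anomalous.

```python
import math
import random


def make_pool(n, seed=0):
    """Build a synthetic pool of ((runtime_s, cpu_util), label) pairs; label 1 = anomalous."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n):
        anomalous = rng.random() < 0.3
        # Hypothetical distributions: anomalous jobs run longer at lower CPU utilization.
        runtime = rng.gauss(2.0 if anomalous else 1.0, 0.3)
        cpu = rng.gauss(0.4 if anomalous else 0.8, 0.1)
        pool.append(((runtime, cpu), int(anomalous)))
    return pool


def predict(w, x):
    """Probability that sample x is anomalous under logistic-regression weights w."""
    z = w[0] + w[1] * x[0] + w[2] * x[1]
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=300, lr=0.5):
    """Fit logistic regression with plain batch gradient descent."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for x, y in samples:
            err = predict(w, x) - y
            grad[0] += err
            grad[1] += err * x[0]
            grad[2] += err * x[1]
        w = [wi - lr * gi / len(samples) for wi, gi in zip(w, grad)]
    return w


def active_learning(pool, budget, seed_size=4):
    """Label up to `budget` samples in total, always querying the most uncertain one."""
    labeled = list(pool[:seed_size])
    unlabeled = list(pool[seed_size:])
    w = train(labeled)
    while len(labeled) < budget and unlabeled:
        # Uncertainty sampling: pick the sample whose predicted probability is closest to 0.5.
        idx = min(range(len(unlabeled)),
                  key=lambda i: abs(predict(w, unlabeled[i][0]) - 0.5))
        labeled.append(unlabeled.pop(idx))  # stand-in for generating and labeling new data
        w = train(labeled)
    return w, labeled


def accuracy(w, data):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((predict(w, x) >= 0.5) == (y == 1) for x, y in data)
    return hits / len(data)


if __name__ == "__main__":
    pool = make_pool(200)
    w, labeled = active_learning(pool, budget=20)
    print(f"labeled {len(labeled)} of {len(pool)} samples, "
          f"accuracy on the pool: {accuracy(w, pool):.2f}")
```

In this sketch the labeling budget is the resource being saved: the detector is retrained after each query, so only the samples it is least sure about are ever labeled.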
Source journal metrics
CiteScore: 19.90
Self-citation rate: 2.70%
Articles published: 376
Review time: 10.6 months
Journal description: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.