Anomaly Detection in Cloud Computing using Knowledge Graph Embedding and Machine Learning Mechanisms

Journal of Grid Computing | IF 2.9 | Zone 2 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-12-29 | DOI: 10.1007/s10723-023-09727-1

Katerina Mitropoulou, Panagiotis Kokkinos, Polyzois Soumplis, Emmanouel Varvarigos

Citations: 0

Abstract

The orchestration of cloud computing infrastructures is challenging, considering the number, heterogeneity, and dynamicity of the involved resources, along with the highly distributed nature of the applications that use them for computation and storage. The volume of relevant monitoring data can be significant, and the ability to collect, analyze, and act on this data in real time is critical for the infrastructure’s efficient use. In this study, we introduce a novel methodology that handles the diverse, dynamic, and voluminous nature of cloud resources and the applications that they support. We use knowledge graphs to represent computing and storage resources and to capture the relationships between them and the applications that utilize them. We then train GraphSAGE to acquire vector-based representations of the infrastructures’ properties, while preserving the structural properties of the graph. These representations are provided as input to two unsupervised machine learning algorithms, namely CBLOF and Isolation Forest, for the detection of storage and computing overuse events, where CBLOF demonstrates better performance across all our evaluation metrics. Following the detection of such events, we have also developed re-optimization mechanisms that aim to preserve the performance of the served applications. Evaluated in a simulated environment, our methods demonstrate a significant advancement in anomaly detection and infrastructure optimization. The results underscore the potential of this closed-loop operation in dynamically adapting to the evolving demands of cloud infrastructures. By integrating data representation and machine learning methods with proactive management strategies, this research contributes to the field of cloud computing, offering a scalable, intelligent solution for modern cloud infrastructures.
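The pipeline described in the abstract (graph of resources and applications, node embeddings, unsupervised outlier scoring) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the toy graph, the node names (`srv5`, `app5a`, ...), the two utilisation features per node, and the injected overuse event are all hypothetical. The embedding step is a single GraphSAGE-style mean aggregation with no learned weights, standing in for the trained GraphSAGE model, and the detector is a compact isolation forest that uses every point in every tree instead of subsampling. CBLOF is not shown.

```python
import math
import random


def sage_mean_layer(features, neighbors):
    """One GraphSAGE-style mean-aggregation step with NO learned weights
    (assumption): own features concatenated with the neighbours' mean."""
    out = {}
    for node, own in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            mean = [sum(features[n][d] for n in nbrs) / len(nbrs)
                    for d in range(len(own))]
        else:
            mean = [0.0] * len(own)
        out[node] = own + mean
    return out


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, rng, depth, max_depth):
    """Recursively split points on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, rng, depth + 1, max_depth),
            "right": build_tree(right, rng, depth + 1, max_depth)}


def path_length(x, tree):
    """Number of splits needed to reach x's leaf, plus the leaf correction."""
    depth = 0
    while "dim" in tree:
        tree = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
        depth += 1
    return depth + c_factor(tree["size"])


def isolation_scores(vectors, n_trees=100, seed=0):
    """Anomaly score in (0, 1); higher means easier to isolate."""
    rng = random.Random(seed)
    n = len(vectors)
    max_depth = math.ceil(math.log2(n))
    trees = [build_tree(vectors, rng, 0, max_depth) for _ in range(n_trees)]
    norm = n_trees * c_factor(n)
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / norm)
            for x in vectors]


# Hypothetical toy graph: 12 servers, each hosting two applications.
# Features are [cpu_utilisation, storage_utilisation]; srv5 is overused.
rng = random.Random(42)
features, neighbors = {}, {}
for i in range(12):
    hot = (i == 5)
    srv, apps = f"srv{i}", [f"app{i}a", f"app{i}b"]
    features[srv] = [rng.uniform(0.9, 1.0) if hot else rng.uniform(0.3, 0.5)
                     for _ in range(2)]
    neighbors[srv] = apps
    for app in apps:
        features[app] = [rng.uniform(0.7, 0.9) if hot else rng.uniform(0.1, 0.3)
                         for _ in range(2)]
        neighbors[app] = [srv]

embeddings = sage_mean_layer(features, neighbors)
servers = [n for n in embeddings if n.startswith("srv")]
scores = dict(zip(servers, isolation_scores([embeddings[s] for s in servers])))
top = max(scores, key=scores.get)
print(top, round(scores[top], 3))
```

Concatenating a node's own features with its neighbourhood mean keeps the overused server's own signal while adding context from the applications it hosts, which is the intuition behind feeding graph embeddings, not raw metrics, to the detector. In practice the aggregation weights would be learned, and a library implementation of Isolation Forest or CBLOF would replace the hand-written scorer.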

Source journal

Journal of Grid Computing COMPUTER SCIENCE, INFORMATION SYSTEMS-COMPUTER SCIENCE, THEORY & METHODS

CiteScore

8.70

Self-citation rate

9.10%

Articles published

Review time

>12 weeks

About the journal: Grid Computing is an emerging technology that enables large-scale resource sharing and coordinated problem solving within distributed, often loosely coordinated groups, sometimes termed "virtual organizations." By providing scalable, secure, high-performance mechanisms for discovering and negotiating access to remote resources, Grid technologies promise to make it possible for scientific collaborations to share resources on an unprecedented scale, and for geographically distributed groups to work together in ways that were previously impossible. Similar technologies are being adopted within industry, where they serve as important building blocks for emerging service provider infrastructures. Although the advantages of this technology for certain classes of applications have been acknowledged, research in a variety of disciplines is needed to broaden the applicability and scope of the current body of knowledge. These disciplines include multiple domains of computer science (networking, middleware, programming, algorithms), the application disciplines themselves, and areas such as sociology and economics.