Identifying Changed or Sick Resources from Logs

A. Harutyunyan, A. Poghosyan, Naira Grigoryan, N. Kushmerick, Harutyun Beybutyan
Published in: 2018 IEEE 3rd International Workshops on Foundations and Applications of Self* Systems (FAS*W)
Publication date: 2018-09-01
DOI: 10.1109/FAS-W.2018.00030 (https://doi.org/10.1109/FAS-W.2018.00030)
Citations: 3

Abstract

Identifying important changes in a complex distributed system is a challenging data science problem. Solving it is critical for tools that manage modern cloud infrastructure stacks and other large, complex distributed systems. In this paper, we investigate two specific approaches to using log data to solve this problem. The first approach compares a source's current behavior with its past behavior. Some solutions that perform anomaly detection on numeric data from the data center inevitably rely on global change point detection concepts. Log data, on the other hand, promises significantly different perspectives and dimensions for accomplishing a similar task, yet state-of-the-art solutions lack the capability to automatically detect significant change points in the log stream of an event source by learning its behavioral patterns. Such change points indicate the most important times at which the source's behavior differs significantly from the past. A second, complementary approach to real-time change detection compares a source's current behavior with the current behavior of its peers in a population of sources serving a common role in the data center. Employing the concept of event types of log messages introduced earlier, we propose algorithms for each of these approaches that apply classical statistical and machine learning techniques to data capturing the distribution of those constructs. We demonstrate experimental results from our prototype algorithms.
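The abstract does not say which statistical techniques the authors use, so the following is only a minimal sketch of the two ideas under stated assumptions, not the paper's algorithms. It assumes each log message has already been mapped to an event type, represents a window of messages by its event-type distribution, and uses Jensen-Shannon divergence with a fixed threshold as a stand-in distance. The function names, the threshold value of 0.3, and the sample data are all hypothetical.

```python
import math
from collections import Counter


def distribution(event_types, vocabulary):
    """Relative frequency of each event type in one window of log messages."""
    counts = Counter(event_types)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("window contains no log messages")
    return [counts[t] / total for t in vocabulary]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2), between 0 and 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2


def change_points(windows, vocabulary, threshold=0.3):
    """Approach 1 (assumed form): compare a source with its own past.

    Returns indices of windows whose event-type distribution differs
    from the preceding window by more than the threshold.
    """
    dists = [distribution(w, vocabulary) for w in windows]
    return [
        i for i in range(1, len(dists))
        if js_divergence(dists[i - 1], dists[i]) > threshold
    ]


def peer_outliers(current_by_source, vocabulary, threshold=0.3):
    """Approach 2 (assumed form): compare a source with its peers.

    Returns the sources whose current distribution is far from the
    mean distribution of the whole population.
    """
    dists = {s: distribution(w, vocabulary) for s, w in current_by_source.items()}
    n = len(dists)
    mean = [sum(d[k] for d in dists.values()) / n for k in range(len(vocabulary))]
    return sorted(s for s, d in dists.items() if js_divergence(d, mean) > threshold)


# Hypothetical data: event types "A", "B", "C" observed per time window or host.
vocab = ["A", "B", "C"]
history = [["A", "A", "B"], ["A", "A", "B"], ["C", "C", "C"]]
peers = {
    "h1": ["A", "A", "B", "A"],
    "h2": ["A", "B", "A", "A"],
    "h3": ["A", "A", "A", "B"],
    "h4": ["C", "C", "C", "C"],
}
changed = change_points(history, vocab)
sick = peer_outliers(peers, vocab)
```

In this toy data the third window of `history` switches entirely to event type "C", and host `h4` emits only "C" while its peers emit mostly "A", so those are the items each function flags. A real system would need to choose the window size and threshold from the data rather than fix them in advance.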