CAPRI: a tool for mining complex line patterns in large log data

F. Zulkernine, Patrick Martin, W. Powley, S. Soltani, Serge Mankovskii, Mark Addleman
BigMine '13, published 2013-08-11. DOI: 10.1145/2501221.2501228 (https://doi.org/10.1145/2501221.2501228)
Citations: 6

Abstract

Log files provide important information for troubleshooting complex systems. However, the structure and contents of log data and messages vary widely. Automated processing first requires an understanding of the layout and structure of the data, which becomes very challenging when different system components report a massive amount of data and messages in the same log file. Existing approaches apply supervised mining techniques and return frequent patterns only for single-line messages. We present CAPRI (type-CAsted Pattern and Rule mIner), which uses a novel pattern mining algorithm to efficiently mine structural line patterns from semi-structured multi-line log messages. It discovers line patterns in a type-casted format; categorizes all data lines; identifies frequent, rare, and interesting line patterns; and uses unsupervised learning and incremental mining techniques. It also mines association rules to identify the contextual relationship between two successive line patterns. In addition, CAPRI lists the frequent term and value patterns that meet given minimum support thresholds. The line and term pattern information can be applied in a subsequent stage to categorize and reformat multi-line data, extract variables from the messages, and discover further correlations among messages for troubleshooting complex systems. To evaluate our approach, we present a comparative study of our tool against some existing popular open-source research tools, using three different layouts of log data, including a complex multi-line log file from the z/OS mainframe system.
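The abstract names three ideas that a small example can make concrete: type-casted line patterns, a minimum support threshold that separates frequent from rare patterns, and association rules between two successive line patterns. The Python sketch below is not CAPRI's algorithm, which the abstract does not specify. It is a minimal illustration under stated assumptions: the type tags, the function names `type_cast` and `mine`, the whitespace tokenization, the sample log lines, and the default thresholds are all hypothetical.

```python
import re
from collections import Counter


def type_cast(line: str) -> str:
    """Replace each whitespace-separated token with a coarse type tag.

    The tag set here is an assumption made for illustration only.
    """
    tags = []
    for token in line.split():
        if re.fullmatch(r"[+-]?\d+", token):
            tags.append("<INT>")
        elif re.fullmatch(r"[+-]?\d+\.\d+", token):
            tags.append("<FLOAT>")
        elif re.fullmatch(r"\d{2}:\d{2}:\d{2}", token):
            tags.append("<TIME>")
        elif re.fullmatch(r"[A-Za-z]+", token):
            tags.append("<WORD>")
        else:
            tags.append("<MIXED>")
    return " ".join(tags)


def mine(lines, min_support=0.3, min_confidence=0.8):
    """Return (frequent patterns, rare patterns, successor rules).

    Support is the fraction of non-empty lines that share a pattern.
    A rule (a, b) means "a line with pattern a is followed by a line
    with pattern b"; its confidence is count(a then b) / count(a).
    """
    patterns = [type_cast(line) for line in lines if line.strip()]
    counts = Counter(patterns)
    total = len(patterns)

    frequent = {p for p, c in counts.items() if c / total >= min_support}
    rare = set(counts) - frequent

    # Count each pair of successive line patterns.
    pair_counts = Counter(zip(patterns, patterns[1:]))
    rules = {}
    for (a, b), c in pair_counts.items():
        confidence = c / counts[a]
        if confidence >= min_confidence:
            rules[(a, b)] = confidence
    return frequent, rare, rules


# Hypothetical multi-line log: a header line followed by a detail line.
sample = [
    "12:00:01 JOB 101 STARTED",
    "  CPU 0.52",
    "12:00:05 JOB 102 STARTED",
    "  CPU 1.75",
    "12:00:09 JOB 103 STARTED",
    "  CPU 0.10",
    "ABEND code=S0C4",
]

frequent, rare, rules = mine(sample)
```

On this sample, the header and detail patterns each cover three of seven lines and count as frequent, the final line's pattern is rare, and the only rule that clears the confidence threshold is "header pattern is followed by detail pattern". A single pass with counters also suggests how such counts could be updated incrementally as new lines arrive, though the abstract does not say how CAPRI does this.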