Syslog processing for switch failure diagnosis and prediction in datacenter networks

2017 IEEE/ACM 25th International Symposium on Quality of Service (IWQoS). Pub Date: 2017-06-14. DOI: 10.1109/IWQoS.2017.7969130

Shenglin Zhang, Weibin Meng, Jiahao Bu, Sen Yang, Y. Liu, Dan Pei, Jun Xu, Yu Chen, Hui Dong, Xianping Qu, Lei Song


Citations: 69

Abstract



Syslogs on switches are a rich source of information for both post-mortem diagnosis and proactive prediction of switch failures in a datacenter network. However, this information can be extracted effectively only through proper processing of syslogs, e.g., using suitable machine learning techniques. A common approach to syslog processing is to extract (i.e., build) templates from historical syslog messages and then match syslog messages to these templates. However, existing template extraction techniques either have low accuracy in learning the “correct” set of templates, or do not support incremental learning, in the sense that the entire set of templates has to be rebuilt (by processing all historical syslog messages again) whenever a new template is to be added, which is computationally prohibitive for a large datacenter network. To address these two problems, we propose a frequent template tree (FT-tree) model in which frequent combinations of (syslog) words are identified and then used as message templates. FT-tree empirically extracts message templates more accurately than existing approaches, and naturally supports incremental learning. To compare the performance of FT-tree and three other template learning techniques, we evaluated them on two years' worth of failure tickets and syslogs collected from switches deployed across 10+ datacenters of a tier-1 cloud service provider. The experiments demonstrated that FT-tree improved the estimation/prediction accuracy (as measured by F1) by 155% to 188%, and the computational efficiency by 117 to 730 times.
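The abstract does not spell out how the tree is built, so the following is only a minimal sketch of the general idea it describes: find frequent word combinations and use them as templates. It assumes a construction in the style of a frequent-pattern prefix tree. The function names `build_templates` and `match_template`, the parameters `min_support` and `max_children`, and the sample messages are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def build_templates(messages, min_support=2, max_children=3):
    """Learn templates (tuples of frequent words) from syslog messages."""
    # Count the number of messages in which each word appears.
    freq = Counter(w for m in messages for w in set(m.split()))

    # Insert each message into a prefix tree, keeping only frequent words,
    # ordered from most to least frequent (ties broken alphabetically).
    root = {}
    for m in messages:
        words = sorted(
            (w for w in set(m.split()) if freq[w] >= min_support),
            key=lambda w: (-freq[w], w),
        )
        node = root
        for w in words:
            node = node.setdefault(w, {})

    # Assumption: a non-root node with many children is followed by
    # variable fields, so its subtree is cut there. Every remaining
    # root-to-leaf path becomes one template.
    templates = set()

    def walk(node, path):
        if not node or (path and len(node) > max_children):
            if path:
                templates.add(tuple(path))
            return
        for w, child in node.items():
            walk(child, path + [w])

    walk(root, [])
    return templates


def match_template(message, templates):
    """Return the longest template whose words all occur in the message."""
    words = set(message.split())
    candidates = [t for t in templates if words.issuperset(t)]
    return max(candidates, key=len, default=None)


# Hypothetical sample messages, for illustration only.
messages = [
    "Interface ae1 changed state to down",
    "Interface ae2 changed state to up",
    "Interface ae3 changed state to down",
    "Interface ae4 changed state to up",
    "Fan 1 failed",
    "Fan 2 failed",
]
templates = build_templates(messages)
matched = match_template("Interface ae9 changed state to down", templates)
```

In this sketch the rare words (`ae1`, `1`, `2`, and so on) fall below `min_support` and are treated as variables, which leaves three templates for the sample input. Lowering `max_children` to 1 merges the `up` and `down` variants into a single shorter template. A path that ends at an internal node of the tree is not emitted as its own template here, which a fuller implementation would need to handle.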

