{"title":"大型托管基础设施的自适应系统异常预测","authors":"Yongmin Tan, Xiaohui Gu, Haixun Wang","doi":"10.1145/1835698.1835741","DOIUrl":null,"url":null,"abstract":"Large-scale hosting infrastructures require automatic system anomaly management to achieve continuous system operation. In this paper, we present a novel adaptive runtime anomaly prediction system, called ALERT, to achieve robust hosting infrastructures. In contrast to traditional anomaly detection schemes, ALERT aims at raising advance anomaly alerts to achieve just-in-time anomaly prevention. We propose a novel context-aware anomaly prediction scheme to improve prediction accuracy in dynamic hosting infrastructures. We have implemented the ALERT system and deployed it on several production hosting infrastructures such as IBM System S stream processing cluster and PlanetLab. Our experiments show that ALERT can achieve high prediction accuracy for a range of system anomalies and impose low overhead to the hosting infrastructure.","PeriodicalId":447863,"journal":{"name":"Proceedings of the 29th ACM SIGACT-SIGOPS symposium on Principles of distributed computing","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"80","resultStr":"{\"title\":\"Adaptive system anomaly prediction for large-scale hosting infrastructures\",\"authors\":\"Yongmin Tan, Xiaohui Gu, Haixun Wang\",\"doi\":\"10.1145/1835698.1835741\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Large-scale hosting infrastructures require automatic system anomaly management to achieve continuous system operation. In this paper, we present a novel adaptive runtime anomaly prediction system, called ALERT, to achieve robust hosting infrastructures. In contrast to traditional anomaly detection schemes, ALERT aims at raising advance anomaly alerts to achieve just-in-time anomaly prevention. We propose a novel context-aware anomaly prediction scheme to improve prediction accuracy in dynamic hosting infrastructures. We have implemented the ALERT system and deployed it on several production hosting infrastructures such as IBM System S stream processing cluster and PlanetLab. Our experiments show that ALERT can achieve high prediction accuracy for a range of system anomalies and impose low overhead to the hosting infrastructure.\",\"PeriodicalId\":447863,\"journal\":{\"name\":\"Proceedings of the 29th ACM SIGACT-SIGOPS symposium on Principles of distributed computing\",\"volume\":\"20 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-07-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"80\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 29th ACM SIGACT-SIGOPS symposium on Principles of distributed computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1835698.1835741\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 29th ACM SIGACT-SIGOPS symposium on Principles of distributed computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1835698.1835741","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 80
摘要
大型托管基础设施需要对系统异常进行自动化管理,以实现系统的连续运行。在本文中,我们提出了一种新的自适应运行时异常预测系统,称为ALERT,以实现健壮的托管基础设施。与传统的异常检测方案相比,ALERT旨在提前提出异常警报,以实现及时的异常预防。为了提高动态托管基础设施的预测精度,提出了一种新的上下文感知异常预测方案。我们已经实现了ALERT系统,并将其部署在几个生产托管基础设施上,如IBM system S流处理集群和PlanetLab。我们的实验表明,ALERT可以对一系列系统异常实现较高的预测精度,并且对托管基础设施施加较低的开销。
Adaptive system anomaly prediction for large-scale hosting infrastructures
Large-scale hosting infrastructures require automatic system anomaly management to achieve continuous system operation. In this paper, we present a novel adaptive runtime anomaly prediction system, called ALERT, to achieve robust hosting infrastructures. In contrast to traditional anomaly detection schemes, ALERT aims at raising advance anomaly alerts to achieve just-in-time anomaly prevention. We propose a novel context-aware anomaly prediction scheme to improve prediction accuracy in dynamic hosting infrastructures. We have implemented the ALERT system and deployed it on several production hosting infrastructures such as IBM System S stream processing cluster and PlanetLab. Our experiments show that ALERT can achieve high prediction accuracy for a range of system anomalies and impose low overhead to the hosting infrastructure.