Advancing the Art of Internet Edge Outage Detection

P. Richter, Ramakrishna Padmanabhan, N. Spring, A. Berger, D. Clark
{"title":"Advancing the Art of Internet Edge Outage Detection","authors":"P. Richter, Ramakrishna Padmanabhan, N. Spring, A. Berger, D. Clark","doi":"10.1145/3278532.3278563","DOIUrl":null,"url":null,"abstract":"Measuring reliability of edge networks in the Internet is difficult due to the size and heterogeneity of networks, the rarity of outages, and the difficulty of finding vantage points that can accurately capture such events at scale. In this paper, we use logs from a major CDN, detailing hourly request counts from address blocks. We discovered that in many edge address blocks, devices, collectively, contact the CDN every hour over weeks and months. We establish that a sudden temporary absence of these requests indicates a loss of Internet connectivity of those address blocks, events we call disruptions. We develop a disruption detection technique and present broad and detailed statistics on 1.5M disruption events over the course of a year. Our approach reveals that disruptions do not necessarily reflect actual service outages, but can be the result of prefix migrations. Major natural disasters are clearly represented in our data as expected; however, a large share of detected disruptions correlate well with planned human intervention during scheduled maintenance intervals, and are thus unlikely to be caused by external factors. Cross-evaluating our results we find that current state-of-the-art active outage detection over-estimates the occurrence of disruptions in some address blocks. Our observations of disruptions, service outages, and different causes for such events yield implications for the design of outage detection systems, as well as for policymakers seeking to establish reporting requirements for Internet services.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"18 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2018-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"37","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Internet Measurement Conference 2018","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3278532.3278563","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 37

Abstract

Measuring reliability of edge networks in the Internet is difficult due to the size and heterogeneity of networks, the rarity of outages, and the difficulty of finding vantage points that can accurately capture such events at scale. In this paper, we use logs from a major CDN, detailing hourly request counts from address blocks. We discovered that in many edge address blocks, devices, collectively, contact the CDN every hour over weeks and months. We establish that a sudden temporary absence of these requests indicates a loss of Internet connectivity of those address blocks, events we call disruptions. We develop a disruption detection technique and present broad and detailed statistics on 1.5M disruption events over the course of a year. Our approach reveals that disruptions do not necessarily reflect actual service outages, but can be the result of prefix migrations. Major natural disasters are clearly represented in our data as expected; however, a large share of detected disruptions correlate well with planned human intervention during scheduled maintenance intervals, and are thus unlikely to be caused by external factors. Cross-evaluating our results we find that current state-of-the-art active outage detection over-estimates the occurrence of disruptions in some address blocks. Our observations of disruptions, service outages, and different causes for such events yield implications for the design of outage detection systems, as well as for policymakers seeking to establish reporting requirements for Internet services.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
推进互联网边缘中断检测技术
由于网络的规模和异构性、中断的稀缺性以及寻找能够大规模准确捕获此类事件的有利位置的困难,测量互联网中边缘网络的可靠性是困难的。在本文中,我们使用来自主要CDN的日志,详细描述了来自地址块的每小时请求计数。我们发现,在许多边缘地址块中,设备在数周和数月中每小时都会联系CDN。我们确定,这些请求的突然暂时缺失表明这些地址块失去了互联网连接,我们称之为中断事件。我们开发了一种中断检测技术,并提供了一年中150万次中断事件的广泛而详细的统计数据。我们的方法表明,中断并不一定反映实际的服务中断,而可能是前缀迁移的结果。正如预期的那样,我们的数据清楚地反映了重大自然灾害;然而,大部分检测到的中断与计划维护间隔期间计划的人为干预相关,因此不太可能由外部因素引起。交叉评估我们的结果,我们发现当前最先进的主动中断检测高估了某些地址块中中断的发生。我们对中断、服务中断以及此类事件的不同原因的观察,为中断检测系统的设计以及寻求建立互联网服务报告要求的政策制定者提供了启示。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Reducing Permission Requests in Mobile Apps A Look at the ECS Behavior of DNS Resolvers RPKI is Coming of Age: A Longitudinal Study of RPKI Deployment and Invalid Route Origins Scanning the Scanners: Sensing the Internet from a Massively Distributed Network Telescope Learning Regexes to Extract Router Names from Hostnames
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1