Annual ACM Workshop on Mining Network Data: Latest Publications

Toward sophisticated detection with distributed triggers
Pub Date : 2006-09-11 DOI: 10.1145/1162678.1162684
Ling Huang, M. Garofalakis, J. Hellerstein, A. Joseph, N. Taft
Recent research has proposed efficient protocols for distributed triggers, which can be used in monitoring infrastructures to maintain system-wide invariants and detect abnormal events with minimal communication overhead. To date, however, this work has been limited to simple thresholds on distributed aggregate functions like sums and counts. In this paper, we present our initial results that show how to use these simple threshold triggers to enable sophisticated anomaly detection in near-real time, with modest communication overheads. We design a distributed protocol to detect "unusual traffic patterns" buried in an Origin-Destination network flow matrix that: a) uses a Principal Components Analysis decomposition technique to detect anomalies via a threshold function on residual signals [10]; and b) efficiently tracks this threshold function in near-real time using a simple distributed protocol. In addition, we speculate that such simple thresholding can be a powerful tool for a variety of monitoring tasks beyond the one presented here, and we propose an agenda to explore additional sophisticated applications.
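A minimal, centralized sketch of the subspace method the abstract builds on (reference [10]), not the paper's distributed protocol: decompose a time-by-OD-flow traffic matrix with PCA, project each time bin onto the residual subspace, and flag bins whose squared prediction error exceeds a threshold. The synthetic data, the number of components, and the quantile threshold below are all assumptions made for illustration.

```python
import numpy as np

def detect_anomalies(X, k=2, quantile=0.999):
    """Flag anomalous time bins of X (time bins x OD flows) via the PCA residual."""
    Xc = X - X.mean(axis=0)                        # center each OD flow
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                                   # "normal" subspace basis (flows x k)
    residual = Xc - Xc @ P @ P.T                   # traffic the top-k components cannot explain
    spe = np.sum(residual ** 2, axis=1)            # squared prediction error per time bin
    threshold = np.quantile(spe, quantile)         # illustrative threshold choice
    return np.flatnonzero(spe > threshold), spe, threshold

# Synthetic example: 500 time bins, 40 OD flows driven by two diurnal-like patterns,
# plus one injected spike that should stand out in the residual subspace.
rng = np.random.default_rng(0)
t = np.arange(500)
patterns = np.vstack([np.sin(2 * np.pi * t / 100), np.cos(2 * np.pi * t / 100)])
X = 50.0 * (patterns.T @ rng.normal(size=(2, 40))) + rng.normal(0.0, 1.0, size=(500, 40))
X[250, 5] += 50.0                                  # injected "unusual traffic pattern"
anomalous_bins, _, _ = detect_anomalies(X, k=2)
print("anomalous time bins:", anomalous_bins)
```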
Citations: 33
How to extract BGP peering information from the internet routing registry
Pub Date : 2006-09-11 DOI: 10.1145/1162678.1162685
G. Battista, Tiziana Refice, M. Rimondini
We describe an on-line service, and its underlying methodology, designed to extract BGP peerings from the Internet Routing Registry. Both the method and the service are based on: a consistency manager for integrating information across different registries, an RPSL analyzer that extracts peering specifications from RPSL objects, and a peering classifier that aims at understanding to what extent such peering specifications actually contribute to fully determining a peering. A peering graph is built with different levels of confidence. We compare the effectiveness of our method with the state of the art. The comparison highlights the quality of the proposed method.
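As an illustration of the kind of extraction an RPSL analyzer performs, the sketch below scans toy aut-num objects for import/export attributes, records which AS declares which peer, and assigns a crude confidence level depending on whether one or both endpoints registered the peering. It is a simplified stand-in, not the paper's analyzer or classifier; real IRR data and RPSL syntax are far richer.

```python
import re
from collections import defaultdict

# Toy RPSL aut-num objects in whois-style text (real IRR objects carry many more attributes).
RPSL_OBJECTS = """
aut-num: AS65001
import:  from AS65002 accept ANY
export:  to AS65002 announce AS-65001-CUST

aut-num: AS65002
import:  from AS65001 accept AS-65001-CUST
export:  to AS65001 announce ANY

aut-num: AS65003
import:  from AS65001 accept ANY
"""

def extract_peerings(text):
    """Return {frozenset({a, b}): confidence}, where confidence counts how many
    of the two ASes registered the peering (2 = confirmed by both sides)."""
    declared = defaultdict(set)                      # AS -> set of peers it declares
    current = None
    for line in text.splitlines():
        m = re.match(r"aut-num:\s*(AS\d+)", line)
        if m:
            current = m.group(1)
            continue
        if current and re.match(r"(import|export):", line):
            for peer in re.findall(r"\b(?:from|to)\s+(AS\d+)\b", line):
                if peer != current:
                    declared[current].add(peer)
    peerings = {}
    for asn, peers in declared.items():
        for peer in peers:
            key = frozenset({asn, peer})
            peerings[key] = peerings.get(key, 0) + 1
    return peerings

for pair, confidence in extract_peerings(RPSL_OBJECTS).items():
    print(sorted(pair), "confidence level:", confidence)
```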
Citations: 24
Traffic classification using clustering algorithms
Pub Date : 2006-09-11 DOI: 10.1145/1162678.1162679
Jeffrey Erman, M. Arlitt, A. Mahanti
Classification of network traffic using port-based or payload-based analysis is becoming increasingly difficult with many peer-to-peer (P2P) applications using dynamic port numbers, masquerading techniques, and encryption to avoid detection. An alternative approach is to classify traffic by exploiting the distinctive characteristics of applications when they communicate on a network. We pursue this latter approach and demonstrate how cluster analysis can be used to effectively identify groups of traffic that are similar using only transport layer statistics. Our work considers two unsupervised clustering algorithms, namely K-Means and DBSCAN, that have previously not been used for network traffic classification. We evaluate these two algorithms and compare them to the previously used AutoClass algorithm, using empirical Internet traces. The experimental results show that both K-Means and DBSCAN work very well and much more quickly than AutoClass. Our results indicate that although DBSCAN has lower accuracy compared to K-Means and AutoClass, DBSCAN produces better clusters.
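A minimal sketch of the clustering step on transport-layer statistics, using scikit-learn's KMeans and DBSCAN. The flow features and algorithm parameters below are assumptions made for illustration; they do not reproduce the paper's feature set or evaluation.

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for per-flow transport-layer statistics; a real study would
# compute such features from packet traces (the exact feature set here is assumed).
rng = np.random.default_rng(1)
bulk = np.column_stack([rng.normal(3000, 300, 500),     # total bytes
                        rng.normal(1400, 100, 500),     # mean packet size (bytes)
                        rng.normal(40, 10, 500)])       # duration (s)
interactive = np.column_stack([rng.normal(400, 50, 500),
                               rng.normal(120, 20, 500),
                               rng.normal(2, 0.5, 500)])
X = StandardScaler().fit_transform(np.vstack([bulk, interactive]))

# K-Means must be told how many clusters to find.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN infers the number of clusters from density and labels outliers as -1.
dbscan_labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(X)

print("K-Means cluster sizes:", np.bincount(kmeans_labels))
print("DBSCAN labels found:", sorted(set(int(label) for label in dbscan_labels)))
```

The contrast in the two calls mirrors the trade-off the abstract reports: K-Means needs the cluster count up front, while DBSCAN discovers it from density and explicitly marks noise points.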
Citations: 766
Forensic analysis of autonomous system reachability
Pub Date : 2006-09-11 DOI: 10.1145/1162678.1162688
D. K. Lee, S. Moon, T. Choi, T. Jeong
Security incidents have an adverse impact not only on end systems, but also on Internet routing, resulting in many out-of-reach prefixes. Previous work has looked at performance degradation in the data plane in terms of delay and loss. Also it has been reported that the number of routing updates increased significantly, which could be a reflection of increased routing instability in the control domain. In this paper, we perform a detailed forensic analysis of routing instability during known security incidents and present useful metrics in assessing damage in AS reachability. Any change in AS reachability is a direct indication of whether the AS had fallen victim to the security incident or not. We choose the Slammer worm attack in January, 2003, as a security incident for closer examination. For our forensic analysis, we use BGP routing data from RouteViews and RIPE. As a way to quantify AS reachability, we propose the following metrics: the prefix count and the address count. The number of unique prefixes in routing tables during the attack fluctuates greatly, but it does not represent the real scope of damage. We define the address count as the cardinality of the set of IP addresses an AS is responsible for either as an origin or transit AS, and observe how address counts changed over time. These two metrics together draw an accurate picture of how reachability to or through the AS had been affected. Though our analysis was done off-line, our methodology can be applied on-line and used in quick real-time assessment of AS reachability.
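The two metrics are straightforward to compute from a routing-table snapshot. The sketch below uses Python's ipaddress module to count unique prefixes and, after merging overlapping address space, the number of unique addresses attributed to an AS; the example prefixes and snapshots are hypothetical.

```python
import ipaddress

def reachability_metrics(prefixes):
    """The abstract's two metrics for one AS at one point in time: the prefix count
    and the address count (unique addresses covered, with overlapping space merged)."""
    nets = {ipaddress.ip_network(p) for p in prefixes}        # unique prefixes
    collapsed = ipaddress.collapse_addresses(nets)            # merge nested/adjacent prefixes
    address_count = sum(net.num_addresses for net in collapsed)
    return len(nets), address_count

# Hypothetical snapshots of the prefixes attributed to one AS before and during an incident.
snapshots = {
    "before": ["192.0.2.0/24", "198.51.100.0/24", "198.51.100.0/25", "203.0.113.0/24"],
    "during": ["192.0.2.0/24"],
}
for label, prefixes in snapshots.items():
    prefix_count, address_count = reachability_metrics(prefixes)
    print(f"{label}: {prefix_count} prefixes, {address_count} reachable addresses")
```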
Citations: 1
Diagnosis of TCP overlay connection failures using bayesian networks
Pub Date : 2006-09-11 DOI: 10.1145/1162678.1162683
George J. Lee, L. Poole
When failures occur in Internet overlay connections today, it is difficult for users to determine the root cause of failure. An overlay connection may require TCP connections between a series of overlay nodes to succeed, but accurately determining which of these connections has failed is difficult for users without access to the internal workings of the overlay. Diagnosis using active probing is costly and may be inaccurate if probe packets are filtered or blocked. To address this problem, we develop a passive diagnosis approach that infers the most likely cause of failure using a Bayesian network modeling the conditional probability of TCP failures given the IP addresses of the hosts along the overlay path. We collect TCP failure data for 28.3 million TCP connections using data from the new Planetseer overlay monitoring system and train a Bayesian network for the diagnosis of overlay connection failures. We evaluate the accuracy of diagnosis using this Bayesian network on a set of overlay connections generated from observations of CoDeeN traffic patterns and find that our approach can accurately diagnose failures.
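A minimal sketch of the inference idea under a deliberately simplified generative model: hops fail independently with some prior probability (the paper conditions these probabilities on the IP addresses of the hosts along the path and learns them from Planetseer data), and the only observation is that the end-to-end connection failed. The per-hop priors below are made up for illustration.

```python
from math import prod

def diagnose(path_hops, failure_prob):
    """Posterior probability that each TCP hop on an overlay path has failed,
    given only the observation that the end-to-end connection failed.
    Assumes hops fail independently and any hop failure breaks the connection."""
    priors = [failure_prob[hop] for hop in path_hops]
    p_any_failure = 1.0 - prod(1.0 - p for p in priors)       # evidence P(e2e failure)
    # Bayes: P(hop failed | e2e failed) = P(e2e failed | hop failed) * P(hop failed) / P(e2e failed),
    # and P(e2e failed | hop failed) = 1 under this simplified model.
    return {hop: p / p_any_failure for hop, p in zip(path_hops, priors)}

# Made-up per-hop failure priors, standing in for probabilities learned from traces
# and keyed by the endpoints of each TCP hop along the overlay path.
failure_prob = {
    ("client", "overlay-A"): 0.02,
    ("overlay-A", "overlay-B"): 0.01,
    ("overlay-B", "origin-server"): 0.10,        # a historically flaky last hop
}
posterior = diagnose(list(failure_prob), failure_prob)
for hop, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(hop, "posterior:", round(p, 3))
```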
Citations: 21
SC2D: an alternative to trace anonymization
Pub Date : 2006-09-11 DOI: 10.1145/1162678.1162686
J. Mogul, M. Arlitt
Progress in networking research depends crucially on applying novel analysis tools to real-world traces of network activity. This often conflicts with privacy and security requirements; many raw network traces include information that should never be revealed to others. The traditional resolution of this dilemma uses trace anonymization to remove secret information from traces, theoretically leaving enough information for research purposes while protecting privacy and security. However, trace anonymization can have both technical and non-technical drawbacks. We propose an alternative to trace-to-trace transformation that operates at a different level of abstraction. Since the ultimate goal is to transform raw traces into research results, we say: cut out the middle step. We propose a model for shipping flexible analysis code to the data, rather than vice versa. Our model aims to support independent, expert, prior review of analysis code. We propose a system design using layered abstraction to provide both ease of use, and ease of verification of privacy and security properties. The system would provide pre-approved modules for common analysis functions. We hope our approach could significantly increase the willingness of trace owners to share their data with researchers. We have loosely prototyped this approach in previously published research.
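To make the "ship analysis code to the data" model concrete, here is a toy sketch of one possible layering: raw trace records stay with the data owner, researchers can only invoke modules from a pre-approved registry, and modules return aggregates rather than raw records. Everything here (the record schema, the registry, the example module) is an assumption made for illustration, not the SC2D design itself.

```python
from collections import Counter

# Toy flow records standing in for a raw trace that never leaves the data owner.
TRACE = [
    {"src": "10.0.0.1", "dst": "192.0.2.7", "dst_port": 443, "bytes": 52000},
    {"src": "10.0.0.2", "dst": "192.0.2.7", "dst_port": 443, "bytes": 1200},
    {"src": "10.0.0.1", "dst": "198.51.100.3", "dst_port": 80, "bytes": 9000},
]

# Registry of pre-approved analysis modules: each returns only aggregates, never raw
# records, so prior expert review can focus on what a module is allowed to emit.
APPROVED_MODULES = {}

def approved(name):
    def register(fn):
        APPROVED_MODULES[name] = fn
        return fn
    return register

@approved("bytes_per_port")
def bytes_per_port(records):
    totals = Counter()
    for record in records:
        totals[record["dst_port"]] += record["bytes"]
    return dict(totals)

def run_analysis(module_name):
    """Researchers name a module; the trace owner runs it locally against the raw data."""
    if module_name not in APPROVED_MODULES:
        raise PermissionError(f"module {module_name!r} has not been reviewed and approved")
    return APPROVED_MODULES[module_name](TRACE)

print(run_analysis("bytes_per_port"))
```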
Citations: 28
SVM learning of IP address structure for latency prediction
Pub Date : 2006-09-11 DOI: 10.1145/1162678.1162682
Robert Beverly, K. Sollins, A. Berger
We examine the ability to exploit the hierarchical structure of Internet addresses in order to endow network agents with predictive capabilities. Specifically, we consider Support Vector Machines (SVMs) for prediction of round-trip latency to random network destinations the agent has not previously interacted with. We use kernel functions to transform the structured, yet fragmented and discontinuous, IP address space into a feature space amenable to SVMs. Our SVM approach is accurate, fast, suitable to on-line learning and generalizes well. SVM regression on a large, randomly collected data set of 30,000 Internet latencies yields a mean prediction error of 25ms using only 20% of the samples for training. Our results are promising for equipping end-nodes with intelligence for service selection, user-directed routing, resource scheduling and network inference. Finally, feature selection analysis finds that the eight most significant IP address bits provide surprisingly strong discriminative power.
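A minimal sketch of latency regression on IP address structure: destination addresses are turned into bit-vector features (here only the eight most significant bits, echoing the abstract's feature-selection result) and fed to scikit-learn's SVR with an RBF kernel. The synthetic data, feature width, kernel, and parameters are illustrative assumptions rather than the paper's setup.

```python
import numpy as np
from sklearn.svm import SVR

def ip_to_bits(ip, n_bits=8):
    """Feature vector of the n_bits most significant bits of an IPv4 address."""
    value = int.from_bytes(bytes(int(octet) for octet in ip.split(".")), "big")
    return [(value >> (31 - i)) & 1 for i in range(n_bits)]

# Synthetic training set: latency loosely determined by the destination's /8,
# standing in for the randomly collected latency measurements used in the paper.
rng = np.random.default_rng(0)
prefix_latency = {10: 20.0, 172: 90.0, 203: 180.0}             # hypothetical ms per /8
ips, latencies = [], []
for _ in range(600):
    first_octet = int(rng.choice(list(prefix_latency)))
    ips.append(f"{first_octet}.{rng.integers(256)}.{rng.integers(256)}.{rng.integers(256)}")
    latencies.append(prefix_latency[first_octet] + rng.normal(0, 10))

X = np.array([ip_to_bits(ip) for ip in ips])
model = SVR(kernel="rbf", C=10.0).fit(X, latencies)

for probe in ["10.1.2.3", "203.7.7.7"]:                        # previously unseen targets
    rtt = float(model.predict([ip_to_bits(probe)])[0])
    print(probe, "predicted RTT ~", round(rtt, 1), "ms")
```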
Citations: 38
Mining web logs to debug distant connectivity problems
Pub Date : 2006-09-11 DOI: 10.1145/1162678.1162680
Emre Kıcıman, D. Maltz, M. Goldszmidt, John C. Platt
Content providers base their business on their ability to receive and answer requests from clients distributed across the Internet. Since disruptions in the flow of these requests directly translate into lost revenue, there is tremendous incentive to diagnose why some requests fail and prod the responsible parties into corrective action. However, a content provider has only limited visibility into the state of the Internet outside its domain. Instead, it must mine failure diagnoses from available information sources to infer what is going wrong and who is responsible. Our ultimate goal is to help Internet content providers resolve reliability problems in the wide-area network that are affecting end-user perceived reliability. We describe two algorithms that represent our first steps towards enabling content providers to extract actionable debugging information from content provider logs, and we present the results of applying the algorithms to a week's worth of logs from a large content provider, during which time it handled over 1 billion requests originating from over 10 thousand ASes.
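The abstract does not spell out the two algorithms; as a simple illustration of mining such logs, the sketch below groups logged requests by client AS and flags ASes whose failure rate sits far above the global baseline, using a one-sided z-score on the binomial proportion. The log records and threshold are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Toy per-request log records: (client AS, request succeeded?).
LOG = (
    [("AS64500", True)] * 980 + [("AS64500", False)] * 20
    + [("AS64501", True)] * 300 + [("AS64501", False)] * 100
    + [("AS64502", True)] * 500 + [("AS64502", False)] * 12
)

def flag_problem_ases(log, z_threshold=3.0):
    """Flag client ASes whose failure rate is far above the global baseline."""
    per_as = defaultdict(lambda: [0, 0])             # AS -> [failures, total requests]
    for asn, ok in log:
        per_as[asn][0] += (not ok)
        per_as[asn][1] += 1
    total_fail = sum(fails for fails, _ in per_as.values())
    total_reqs = sum(count for _, count in per_as.values())
    baseline = total_fail / total_reqs
    flagged = []
    for asn, (fails, count) in per_as.items():
        rate = fails / count
        stderr = sqrt(baseline * (1 - baseline) / count)
        if stderr > 0 and (rate - baseline) / stderr > z_threshold:
            flagged.append((asn, rate))
    return baseline, flagged

baseline, flagged = flag_problem_ases(LOG)
print(f"baseline failure rate: {baseline:.3f}")
for asn, rate in flagged:
    print(f"{asn}: failure rate {rate:.3f} looks anomalous")
```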
Citations: 3
Privacy-preserving performance measurements
Pub Date : 2006-09-11 DOI: 10.1145/1162678.1162687
M. Roughan, Yin Zhang
Internet performance is an issue of great interest, but it is not trivial to measure. A number of commercial companies try to measure this, as does RIPE, and many individual Internet Service Providers. However, all are hampered in their efforts by a fear of sharing such sensitive information. Customers make decisions about "which provider" based on such measurements, and so service providers certainly do not want such data to be public (except in the case of the top provider), but at the same time, it is in everyone's interest to have good metrics in order to reduce the risk of large network problems, and to test the effect of proposed network improvements. This paper shows that it is possible to have your cake, and eat it too. Providers (and other interested parties) can make such measurements, and compute Internet-wide metrics securely in the knowledge that their private data is never shared, and so cannot be abused.
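One standard building block for computing an industry-wide metric without revealing any provider's input is a secure sum based on additive secret sharing. The sketch below illustrates that building block under an honest-but-curious assumption; it is not necessarily the exact protocol used in the paper, and the provider names and values are made up.

```python
import secrets

MODULUS = 2 ** 61 - 1            # all arithmetic is done modulo a large prime

def share(value, n_parties):
    """Split an integer into n additive shares; any n-1 shares reveal nothing about it."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

# Each provider's private measurement, e.g. failed probes observed this hour.
private_values = {"providerA": 1200, "providerB": 40, "providerC": 310}
n = len(private_values)

# Every provider splits its value and hands one share to each participant.
received = [[] for _ in range(n)]
for value in private_values.values():
    for i, s in enumerate(share(value, n)):
        received[i].append(s)

# Each participant publishes only the sum of the shares it holds...
partial_sums = [sum(column) % MODULUS for column in received]
# ...and the Internet-wide total is recovered without exposing any single input.
total = sum(partial_sums) % MODULUS
print("aggregate metric:", total, "| true total:", sum(private_values.values()))
```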
Citations: 14
Bayesian detection of router configuration anomalies
Pub Date : 2005-08-22 DOI: 10.1145/1080173.1080190
Khalid El-Arini, Kevin S. Killourhy
Problems arising from router misconfigurations cost time and money. The first step in fixing such misconfigurations is finding them. In this paper, we propose a method for detecting misconfigurations that does not depend on an a priori model of what constitutes a correct configuration. Our hypothesis is that uncommon or unexpected misconfigurations in router data can be identified as statistical anomalies within a Bayesian framework. We present a detection algorithm based on this framework, and show that it is able to detect errors in the router configuration files of a university network.
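A minimal sketch of the underlying idea under simplifying assumptions (not the paper's exact model): pool attribute values across many routers, estimate a Dirichlet-smoothed probability for each observed value, and flag values whose probability falls below a threshold. The configuration triples, prior, and threshold are made up for illustration.

```python
from collections import Counter, defaultdict

# Toy (router, attribute, value) triples, as if extracted from configuration files.
CONFIGS = (
    [(f"r{i}", "ntp-server", "10.0.0.1") for i in range(1, 10)]
    + [("r10", "ntp-server", "10.0.0.99")]           # the odd one out
    + [(f"r{i}", "mtu", "1500") for i in range(1, 11)]
)

def flag_anomalies(configs, alpha=1.0, threshold=0.2):
    """Flag attribute values whose Dirichlet-smoothed probability, estimated across
    all routers, falls below `threshold` (alpha is a symmetric pseudo-count prior)."""
    by_attr = defaultdict(Counter)
    for _, attr, value in configs:
        by_attr[attr][value] += 1
    flagged = []
    for router, attr, value in configs:
        counts = by_attr[attr]
        total = sum(counts.values())
        # Posterior predictive probability of this value under the Dirichlet prior.
        p = (counts[value] + alpha) / (total + alpha * len(counts))
        if p < threshold:
            flagged.append((router, attr, value, round(p, 3)))
    return flagged

for item in flag_anomalies(CONFIGS):
    print("possible misconfiguration:", item)
```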
Citations: 39