When traffic anomalies or intrusion attempts occur on the network, we expect that the distribution of network traffic will change. Monitoring the network for changes over time, across space (at various routers in the network), over source and destination ports, IP addresses, or AS numbers, is an important part of anomaly detection. We present a manifold learning (ML)-based tool for the visualization of large sets of data which emphasizes the unusually small or large correlations that exist within the data set. We apply the tool to display anomalous traffic recorded by NetFlow on the Abilene backbone network. Furthermore, we present an online Java-based GUI which allows interactive demonstration of the use of the visualization method.
{"title":"Manifold learning visualization of network traffic data","authors":"Neal Patwari, A. Hero, Adam Pacholski","doi":"10.1145/1080173.1080182","DOIUrl":"https://doi.org/10.1145/1080173.1080182","url":null,"abstract":"When traffic anomalies or intrusion attempts occur on the network, we expect that the distribution of network traffic will change. Monitoring the network for changes over time, across space (at various routers in the network), over source and destination ports, IP addresses, or AS numbers, is an important part of anomaly detection. We present a manifold learning (ML)-based tool for the visualization of large sets of data which emphasizes the unusually small or large correlations that exist within the data set. We apply the tool to display anomalous traffic recorded by NetFlow on the Abilene backbone network. Furthermore, we present an online Java-based GUI which allows interactive demonstration of the use of the visualization method.","PeriodicalId":216113,"journal":{"name":"Annual ACM Workshop on Mining Network Data","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125768527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An accurate mapping of traffic to applications is important for a broad range of network management and measurement tasks. Internet applications have traditionally been identified using well-known default server network-port numbers in the TCP or UDP headers. However this approach has become increasingly inaccurate. An alternate, more accurate technique is to use specific application-level features in the protocol exchange to guide the identification. Unfortunately deriving the signatures manually is very time consuming and difficult.In this paper, we explore automatically extracting application signatures from IP traffic payload content. In particular we apply three statistical machine learning algorithms to automatically identify signatures for a range of applications. The results indicate that this approach is highly accurate and scales to allow online application identification on high speed links. We also discovered that content signatures still work in the presence of encryption. In these cases we were able to derive content signature for unencrypted handshakes negotiating the encryption parameters of a particular connection.
{"title":"ACAS: automated construction of application signatures","authors":"P. Haffner, S. Sen, O. Spatscheck, Dongmei Wang","doi":"10.1145/1080173.1080183","DOIUrl":"https://doi.org/10.1145/1080173.1080183","url":null,"abstract":"An accurate mapping of traffic to applications is important for a broad range of network management and measurement tasks. Internet applications have traditionally been identified using well-known default server network-port numbers in the TCP or UDP headers. However this approach has become increasingly inaccurate. An alternate, more accurate technique is to use specific application-level features in the protocol exchange to guide the identification. Unfortunately deriving the signatures manually is very time consuming and difficult.In this paper, we explore automatically extracting application signatures from IP traffic payload content. In particular we apply three statistical machine learning algorithms to automatically identify signatures for a range of applications. The results indicate that this approach is highly accurate and scales to allow online application identification on high speed links. We also discovered that content signatures still work in the presence of encryption. In these cases we were able to derive content signature for unencrypted handshakes negotiating the encryption parameters of a particular connection.","PeriodicalId":216113,"journal":{"name":"Annual ACM Workshop on Mining Network Data","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115240431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detecting anomalous BGP-route advertisements is crucial for improving the security and robustness of the Internet's interdomain-routing system. In this paper, we propose an instance-learning framework that identifies anomalies based on deviations from the "normal" BGP-update dynamics for a given destination prefix and across prefixes. We employ wavelets for a systematic, multi-scaled analysis that avoids the "magic numbers" (e.g., for grouping related update messages) needed in previous approaches to BGP-anomaly detection. Our preliminary results show that the update dynamics are generally consistent across prefixes and time. Only a few prefixes differ from the majority, and most prefixes exhibit similar behavior across time. This small set of abnormal prefixes and time intervals may be further examined to determine the source of anomalous behavior. In particular, we observe that many of the unusual prefixes are unstable prefixes that experience frequent routing changes.
{"title":"Learning-based anomaly detection in BGP updates","authors":"Jian Zhang, J. Rexford, J. Feigenbaum","doi":"10.1145/1080173.1080189","DOIUrl":"https://doi.org/10.1145/1080173.1080189","url":null,"abstract":"Detecting anomalous BGP-route advertisements is crucial for improving the security and robustness of the Internet's interdomain-routing system. In this paper, we propose an instance-learning framework that identifies anomalies based on deviations from the \"normal\" BGP-update dynamics for a given destination prefix and across prefixes. We employ wavelets for a systematic, multi-scaled analysis that avoids the \"magic numbers\" (e.g., for grouping related update messages) needed in previous approaches to BGP-anomaly detection. Our preliminary results show that the update dynamics are generally consistent across prefixes and time. Only a few prefixes differ from the majority, and most prefixes exhibit similar behavior across time. This small set of abnormal prefixes and time intervals may be further examined to determine the source of anomalous behavior. In particular, we observe that many of the unusual prefixes are unstable prefixes that experience frequent routing changes.","PeriodicalId":216113,"journal":{"name":"Annual ACM Workshop on Mining Network Data","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117104178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Traffic matrix (TM) can be used to detect, identify, and trace network anomaly caused by DDoS attacks and worm outbreaks. To detect network anomaly as early as possible, we need to obtain TM in a fast and accurate manner. Many existing TM estimation techniques are found not sufficient for this purpose due to their high overhead or low accuracy. We propose a cardinality-based TM measurement approach with an adaptive counting algorithm to produce both packetlevel and flow-level TM, which is well-suited for TM-based anomaly detection on a network basis. Our results show that the approach can obtain TM in almost real-time (once very 10 seconds) with low average relative error (less than 5%). Our approach has low processing, storage and communication overhead, e.g. software implementation can support OC-192 line speed. It can also be implemented in a passive mode and deployed incrementally without changing current routing infrastructure.
{"title":"Fast and accurate traffic matrix measurement using adaptive cardinality counting","authors":"M. Cai, Jianping Pan, Yu-Kwong Kwok, K. Hwang","doi":"10.1145/1080173.1080185","DOIUrl":"https://doi.org/10.1145/1080173.1080185","url":null,"abstract":"Traffic matrix (TM) can be used to detect, identify, and trace network anomaly caused by DDoS attacks and worm outbreaks. To detect network anomaly as early as possible, we need to obtain TM in a fast and accurate manner. Many existing TM estimation techniques are found not sufficient for this purpose due to their high overhead or low accuracy. We propose a cardinality-based TM measurement approach with an adaptive counting algorithm to produce both packetlevel and flow-level TM, which is well-suited for TM-based anomaly detection on a network basis. Our results show that the approach can obtain TM in almost real-time (once very 10 seconds) with low average relative error (less than 5%). Our approach has low processing, storage and communication overhead, e.g. software implementation can support OC-192 line speed. It can also be implemented in a passive mode and deployed incrementally without changing current routing infrastructure.","PeriodicalId":216113,"journal":{"name":"Annual ACM Workshop on Mining Network Data","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133755940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
V. Karamcheti, D. Geiger, Z. Kedem, S. Muthukrishnan
We study the problem of detecting malicious IP traffic in the network early, by analyzing the contents of packets. Existing systems look at packet contents as a bag of substrings and study characteristics of its base distribution B where B(i) is the frequency of substring i.We propose studying the inverse distribution I where I(f) is the number of substrings that appear with frequency f. As we show using a detailed case study, the inverse distribution shows the emergence of malicious traffic very clearly not only in its "static" collection of bumps, but also in its nascent "dynamic" state when the phenomenon manifests itself only as a distortion of the inverse distribution envelope. We describe our probabilistic analysis of the inverse distribution in terms of Gaussian mixtures, our preliminary solution for discovering these bumps automatically. Finally, we briefly discuss challenges in analyzing the inverse distribution of IP contents and its applications.
{"title":"Detecting malicious network traffic using inverse distributions of packet contents","authors":"V. Karamcheti, D. Geiger, Z. Kedem, S. Muthukrishnan","doi":"10.1145/1080173.1080176","DOIUrl":"https://doi.org/10.1145/1080173.1080176","url":null,"abstract":"We study the problem of detecting malicious IP traffic in the network early, by analyzing the contents of packets. Existing systems look at packet contents as a bag of substrings and study characteristics of its base distribution B where B(i) is the frequency of substring i.We propose studying the inverse distribution I where I(f) is the number of substrings that appear with frequency f. As we show using a detailed case study, the inverse distribution shows the emergence of malicious traffic very clearly not only in its \"static\" collection of bumps, but also in its nascent \"dynamic\" state when the phenomenon manifests itself only as a distortion of the inverse distribution envelope. We describe our probabilistic analysis of the inverse distribution in terms of Gaussian mixtures, our preliminary solution for discovering these bumps automatically. Finally, we briefly discuss challenges in analyzing the inverse distribution of IP contents and its applications.","PeriodicalId":216113,"journal":{"name":"Annual ACM Workshop on Mining Network Data","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128499267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alefiya Hussain, G. Bartlett, Yuri Pryadkin, J. Heidemann, C. Papadopoulos, J. Bannister
One of the most pressing problems in network research is the lack of long-term trace data from ISPs. The Internet carries an enormous volume and variety of data; mining this data can provide valuable insight into the design and development of new protocols and applications. Although capture cards for high-speed links exist today, actually making the network traffic available for analysis involves more than just getting the packets off the wire, but also handling large and variable traffic loads, sanitizing and anonymizing the data, and coordinating access by multiple users. In this paper we discuss the requirements, challenges, and design of an effective traffic monitoring infrastructure for network research. We describe our experience in deploying and maintaining a multi-user system for continuous trace collection at a large regional ISP@. We evaluate the performance of our system and show that it can support sustained collection and processing rates of over 160--300Mbits/s.
{"title":"Experiences with a continuous network tracing infrastructure","authors":"Alefiya Hussain, G. Bartlett, Yuri Pryadkin, J. Heidemann, C. Papadopoulos, J. Bannister","doi":"10.1145/1080173.1080181","DOIUrl":"https://doi.org/10.1145/1080173.1080181","url":null,"abstract":"One of the most pressing problems in network research is the lack of long-term trace data from ISPs. The Internet carries an enormous volume and variety of data; mining this data can provide valuable insight into the design and development of new protocols and applications. Although capture cards for high-speed links exist today, actually making the network traffic available for analysis involves more than just getting the packets off the wire, but also handling large and variable traffic loads, sanitizing and anonymizing the data, and coordinating access by multiple users. In this paper we discuss the requirements, challenges, and design of an effective traffic monitoring infrastructure for network research. We describe our experience in deploying and maintaining a multi-user system for continuous trace collection at a large regional ISP@. We evaluate the performance of our system and show that it can support sustained collection and processing rates of over 160--300Mbits/s.","PeriodicalId":216113,"journal":{"name":"Annual ACM Workshop on Mining Network Data","volume":"151 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127286853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this work, we show that machine learning, e.g., graphical models, plays an important role for the self-configuration of ad hoc wireless network. The role of such a learning approach includes a simple representation of complex dependencies in the network and a distributed algorithm which can adaptively find a nearly optimal configuration.
{"title":"Role of machine learning in configuration management of ad hoc wireless networks","authors":"Sung-eok Jeon, C. Ji","doi":"10.1145/1080173.1080191","DOIUrl":"https://doi.org/10.1145/1080173.1080191","url":null,"abstract":"In this work, we show that machine learning, e.g., graphical models, plays an important role for the self-configuration of ad hoc wireless network. The role of such a learning approach includes a simple representation of complex dependencies in the network and a distributed algorithm which can adaptively find a nearly optimal configuration.","PeriodicalId":216113,"journal":{"name":"Annual ACM Workshop on Mining Network Data","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126692194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Faults in an IP network have various causes such as the failure of one or more routers at the IP layer, fiber-cuts, failure of physical elements at the optical layer, or extraneous causes like power outages. These faults are usually detected as failures of a set of dependent logical entities--the IP links affected by the failed components. We present Shrink, a tool for root cause analysis of network faults which, given a set of failed IP links, identifies the underlying cause of the faulty state. Shrink models the diagnosis problem as a Bayesian network. It has two main contributions. First, it effectively accounts for noisy measurement and inaccurate mapping between the IP and optical layers. Second, it has an efficient inference algorithm that finds the most likely failure causes in polynomial time and with bounded errors. We compare Shrink with two prior approaches and show that it substantially improves the performance.
{"title":"Shrink: a tool for failure diagnosis in IP networks","authors":"Srikanth Kandula, D. Katabi, J. Vasseur","doi":"10.1145/1080173.1080178","DOIUrl":"https://doi.org/10.1145/1080173.1080178","url":null,"abstract":"Faults in an IP network have various causes such as the failure of one or more routers at the IP layer, fiber-cuts, failure of physical elements at the optical layer, or extraneous causes like power outages. These faults are usually detected as failures of a set of dependent logical entities--the IP links affected by the failed components. We present Shrink, a tool for root cause analysis of network faults which, given a set of failed IP links, identifies the underlying cause of the faulty state. Shrink models the diagnosis problem as a Bayesian network. It has two main contributions. First, it effectively accounts for noisy measurement and inaccurate mapping between the IP and optical layers. Second, it has an efficient inference algorithm that finds the most likely failure causes in polynomial time and with bounded errors. We compare Shrink with two prior approaches and show that it substantially improves the performance.","PeriodicalId":216113,"journal":{"name":"Annual ACM Workshop on Mining Network Data","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127939065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enterprise networks contain hundreds, if not thousands, of cooperative end-systems. We advocate devoting a small fraction of their idle cycles, free disk space and network bandwidth to create Anemone, a platform for network management. In contrast to current approaches which rely on traffic statistics provided by network devices, Anemone combines end-system instrumentation with routing protocol collection to provide a semantically rich view of the network.
{"title":"Anemone: using end-systems as a rich network management platform","authors":"R. Mortier, R. Isaacs, P. Barham","doi":"10.1145/1080173.1080184","DOIUrl":"https://doi.org/10.1145/1080173.1080184","url":null,"abstract":"Enterprise networks contain hundreds, if not thousands, of cooperative end-systems. We advocate devoting a small fraction of their idle cycles, free disk space and network bandwidth to create Anemone, a platform for network management. In contrast to current approaches which rely on traffic statistics provided by network devices, Anemone combines end-system instrumentation with routing protocol collection to provide a semantically rich view of the network.","PeriodicalId":216113,"journal":{"name":"Annual ACM Workshop on Mining Network Data","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127750954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Darknets are often proposed to monitor for anomalous, externally sourced traffic, and require large, contiguous blocks of unused IP addresses - not always feasible for enterprise network operators. We introduce and evaluate the Greynet - a region of IP address space that is sparsely populated with 'darknet' addresses interspersed with active (or 'lit') IP addresses. Based on a small sample of traffic collected within a university campus network we saw that relatively sparse greynets can achieve useful levels of network scan detection.
{"title":"Greynets: a definition and evaluation of sparsely populated darknets","authors":"W. Harrop, G. Armitage","doi":"10.1145/1080173.1080177","DOIUrl":"https://doi.org/10.1145/1080173.1080177","url":null,"abstract":"Darknets are often proposed to monitor for anomalous, externally sourced traffic, and require large, contiguous blocks of unused IP addresses - not always feasible for enterprise network operators. We introduce and evaluate the Greynet - a region of IP address space that is sparsely populated with 'darknet' addresses interspersed with active (or 'lit') IP addresses. Based on a small sample of traffic collected within a university campus network we saw that relatively sparse greynets can achieve useful levels of network scan detection.","PeriodicalId":216113,"journal":{"name":"Annual ACM Workshop on Mining Network Data","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133247647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}