Metric forensics: a multi-level approach for mining volatile graphs

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. Pub Date: 2010-07-25. DOI: 10.1145/1835804.1835828

Keith W. Henderson, Tina Eliassi-Rad, C. Faloutsos, L. Akoglu, Lei Li, Koji Maruhashi, B. Prakash, Hanghang Tong


Citations: 71

Abstract

Advances in data collection and storage capacity have made it increasingly possible to collect highly volatile graph data for analysis. Existing graph analysis techniques are not appropriate for such data, especially in cases where streaming or near-real-time results are required. An example that has drawn significant research interest is the cyber-security domain, where internet communication traces are collected and real-time discovery of events, behaviors, patterns, and anomalies is desired. We propose MetricForensics, a scalable framework for analysis of volatile graphs. MetricForensics combines a multi-level "drill down" approach, a collection of user-selected graph metrics, and a collection of analysis techniques. At each successive level, more sophisticated metrics are computed and the graph is viewed at finer temporal resolutions. In this way, MetricForensics scales to highly volatile graphs by allocating resources for computationally expensive analysis only when an interesting event has first been discovered at a coarser resolution. We test MetricForensics on three real-world graphs: an enterprise IP trace, a trace of legitimate and malicious network traffic from a research institution, and the MIT Reality Mining proximity sensor data. Our largest graph has 3M vertices and 32M edges, spanning 4.5 days. The results demonstrate the scalability and capability of MetricForensics in analyzing volatile graphs, and highlight four novel phenomena in such graphs: elbows, broken correlations, prolonged spikes, and lightweight stars.
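
The multi-level "drill down" idea described in the abstract can be illustrated with a small sketch. This is not the MetricForensics implementation: the edge layout `(timestamp, source, destination)`, the function names (`bucket_edges`, `flag_outliers`, `max_degree`, `drill_down`), the choice of edge count as the cheap coarse metric and maximum degree as the costlier fine metric, and the z-score threshold are all assumptions made for illustration. The sketch only shows the general pattern of spending effort on finer windows after a coarse window looks unusual.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Assumed edge layout: (timestamp_in_seconds, source, destination).

def bucket_edges(edges, width):
    """Group edges into consecutive time windows of `width` seconds."""
    buckets = defaultdict(list)
    for t, u, v in edges:
        buckets[int(t // width)].append((t, u, v))
    return buckets

def flag_outliers(series, z=2.0):
    """Return the keys whose value is more than z standard deviations from the mean."""
    values = list(series.values())
    if len(values) < 2:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return sorted(k for k, x in series.items() if abs(x - mu) > z * sigma)

def max_degree(edges):
    """Costlier per-window metric (assumed): largest vertex degree, multi-edges counted."""
    deg = defaultdict(int)
    for _, u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return max(deg.values(), default=0)

def drill_down(edges, coarse=3600, fine=60, z=2.0):
    """Level 1: edge count per coarse window.
    Level 2: max degree per fine window, computed only inside flagged coarse windows."""
    coarse_buckets = bucket_edges(edges, coarse)
    counts = {k: len(es) for k, es in coarse_buckets.items()}
    report = {}
    for k in flag_outliers(counts, z):
        fine_buckets = bucket_edges(coarse_buckets[k], fine)
        report[k] = {f: max_degree(es) for f, es in sorted(fine_buckets.items())}
    return report

# Synthetic trace: ten quiet hours, plus a one-minute burst from a single hub in hour 7.
edges = [(h * 3600 + i * 600, f"a{i}", f"b{i}") for h in range(10) for i in range(5)]
edges += [(7 * 3600 + 120 + i * 0.1, "hub", f"x{i}") for i in range(200)]
report = drill_down(edges)
```

In this synthetic trace only the coarse window containing the burst is expanded, so the finer (and more expensive) metric is never computed for the nine quiet hours. A real deployment would likely use more robust outlier detection and a richer set of metrics than the two assumed here.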


