Scalable hybrid stream and hadoop network analysis system

V. Bumgardner, V. Marek
{"title":"Scalable hybrid stream and hadoop network analysis system","authors":"V. Bumgardner, V. Marek","doi":"10.1145/2568088.2568103","DOIUrl":null,"url":null,"abstract":"Collections of network traces have long been used in network traffic analysis. Flow analysis can be used in network anomaly discovery, intrusion detection and more generally, discovery of actionable events on the network. The data collected during processing may be also used for prediction and avoidance of traffic congestion, network capacity planning, and the development of software-defined networking rules. As network flow rates increase and new network technologies are introduced on existing hardware platforms, many organizations find themselves either technically or financially unable to generate, collect, and/or analyze network flow data. The continued rapid growth of network trace data, requires new methods of scalable data collection and analysis. We report on our deployment of a system designed and implemented at the University of Kentucky that supports analysis of network traffic across the enterprise. Our system addresses problems of scale in existing systems, by using distributed computing methodologies, and is based on a combination of stream and batch processing techniques. In addition to collection, stream processing using Storm is utilized to enrich the data stream with ephemeral environment data. Enriched stream-data is then used for event detection and near real-time flow analysis by an in-line complex event processor. Batch processing is performed by the Hadoop MapReduce framework, from data stored in HBase BigTable storage. In benchmarks on our 10 node cluster, using actual network data, we were able to stream process over 315k flows/sec. In batch analysis were we able to process over 2.6M flows/sec with a storage compression ratio of 6.7:1.","PeriodicalId":243233,"journal":{"name":"Proceedings of the 5th ACM/SPEC international conference on Performance engineering","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"27","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 5th ACM/SPEC international conference on Performance engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2568088.2568103","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 27

Abstract

Collections of network traces have long been used in network traffic analysis. Flow analysis can be used in network anomaly discovery, intrusion detection and more generally, discovery of actionable events on the network. The data collected during processing may be also used for prediction and avoidance of traffic congestion, network capacity planning, and the development of software-defined networking rules. As network flow rates increase and new network technologies are introduced on existing hardware platforms, many organizations find themselves either technically or financially unable to generate, collect, and/or analyze network flow data. The continued rapid growth of network trace data, requires new methods of scalable data collection and analysis. We report on our deployment of a system designed and implemented at the University of Kentucky that supports analysis of network traffic across the enterprise. Our system addresses problems of scale in existing systems, by using distributed computing methodologies, and is based on a combination of stream and batch processing techniques. In addition to collection, stream processing using Storm is utilized to enrich the data stream with ephemeral environment data. Enriched stream-data is then used for event detection and near real-time flow analysis by an in-line complex event processor. Batch processing is performed by the Hadoop MapReduce framework, from data stored in HBase BigTable storage. In benchmarks on our 10 node cluster, using actual network data, we were able to stream process over 315k flows/sec. In batch analysis were we able to process over 2.6M flows/sec with a storage compression ratio of 6.7:1.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
可扩展的混合流和hadoop网络分析系统
长期以来,网络轨迹集一直用于网络流量分析。流分析可用于网络异常发现、入侵检测以及更一般的网络上可操作事件的发现。处理过程中收集的数据还可用于预测和避免流量拥塞、规划网络容量、制定软件定义的网络规则等。随着网络流量的增加以及在现有硬件平台上引入新的网络技术,许多组织发现自己在技术上或经济上无法生成、收集和/或分析网络流量数据。网络跟踪数据的持续快速增长,需要新的可扩展的数据收集和分析方法。我们报告我们在肯塔基大学设计和实现的系统的部署情况,该系统支持对整个企业的网络流量进行分析。我们的系统解决了现有系统的规模问题,通过使用分布式计算方法,并基于流和批处理技术的组合。除了收集之外,使用Storm进行流处理还可以使用短暂的环境数据来丰富数据流。丰富的流数据,然后用于事件检测和近实时流分析由一个内联复杂事件处理器。批处理由Hadoop MapReduce框架对存储在HBase BigTable存储中的数据进行处理。在我们的10个节点集群的基准测试中,使用实际的网络数据,我们能够以超过315k流/秒的速度进行流处理。在批量分析中,我们能够处理超过2.6M流/秒,存储压缩比为6.7:1。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
The taming of the shrew: increasing performance by automatic parameter tuning for java garbage collectors Uncertainties in the modeling of self-adaptive systems: a taxonomy and an example of availability evaluation Scalable hybrid stream and hadoop network analysis system Efficient optimization of software performance models via parameter-space pruning Real-time multi-cloud management needs application awareness
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1