{"title":"将pcap转换为Weka可挖掘的数据","authors":"C. A. Fowler, R. Hammell","doi":"10.1109/SNPD.2014.6888681","DOIUrl":null,"url":null,"abstract":"In today's world there is an unprecedented volume of information available to organizations of all sizes; the “information overload” problem is well documented. This problem is especially challenging in the world of network intrusion detection. In this realm, we must not only deal with sifting through vast amounts of data, but we must also do it in a timely manner even when at times we are not sure what exactly it is we are trying to find. In the grander scheme of our work we intend to demonstrate that several different data mining algorithms reporting to an overarching layer will yield more accurate results than anyone data mining application (or algorithm) acting on its own. The system will operate in the domain of offline network and computer forensic data mining, under the guidance of a hybrid intelligence/multi-agent, systems based, for interpretation and interpolation of the findings. Toward that end, in this paper we build upon earlier work, undertaking the steps required for generating and preparing suitably minable data. Specifically, we are concerned with extracting as much useful data as possible out of a PCAP (Packet capture) for importing into Weka. While a PCAP may have thousands of field/value pairs, Wireshark and tshark's csv (comma separated value) output module only renders a small percentage of these fields and their values by default. We introduce a tool of our own making which enumerates every field (with or without a value) in any PCAP and generates an ARFF (Attribute-Relation File Format - Weka default). 
This code represents a component of a larger application we are designing (future work) which will ingest a PCAP, semi-autonomously preprocess it and feed it into Weka for processing/mining using several different algorithms.","PeriodicalId":272932,"journal":{"name":"15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Converting PCAPs into Weka mineable data\",\"authors\":\"C. A. Fowler, R. Hammell\",\"doi\":\"10.1109/SNPD.2014.6888681\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In today's world there is an unprecedented volume of information available to organizations of all sizes; the “information overload” problem is well documented. This problem is especially challenging in the world of network intrusion detection. In this realm, we must not only deal with sifting through vast amounts of data, but we must also do it in a timely manner even when at times we are not sure what exactly it is we are trying to find. In the grander scheme of our work we intend to demonstrate that several different data mining algorithms reporting to an overarching layer will yield more accurate results than anyone data mining application (or algorithm) acting on its own. The system will operate in the domain of offline network and computer forensic data mining, under the guidance of a hybrid intelligence/multi-agent, systems based, for interpretation and interpolation of the findings. Toward that end, in this paper we build upon earlier work, undertaking the steps required for generating and preparing suitably minable data. Specifically, we are concerned with extracting as much useful data as possible out of a PCAP (Packet capture) for importing into Weka. 
While a PCAP may have thousands of field/value pairs, Wireshark and tshark's csv (comma separated value) output module only renders a small percentage of these fields and their values by default. We introduce a tool of our own making which enumerates every field (with or without a value) in any PCAP and generates an ARFF (Attribute-Relation File Format - Weka default). This code represents a component of a larger application we are designing (future work) which will ingest a PCAP, semi-autonomously preprocess it and feed it into Weka for processing/mining using several different algorithms.\",\"PeriodicalId\":272932,\"journal\":{\"name\":\"15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)\",\"volume\":\"20 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SNPD.2014.6888681\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing 
(SNPD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SNPD.2014.6888681","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 13

Abstract

In today's world there is an unprecedented volume of information available to organizations of all sizes; the "information overload" problem is well documented. This problem is especially challenging in the world of network intrusion detection. In this realm, we must not only sift through vast amounts of data, but we must also do so in a timely manner, even when at times we are not sure exactly what it is we are trying to find. In the grander scheme of our work we intend to demonstrate that several different data mining algorithms reporting to an overarching layer will yield more accurate results than any one data mining application (or algorithm) acting on its own. The system will operate in the domain of offline network and computer forensic data mining, under the guidance of a hybrid-intelligence, multi-agent-based system for interpretation and interpolation of the findings. Toward that end, in this paper we build upon earlier work, undertaking the steps required for generating and preparing suitably mineable data. Specifically, we are concerned with extracting as much useful data as possible out of a PCAP (packet capture) for importing into Weka. While a PCAP may have thousands of field/value pairs, Wireshark and tshark's CSV (comma-separated value) output module only renders a small percentage of these fields and their values by default. We introduce a tool of our own making which enumerates every field (with or without a value) in any PCAP and generates an ARFF (Attribute-Relation File Format, Weka's default). This code represents a component of a larger application we are designing (future work) which will ingest a PCAP, semi-autonomously preprocess it, and feed it into Weka for processing/mining using several different algorithms.
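The core conversion the abstract describes — enumerating every field seen across a capture and emitting an ARFF file with missing-value markers where a packet lacks a field — can be sketched as follows. This is an illustrative reconstruction, not the authors' tool: the input is assumed to be per-packet field/value dictionaries such as one might parse out of `tshark -T fields` output, and the field names in the example are made up.

```python
# Sketch of the PCAP-to-ARFF idea: every field observed anywhere in the
# capture becomes a Weka attribute, and '?' (ARFF's missing-value marker)
# fills in for fields absent from a given packet.

def packets_to_arff(packets, relation="pcap"):
    """packets: list of dicts mapping field name -> string value."""
    # Union of all fields seen in the capture, in first-seen order.
    fields = []
    for pkt in packets:
        for name in pkt:
            if name not in fields:
                fields.append(name)

    lines = [f"@relation {relation}", ""]
    # Declare every attribute as string for simplicity; a fuller tool
    # would detect numeric fields and declare them as such.
    for name in fields:
        lines.append(f"@attribute {name} string")
    lines.append("")
    lines.append("@data")
    for pkt in packets:
        row = [f"'{pkt[name]}'" if name in pkt else "?" for name in fields]
        lines.append(",".join(row))
    return "\n".join(lines)


# Example: two packets with partially overlapping fields.
capture = [
    {"ip.src": "10.0.0.1", "ip.dst": "10.0.0.2", "tcp.port": "80"},
    {"ip.src": "10.0.0.2", "udp.port": "53"},
]
print(packets_to_arff(capture))
```

Taking the union of fields across all packets is what distinguishes this approach from the default CSV export the abstract criticizes: the attribute set is driven by the capture's actual contents rather than a small fixed field list.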