Exploratory Data Analysis of a Unified Host and Network Dataset

Catherine Beazley, Karan Gadiya, Ravi K U Rakesh, D. Roden, Boda Ye, Brendan Abraham, Donald E. Brown, M. Veeraraghavan
Published in: 2019 Systems and Information Engineering Design Symposium (SIEDS), April 2019
DOI: 10.1109/SIEDS.2019.8735640
Citations: 5

Abstract

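Exploratory data analysis is invaluable for understanding data, choosing appropriate models, and interpreting, validating, and applying results. It often reveals patterns that can answer a range of research questions. In this paper, we perform exploratory data analysis on cybersecurity data in the NetFlow Dataset from “The Unified Host and Network Dataset”. “The Unified Host and Network Dataset” is a large, open-source dataset collected on the Los Alamos National Laboratory (LANL) enterprise network and published to encourage new research in cybersecurity. The NetFlow Dataset is a compilation of flow logs from routers within the LANL network, aggregated into a relational format using network stitching. Our exploratory data analysis reveals distinct patterns and clusters within a single day of data. Specifically, scatter plots of the number of packets sent by the destination device against the number of packets sent by the source device show three distinct linear relationships with no intercept. These relationships suggest three common patterns in how source and destination devices exchange packets with each other. Our analysis also shows that the byte and packet distributions of connections on rare ports differ statistically from those of connections on common ports, suggesting that these differences could be used to discriminate between normal and abnormal network behavior. Our findings may be useful for research on classification problems using the Unified Host and Network Dataset and for furthering cluster analysis in cybersecurity research.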
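The abstract reports two quantitative findings: flows fall along three no-intercept lines in the (source packets, destination packets) plane, and rare-port and common-port connections have statistically different byte and packet distributions. The Python sketch below illustrates one way such findings could be checked. It assumes flow records have already been reduced to (source packets, destination packets) pairs and to per-connection byte or packet counts; the function names, the ratio-based 1-D k-means grouping, and the choice of a two-sample Kolmogorov-Smirnov test are illustrative assumptions, not the authors' documented method. For a line through the origin, the least-squares slope is b = Σxy / Σx², and points on the same such line share the same ratio y/x, which is why the sketch groups flows by that ratio.

```python
import math
from typing import List, Sequence, Tuple

# (packets sent by source, packets sent by destination); hypothetical flow summary
Point = Tuple[float, float]


def slope_through_origin(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope b of the no-intercept model y = b * x."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx


def cluster_by_ratio(points: Sequence[Point], k: int = 3, iters: int = 50) -> List[List[Point]]:
    """Hypothetical grouping step: 1-D k-means on the ratio dst/src.

    Points on the same no-intercept line share one ratio y/x, so grouping by
    ratio separates the lines. Flows with zero source packets are skipped.
    Assumes at least one flow with a nonzero source packet count.
    """
    pts = [(x, y) for x, y in points if x > 0]
    ratios = sorted(y / x for x, y in pts)
    # Start the k centres at evenly spaced quantiles of the sorted ratios.
    centres = [ratios[(2 * i + 1) * len(ratios) // (2 * k)] for i in range(k)]
    groups: List[List[Point]] = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x, y in pts:
            r = y / x
            nearest = min(range(k), key=lambda c: abs(r - centres[c]))
            groups[nearest].append((x, y))
        # Move each centre to the mean ratio of its group (keep it if the group is empty).
        new_centres = [sum(y / x for x, y in g) / len(g) if g else centres[i]
                       for i, g in enumerate(groups)]
        if new_centres == centres:
            break
        centres = new_centres
    return groups


def kolmogorov_q(lam: float) -> float:
    """Asymptotic tail probability Q(lam) = P(K > lam) of the Kolmogorov distribution."""
    if lam <= 0.0:
        return 1.0
    if lam < 1.0:
        # Small-lam series for the CDF: K(lam) = sqrt(2*pi)/lam * sum exp(-(2k-1)^2 pi^2 / (8 lam^2))
        s = sum(math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * lam * lam)) for k in range(1, 20))
        return min(1.0, max(0.0, 1.0 - math.sqrt(2 * math.pi) / lam * s))
    # Large-lam alternating series: Q(lam) = 2 * sum (-1)^(k-1) exp(-2 k^2 lam^2)
    s = sum(2 * (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 20))
    return min(1.0, max(0.0, s))


def ks_two_sample(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sample Kolmogorov-Smirnov statistic D and its asymptotic p-value.

    Both samples must be non-empty.
    """
    sa, sb = sorted(a), sorted(b)
    n, m = len(sa), len(sb)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(sa[i], sb[j])
        # Step both empirical CDFs past every value equal to x, then compare them.
        while i < n and sa[i] == x:
            i += 1
        while j < m and sb[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Common effective-sample-size approximation for the asymptotic p-value.
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    return d, kolmogorov_q(lam)
```

On one day of NetFlow records, `cluster_by_ratio` would be applied to the packet-count pairs and `slope_through_origin` to each resulting group, while `ks_two_sample` would compare, for example, byte counts of rare-port connections with those of common-port connections. How "rare" ports are defined is left open here, since the abstract does not specify it. With samples as large as a day of enterprise flows, even small distributional differences yield very small p-values, so the size of D is worth reporting alongside the p-value.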