基于K-Means的大数据异常检测扩展隔离林

Md Tahmid Rahman Laskar, J. Huang, Vladan Smetana, Chris Stewart, Kees Pouw, Aijun An, Steve Chan, Lei Liu
{"title":"基于K-Means的大数据异常检测扩展隔离林","authors":"Md Tahmid Rahman Laskar, J. Huang, Vladan Smetana, Chris Stewart, Kees Pouw, Aijun An, Steve Chan, Lei Liu","doi":"10.1145/3460976","DOIUrl":null,"url":null,"abstract":"Industrial Information Technology infrastructures are often vulnerable to cyberattacks. To ensure security to the computer systems in an industrial environment, it is required to build effective intrusion detection systems to monitor the cyber-physical systems (e.g., computer networks) in the industry for malicious activities. This article aims to build such intrusion detection systems to protect the computer networks from cyberattacks. More specifically, we propose a novel unsupervised machine learning approach that combines the K-Means algorithm with the Isolation Forest for anomaly detection in industrial big data scenarios. Since our objective is to build the intrusion detection system for the big data scenario in the industrial domain, we utilize the Apache Spark framework to implement our proposed model that was trained in large network traffic data (about 123 million instances of network traffic) stored in Elasticsearch. Moreover, we evaluate our proposed model on the live streaming data and find that our proposed system can be used for real-time anomaly detection in the industrial setup. In addition, we address different challenges that we face while training our model on large datasets and explicitly describe how these issues were resolved. Based on our empirical evaluation in different use cases for anomaly detection in real-world network traffic data, we observe that our proposed system is effective to detect anomalies in big data scenarios. Finally, we evaluate our proposed model on several academic datasets to compare with other models and find that it provides comparable performance with other state-of-the-art approaches.","PeriodicalId":380257,"journal":{"name":"ACM Transactions on Cyber-Physical Systems (TCPS)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"Extending Isolation Forest for Anomaly Detection in Big Data via K-Means\",\"authors\":\"Md Tahmid Rahman Laskar, J. Huang, Vladan Smetana, Chris Stewart, Kees Pouw, Aijun An, Steve Chan, Lei Liu\",\"doi\":\"10.1145/3460976\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Industrial Information Technology infrastructures are often vulnerable to cyberattacks. To ensure security to the computer systems in an industrial environment, it is required to build effective intrusion detection systems to monitor the cyber-physical systems (e.g., computer networks) in the industry for malicious activities. This article aims to build such intrusion detection systems to protect the computer networks from cyberattacks. More specifically, we propose a novel unsupervised machine learning approach that combines the K-Means algorithm with the Isolation Forest for anomaly detection in industrial big data scenarios. Since our objective is to build the intrusion detection system for the big data scenario in the industrial domain, we utilize the Apache Spark framework to implement our proposed model that was trained in large network traffic data (about 123 million instances of network traffic) stored in Elasticsearch. Moreover, we evaluate our proposed model on the live streaming data and find that our proposed system can be used for real-time anomaly detection in the industrial setup. In addition, we address different challenges that we face while training our model on large datasets and explicitly describe how these issues were resolved. Based on our empirical evaluation in different use cases for anomaly detection in real-world network traffic data, we observe that our proposed system is effective to detect anomalies in big data scenarios. Finally, we evaluate our proposed model on several academic datasets to compare with other models and find that it provides comparable performance with other state-of-the-art approaches.\",\"PeriodicalId\":380257,\"journal\":{\"name\":\"ACM Transactions on Cyber-Physical Systems (TCPS)\",\"volume\":\"40 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-04-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Cyber-Physical Systems (TCPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3460976\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Cyber-Physical Systems (TCPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3460976","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14

摘要

工业信息技术基础设施往往容易受到网络攻击。为了确保工业环境中计算机系统的安全,需要建立有效的入侵检测系统来监控工业中的网络物理系统(如计算机网络)的恶意活动。本文旨在建立这样的入侵检测系统,以保护计算机网络免受网络攻击。更具体地说,我们提出了一种新的无监督机器学习方法,该方法将K-Means算法与隔离森林相结合,用于工业大数据场景中的异常检测。由于我们的目标是为工业领域的大数据场景构建入侵检测系统,我们利用Apache Spark框架来实现我们提出的模型,该模型是在存储在Elasticsearch中的大型网络流量数据(约1.23亿网络流量实例)中训练出来的。此外,我们在实时流数据上评估了我们提出的模型,发现我们提出的系统可以用于工业设置中的实时异常检测。此外,我们解决了在大型数据集上训练模型时面临的不同挑战,并明确描述了这些问题是如何解决的。基于我们对真实网络流量数据异常检测的不同用例的经验评估,我们观察到我们提出的系统可以有效地检测大数据场景下的异常。最后,我们在几个学术数据集上评估了我们提出的模型,并与其他模型进行了比较,发现它与其他最先进的方法提供了相当的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Extending Isolation Forest for Anomaly Detection in Big Data via K-Means
Industrial Information Technology infrastructures are often vulnerable to cyberattacks. To ensure security to the computer systems in an industrial environment, it is required to build effective intrusion detection systems to monitor the cyber-physical systems (e.g., computer networks) in the industry for malicious activities. This article aims to build such intrusion detection systems to protect the computer networks from cyberattacks. More specifically, we propose a novel unsupervised machine learning approach that combines the K-Means algorithm with the Isolation Forest for anomaly detection in industrial big data scenarios. Since our objective is to build the intrusion detection system for the big data scenario in the industrial domain, we utilize the Apache Spark framework to implement our proposed model that was trained in large network traffic data (about 123 million instances of network traffic) stored in Elasticsearch. Moreover, we evaluate our proposed model on the live streaming data and find that our proposed system can be used for real-time anomaly detection in the industrial setup. In addition, we address different challenges that we face while training our model on large datasets and explicitly describe how these issues were resolved. Based on our empirical evaluation in different use cases for anomaly detection in real-world network traffic data, we observe that our proposed system is effective to detect anomalies in big data scenarios. Finally, we evaluate our proposed model on several academic datasets to compare with other models and find that it provides comparable performance with other state-of-the-art approaches.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Introduction to the Special Section on Selected Papers from ICCPS 2021 How Hard Is Cyber-risk Management in IT/OT Systems? A Theory to Classify and Conquer Hardness of Insuring ICSs Game Theory–Based Parameter Tuning for Energy-Efficient Path Planning on Modern UAVs OD1NF1ST: True Skip Intrusion Detection and Avionics Network Cyber-attack Simulation Coordinated Charging and Discharging of Electric Vehicles: A New Class of Switching Attacks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1