An ML Based Anomaly Detection System in real-time data streams

2021 International Conference on Computational Science and Computational Intelligence (CSCI) Pub Date : 2021-12-01 DOI:10.1109/CSCI54926.2021.00270

Javier Jose Diaz Rivera, Talha Ahmed Khan, Waleed Akbar, Muhammad Afaq, Wang-Cheol Song


Citations: 2

Abstract

Advances in machine learning and artificial intelligence have moved network anomaly detection systems beyond traditional signature-based intrusion detection. Nonetheless, as security measures evolve, attackers constantly develop more sophisticated attacks. A robust anomaly detection algorithm is therefore not enough on its own: a real-time data feeding mechanism is also required to minimize reaction time. Moreover, DDoS attacks can flood network data channels with thousands of packets per second, which can overload most traditional monitoring systems that rely on data storage. For this reason, the research presented in this paper focuses on implementing a real-time data streaming system for network anomaly detection that can operate under a high volume of traffic. The solution deploys a flow collector platform connected to Apache Kafka to receive NetFlow data from network switches. Real-time big data processing is applied through Apache Spark, where the ML anomaly detection is triggered. Anomalies are detected by combining the unsupervised clustering algorithm k-means with the supervised classifier KNN (k-nearest neighbors). Finally, a monitoring system consisting of an ELK stack collects historical data for further evolution of the ML algorithms.
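The abstract does not say how k-means and KNN are combined, so the following is only a minimal sketch of one plausible reading: k-means groups unlabeled flow records, the smaller cluster is assumed to be the anomalous one, and KNN then classifies new flows against those cluster-derived labels. The feature pairs (scaled packet rate, scaled mean packet size), the sample values, the minority-cluster rule, and all function names are illustrative assumptions rather than the authors' implementation. The sketch uses only the Python standard library and leaves out Kafka and Spark entirely.

```python
import math
from collections import Counter

def dist(a, b):
    """Euclidean distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point farthest from every centroid chosen so far.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    assign = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [min(range(k), key=lambda i: dist(p, centroids[i])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for i in range(k):
            members = [p for p, a in zip(points, assign) if a == i]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[i])
        if updated == centroids:
            break
        centroids = updated
    return assign, centroids

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points nearest to the query."""
    nearest = sorted(zip(train_x, train_y), key=lambda xy: dist(xy[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical, already-scaled flow features: (packet rate, mean packet size).
points = [
    (0.10, 0.50), (0.12, 0.48), (0.09, 0.52),
    (0.11, 0.51), (0.13, 0.49), (0.10, 0.55),
    (0.90, 0.06), (0.95, 0.064), (1.00, 0.058),
]

# Step 1 (unsupervised): cluster the flows.
assign, centroids = kmeans(points, k=2)

# Step 2 (assumption): treat the smaller cluster as anomalous.
sizes = Counter(assign)
minority = min(sizes, key=sizes.get)
labels = ["anomaly" if a == minority else "normal" for a in assign]

# Step 3 (supervised): classify unseen flows with KNN on the derived labels.
print(knn_predict(points, labels, (0.92, 0.07)))
print(knn_predict(points, labels, (0.11, 0.50)))
```

In a streaming deployment, the clustering step would typically run on historical or batched data, while the KNN lookup would be applied to each incoming micro-batch. That division of work is also an assumption here, not something the abstract states.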


