{"title":"Quantifying the Impact of Design Strategies for Big Data Cyber Security Analytics: An Empirical Investigation","authors":"Faheem Ullah, M. Babar","doi":"10.1109/PDCAT46702.2019.00037","DOIUrl":null,"url":null,"abstract":"Big Data Cyber Security Analytics (BDCA) systems use big data technologies (e.g., Hadoop and Spark) for collecting, storing, and analyzing a large volume of security event data to detect cyber-attacks. The state-of-the-art uses various design strategies (e.g., feature selection and alert ranking) to help BDCA systems to achieve the desired levels of accuracy and response time. However, the use of these strategies in the state-of-the-art is not consistent, which exposes a lack of consensus on \"when to use (and not to use) these design strategies?\" In this paper, we follow a systematic experimentation framework to quantify the impact of four design strategies on the accuracy and response time with respect to three contextual factors i.e., security data, machine learning model employed in the system, and the execution mode of the system. For the aimed quantification, we performed experiments on a Hadoop-based BDCA system using four security datasets, five machine learning models, and three execution modes. Our findings lead us to formulate a set of design guidelines that will help researchers and practitioners to decide when to use (and not to use) the design strategies.","PeriodicalId":166126,"journal":{"name":"2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PDCAT46702.2019.00037","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
Big Data Cyber Security Analytics (BDCA) systems use big data technologies (e.g., Hadoop and Spark) to collect, store, and analyze large volumes of security event data in order to detect cyber-attacks. The state of the art employs various design strategies (e.g., feature selection and alert ranking) to help BDCA systems achieve the desired levels of accuracy and response time. However, these strategies are applied inconsistently across the state of the art, which exposes a lack of consensus on when to use (and not to use) them. In this paper, we follow a systematic experimentation framework to quantify the impact of four design strategies on accuracy and response time with respect to three contextual factors: the security data, the machine learning model employed in the system, and the execution mode of the system. For this quantification, we performed experiments on a Hadoop-based BDCA system using four security datasets, five machine learning models, and three execution modes. Our findings lead us to formulate a set of design guidelines that help researchers and practitioners decide when to use (and not to use) these design strategies.
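To make the evaluated trade-off concrete, the sketch below (an illustration, not the authors' implementation) shows one way to measure how a feature-selection design strategy affects accuracy and response time for an attack-detection classifier. The synthetic dataset, the scikit-learn SelectKBest selector, and the random-forest model are all assumptions chosen for demonstration; the paper's experiments instead run on a Hadoop-based BDCA system with four security datasets, five machine learning models, and three execution modes.

```python
# Illustrative sketch: quantify the accuracy/response-time impact of a
# feature-selection design strategy. All dataset and model choices here
# are assumptions for demonstration, not the paper's setup.
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def evaluate(X_train, X_test, y_train, y_test):
    """Train a detector and return (accuracy, response time in seconds)."""
    start = time.perf_counter()
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    elapsed = time.perf_counter() - start
    return accuracy_score(y_test, predictions), elapsed


# Stand-in for a labeled security dataset (attack vs. benign events).
X, y = make_classification(n_samples=5000, n_features=40, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# Baseline: no design strategy applied.
acc_base, time_base = evaluate(X_train, X_test, y_train, y_test)

# With the feature-selection design strategy: keep the 10 highest-scoring features.
selector = SelectKBest(score_func=f_classif, k=10).fit(X_train, y_train)
acc_fs, time_fs = evaluate(selector.transform(X_train),
                           selector.transform(X_test), y_train, y_test)

print(f"baseline:          accuracy={acc_base:.3f}, time={time_base:.2f}s")
print(f"feature selection: accuracy={acc_fs:.3f}, time={time_fs:.2f}s")
```

Comparing the two printed lines mirrors, in miniature, the kind of question the paper asks: whether a given design strategy buys enough response-time improvement to justify any accuracy change under a particular dataset, model, and execution context.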