{"title":"Quantifying the Impact of Design Strategies for Big Data Cyber Security Analytics: An Empirical Investigation","authors":"Faheem Ullah, M. Babar","doi":"10.1109/PDCAT46702.2019.00037","DOIUrl":null,"url":null,"abstract":"Big Data Cyber Security Analytics (BDCA) systems use big data technologies (e.g., Hadoop and Spark) for collecting, storing, and analyzing a large volume of security event data to detect cyber-attacks. The state-of-the-art uses various design strategies (e.g., feature selection and alert ranking) to help BDCA systems to achieve the desired levels of accuracy and response time. However, the use of these strategies in the state-of-the-art is not consistent, which exposes a lack of consensus on \"when to use (and not to use) these design strategies?\" In this paper, we follow a systematic experimentation framework to quantify the impact of four design strategies on the accuracy and response time with respect to three contextual factors i.e., security data, machine learning model employed in the system, and the execution mode of the system. For the aimed quantification, we performed experiments on a Hadoop-based BDCA system using four security datasets, five machine learning models, and three execution modes. Our findings lead us to formulate a set of design guidelines that will help researchers and practitioners to decide when to use (and not to use) the design strategies.","PeriodicalId":166126,"journal":{"name":"2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PDCAT46702.2019.00037","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
Big Data Cyber Security Analytics (BDCA) systems use big data technologies (e.g., Hadoop and Spark) to collect, store, and analyze large volumes of security event data in order to detect cyber-attacks. The state of the art employs various design strategies (e.g., feature selection and alert ranking) to help BDCA systems achieve the desired levels of accuracy and response time. However, these strategies are applied inconsistently across the state of the art, which exposes a lack of consensus on when to use (and not to use) them. In this paper, we follow a systematic experimentation framework to quantify the impact of four design strategies on accuracy and response time with respect to three contextual factors: the security data, the machine learning model employed in the system, and the execution mode of the system. For this quantification, we performed experiments on a Hadoop-based BDCA system using four security datasets, five machine learning models, and three execution modes. Our findings lead us to formulate a set of design guidelines that help researchers and practitioners decide when to use (and not to use) these design strategies.
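To make the evaluated trade-off concrete, the sketch below (an illustration, not the authors' implementation) shows one way to measure how a feature-selection design strategy affects accuracy and response time for an attack-detection classifier. The synthetic dataset, the scikit-learn SelectKBest selector, and the random-forest model are all assumptions chosen for demonstration; the paper's experiments instead run on a Hadoop-based BDCA system with four security datasets, five machine learning models, and three execution modes.

```python
# Illustrative sketch: quantify the accuracy/response-time impact of a
# feature-selection design strategy. All dataset and model choices here
# are assumptions for demonstration, not the paper's setup.
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def evaluate(X_train, X_test, y_train, y_test):
    """Train a detector and return (accuracy, response time in seconds)."""
    start = time.perf_counter()
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    elapsed = time.perf_counter() - start
    return accuracy_score(y_test, predictions), elapsed


# Stand-in for a labeled security dataset (attack vs. benign events).
X, y = make_classification(n_samples=5000, n_features=40, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# Baseline: no design strategy applied.
acc_base, time_base = evaluate(X_train, X_test, y_train, y_test)

# With the feature-selection design strategy: keep the 10 highest-scoring features.
selector = SelectKBest(score_func=f_classif, k=10).fit(X_train, y_train)
acc_fs, time_fs = evaluate(selector.transform(X_train),
                           selector.transform(X_test), y_train, y_test)

print(f"baseline:          accuracy={acc_base:.3f}, time={time_base:.2f}s")
print(f"feature selection: accuracy={acc_fs:.3f}, time={time_fs:.2f}s")
```

Comparing the two printed lines mirrors, in miniature, the kind of question the paper asks: whether a given design strategy buys enough response-time improvement to justify any accuracy change under a particular dataset, model, and execution context.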