Using Deep Generative Models to Boost Forecasting: A Phishing Prediction Case Study
Syed Hasan Amin Mahmood, A. Abbasi
2020 International Conference on Data Mining Workshops (ICDMW), published 2020-11-01
DOI: 10.1109/ICDMW51313.2020.00073 (https://doi.org/10.1109/ICDMW51313.2020.00073)
Citations: 1
Abstract
Time series predictions are important in many application domains. However, effective forecasting can be challenging in noisy contexts where the available time series lack stationarity, cyclicality, and completeness, and are sparse. Cybersecurity is a prime example of such a context. In organizational security settings, predicting time series related to emerging attacks could enhance cyber threat intelligence, yielding timely and actionable insights at the operational, tactical, and strategic levels. To address this challenge, we propose a deep generative model-based framework for time series forecasting in noisy data environments. The framework incorporates a novel ensembling strategy in which generative adversarial networks and recurrent variational autoencoders are used together with base predictors to strengthen the regularization of time series predictive models. The framework is extensible, supporting different model combinations and both analytical and iterative model fusion strategies. Using a test bed of 10 years of weekly phishing attack volume data from 5 organizations in the technology, financial services, and social networking sectors, we show that the framework can boost the predictive power of various standard time series models. Additional results show that the framework outperforms generative data augmentation approaches designed to enrich the input time series data matrices. Collectively, our findings suggest that embedding generative models in a more robust end-to-end setup can improve prediction in cyber threat intelligence contexts, as well as in related problems involving challenging time series data.
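The abstract does not specify how base-predictor and generative-model outputs are fused. As a rough illustration only, the sketch below assumes the simplest linear case. A base forecast series and a hypothetical "generative" forecast series (standing in for output from a GAN or recurrent VAE component, neither of which is implemented here) are combined with weights fitted either analytically (closed-form least squares) or iteratively (gradient descent on mean squared error). This is a generic forecast-combination technique, not the authors' method, and all function names and data are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical illustration: linear fusion of two forecast series.
# "base" = forecasts from a standard time series model;
# "generative" = forecasts assumed to come from a generative component
# (e.g., a GAN or recurrent VAE). Neither model is implemented here.


def analytical_fusion_weights(
    base: Sequence[float], generative: Sequence[float], actual: Sequence[float]
) -> Tuple[float, float]:
    """Closed-form least-squares weights (w_b, w_g), no intercept, minimizing
    sum_t (actual_t - w_b * base_t - w_g * generative_t) ** 2."""
    s_bb = sum(b * b for b in base)
    s_gg = sum(g * g for g in generative)
    s_bg = sum(b * g for b, g in zip(base, generative))
    s_by = sum(b * y for b, y in zip(base, actual))
    s_gy = sum(g * y for g, y in zip(generative, actual))
    det = s_bb * s_gg - s_bg * s_bg
    if abs(det) < 1e-12:
        raise ValueError("forecast series are collinear; weights not identifiable")
    # Solve the 2x2 normal equations with Cramer's rule.
    w_b = (s_by * s_gg - s_bg * s_gy) / det
    w_g = (s_bb * s_gy - s_bg * s_by) / det
    return w_b, w_g


def iterative_fusion_weights(
    base: Sequence[float],
    generative: Sequence[float],
    actual: Sequence[float],
    lr: float = 0.01,
    steps: int = 5000,
) -> Tuple[float, float]:
    """Same objective, minimized by batch gradient descent on the mean squared error."""
    w_b, w_g = 0.5, 0.5  # start from an equal-weight average
    n = len(actual)
    for _ in range(steps):
        grad_b = grad_g = 0.0
        for b, g, y in zip(base, generative, actual):
            err = w_b * b + w_g * g - y
            grad_b += 2.0 * err * b / n
            grad_g += 2.0 * err * g / n
        w_b -= lr * grad_b
        w_g -= lr * grad_g
    return w_b, w_g


def fuse(
    base: Sequence[float], generative: Sequence[float], w_b: float, w_g: float
) -> List[float]:
    """Apply the fitted weights to produce the fused forecast."""
    return [w_b * b + w_g * g for b, g in zip(base, generative)]


if __name__ == "__main__":
    base = [1.0, 2.0, 3.0, 4.0]
    generative = [2.0, 1.0, 4.0, 3.0]
    # Toy target built as an exact 0.7 / 0.3 mix of the two series.
    actual = [0.7 * b + 0.3 * g for b, g in zip(base, generative)]
    print(analytical_fusion_weights(base, generative, actual))
    print(iterative_fusion_weights(base, generative, actual))
```

On this toy data both routines should recover weights of approximately 0.7 and 0.3. The closed form is exact for a two-model linear combination, while the iterative version illustrates how fusion rules without a closed-form solution could still be fitted.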