Big Data Analytics in Cybersecurity: Network Data and Intrusion Prediction
Lidong Wang, Randy Jones
2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), published 2019-10-01
https://doi.org/10.1109/UEMCON47517.2019.8993037
Intrusion detection in computer networks is an important problem in cybersecurity. Networks generate streaming data at big-data scale, which makes intrusion detection challenging. This paper studies the ‘Variety’ and ‘Veracity’ characteristics of big data in network data using R and its functions. The statistics, correlations, and associations of variables in the spam email database ‘spambase’ are analysed. Clustering analysis based on k-means is performed on the database, and principal component analysis is used to reduce its dimensionality. Spam-email intrusion is predicted using Naïve Bayesian classification and deep learning, respectively. Missing values in the large ‘VAST 2013’ data set, which has multiple data types and a huge volume of missing values, are analysed, and its missing data patterns are obtained.
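
To make the idea of a "missing data pattern" concrete, the sketch below tabulates which combinations of fields are missing together and how often each combination occurs. It is only an illustration: the paper works in R and the abstract does not say which functions it uses, so this version uses plain Python instead. The helper `missing_patterns`, the column names, and the toy records are all hypothetical and are not taken from ‘VAST 2013’.

```python
from collections import Counter


def missing_patterns(rows, columns):
    """Count distinct missingness patterns.

    Each pattern is a tuple of 0/1 flags, one per column, where 1 marks a
    missing value (None). Returns (pattern, count) pairs, most frequent first.
    """
    counts = Counter(
        tuple(1 if row.get(col) is None else 0 for col in columns)
        for row in rows
    )
    return counts.most_common()


# Hypothetical toy records (assumption: None stands for a missing value).
columns = ["src_ip", "dst_port", "bytes"]
rows = [
    {"src_ip": "10.0.0.1", "dst_port": 80, "bytes": 512},
    {"src_ip": "10.0.0.2", "dst_port": None, "bytes": 128},
    {"src_ip": None, "dst_port": None, "bytes": 64},
    {"src_ip": "10.0.0.3", "dst_port": None, "bytes": 256},
    {"src_ip": "10.0.0.4", "dst_port": 443, "bytes": None},
]

patterns = missing_patterns(rows, columns)
for pattern, count in patterns:
    # Show each pattern as {column: missing_flag} with its row count.
    print(dict(zip(columns, pattern)), count)

# Total number of missing values per column, derived from the patterns.
missing_per_column = {
    col: sum(pattern[i] * count for pattern, count in patterns)
    for i, col in enumerate(columns)
}
print(missing_per_column)
```

Grouping rows by pattern, rather than only counting missing values per column, shows whether fields tend to be missing together, which is the kind of structure a missing-data-pattern analysis looks for.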