Zhang Lin. 2021 International Conference on Networking and Network Applications (NaNA), published 2021-10-01. DOI: 10.1109/NaNA53684.2021.00065 (https://doi.org/10.1109/NaNA53684.2021.00065)
Network Intrusion Detection Based on a Semi-Supervised Ensemble Learning Algorithm for Imbalanced Data
In many practical applications, data annotation is expensive, so a training set typically contains a large number of unlabeled samples and only a small number of labeled ones. Network data is also imbalanced: normal-behavior records greatly outnumber intrusion records. To address both problems, this paper proposes a semi-supervised ensemble learning algorithm for imbalanced data. The algorithm defines the sampling probability of each sample from the relationships between class samples, and uses these probabilities to construct the initial training subsets and the base classifiers. An evaluation index tailored to imbalanced data is then defined to evaluate and select base classifiers, and the selected base classifiers are combined by weighted voting. Simulation results on UCI data and the NSL-KDD dataset show that the algorithm improves detection accuracy, especially the recognition rate of unknown intrusion behavior.
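The abstract names the pipeline stages but not their formulas, so the following is only a minimal sketch under stated assumptions, not the paper's method. Inverse class-frequency weights stand in for the sampling probability, nearest-centroid models stand in for the base classifiers, G-mean (the geometric mean of the true-positive and true-negative rates) stands in for the imbalanced-data evaluation index, and confident weighted-vote pseudo-labeling stands in for the semi-supervised step. All function names (`sampling_probabilities`, `build_ensemble`, `pseudo_label`, and so on), parameters, and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

# ASSUMPTION: the paper's sampling-probability formula is not given.
# Each sample gets weight 1 / (n_classes * |its class|), so every class
# receives equal total mass and rare intrusion samples are drawn more often.
def sampling_probabilities(y):
    counts = Counter(y)
    return [1.0 / (len(counts) * counts[label]) for label in y]

# Stand-in base classifier: nearest centroid (one mean vector per class).
def fit_centroids(X, y):
    counts, sums = Counter(y), {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(model, x):
    return min(model, key=lambda label: math.dist(x, model[label]))

# ASSUMPTION: G-mean = sqrt(TPR * TNR) as the imbalanced-data evaluation index.
def g_mean(model, X, y, positive=1):
    tp = fn = tn = fp = 0
    for x, label in zip(X, y):
        hit = predict(model, x) == positive
        if label == positive:
            if hit:
                tp += 1
            else:
                fn += 1
        elif hit:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)

def build_ensemble(X, y, X_val, y_val, n_models=10, subset_size=20,
                   min_score=0.5, seed=0):
    """Draw subsets by sampling probability, train base classifiers, and
    keep those whose validation G-mean reaches min_score."""
    rng = random.Random(seed)
    probs = sampling_probabilities(y)
    ensemble = []
    for _ in range(n_models):
        idx = rng.choices(range(len(y)), weights=probs, k=subset_size)
        sub_y = [y[i] for i in idx]
        if len(set(sub_y)) < 2:
            continue  # a single-class subset cannot separate the classes
        model = fit_centroids([X[i] for i in idx], sub_y)
        score = g_mean(model, X_val, y_val)
        if score >= min_score:
            ensemble.append((model, score))  # score doubles as vote weight
    return ensemble

def weighted_tally(ensemble, x):
    tally = Counter()
    for model, weight in ensemble:
        tally[predict(model, x)] += weight
    return tally

def vote(ensemble, x):
    if not ensemble:
        raise ValueError("no base classifier passed selection")
    return weighted_tally(ensemble, x).most_common(1)[0][0]

# ASSUMPTION: one plausible semi-supervised step (self-training); the
# abstract does not say how unlabeled samples are used.
def pseudo_label(ensemble, X_unlabeled, threshold=0.9):
    if not ensemble:
        return []
    total = sum(weight for _, weight in ensemble)
    confident = []
    for x in X_unlabeled:
        label, share = weighted_tally(ensemble, x).most_common(1)[0]
        if share / total >= threshold:
            confident.append((x, label))
    return confident

# Toy data: 40 "normal" (0) points near the origin, 4 "intrusion" (1) points.
X_train = [(i % 5 * 0.2, i // 5 * 0.2) for i in range(40)]
X_train += [(5.0, 5.0), (5.2, 5.1), (4.9, 5.3), (5.1, 4.8)]
y_train = [0] * 40 + [1] * 4
X_val = [(0.1, 0.1), (0.5, 0.5), (0.3, 1.0), (5.0, 4.9), (4.8, 5.2)]
y_val = [0, 0, 0, 1, 1]

ensemble = build_ensemble(X_train, y_train, X_val, y_val)
print(len(ensemble), vote(ensemble, (0.2, 0.3)), vote(ensemble, (5.0, 5.0)))
print(pseudo_label(ensemble, [(0.3, 0.4), (5.1, 5.0)]))
```

Using each selected classifier's G-mean as its vote weight is one simple design choice: models that handle the minority class well carry more influence. In a full self-training loop, the pairs returned by `pseudo_label` would be added to the labeled set and the ensemble rebuilt.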