Authors: Yanchen Qiao, Bin Zhang, Weizhe Zhang
Venue: ICC 2020 - 2020 IEEE International Conference on Communications (ICC)
Publication date: 2020-06-01
DOI: https://doi.org/10.1109/ICC40277.2020.9149143
Malware Classification Method Based on Word Vector of Bytes and Multilayer Perception
Traditional machine learning-based malware classification methods rely mainly on feature engineering. To improve accuracy, these methods extract many features from malware files, which makes classification highly complex. To address this issue, this paper proposes a malware classification method based on word vectors of the bytes in a malware sample and a multilayer perceptron (MLP). A malware sample consists of a large number of bytes with values ranging from 0x00 to 0xFF. Every malware sample can therefore be treated as a document written in bytes, and this document can be divided into sentences at padding or meaningless bytes. First, we use Word2Vec to compute a 256-dimensional word vector for each byte. Second, we combine these vectors into a matrix in ascending order. Third, we train an MLP model on the training samples. Finally, we use the trained model to classify the testing samples. The experimental results show that the method achieves an accuracy of 98.89%.
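The preprocessing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It covers only two steps: splitting a sample into byte "sentences", and stacking per-byte vectors into a matrix. The abstract does not say which bytes count as padding or how long a run must be, so the padding values (0x00 and 0xCC), the minimum run length of 4, the zero-row fallback for unseen bytes, and the reading of "ascending order" as ascending byte value are all assumptions. The function names are hypothetical.

```python
import re
from typing import Dict, List

# Assumed padding values; the abstract does not list them.
PADDING_BYTES = b"\x00\xcc"


def split_into_sentences(data: bytes, min_run: int = 4) -> List[List[str]]:
    """Split a byte string into sentences of two-hex-digit tokens.

    A run of at least `min_run` identical padding bytes is treated as a
    sentence boundary. This rule is an assumption made for illustration.
    """
    pattern = re.compile(
        b"|".join(
            re.escape(bytes([value])) + b"{%d,}" % min_run
            for value in PADDING_BYTES
        )
    )
    chunks = pattern.split(data)
    # Each byte becomes one "word", written as two lowercase hex digits.
    return [[f"{byte:02x}" for byte in chunk] for chunk in chunks if chunk]


def stack_byte_vectors(
    vectors: Dict[str, List[float]], dim: int = 256
) -> List[List[float]]:
    """Stack per-byte vectors into a 256-row matrix.

    Row i holds the vector for byte value i, so rows run from 0x00 to 0xFF.
    Bytes with no vector get a zero row (an assumption).
    """
    zero_row = [0.0] * dim
    return [list(vectors.get(f"{value:02x}", zero_row)) for value in range(256)]


# Usage: three code fragments separated by padding runs.
sample = b"\x55\x8b\xec" + b"\x00" * 5 + b"\xc3\x90" + b"\xcc" * 4 + b"\x00\x01"
sentences = split_into_sentences(sample)
# sentences == [["55", "8b", "ec"], ["c3", "90"], ["00", "01"]]

# Placeholder vectors stand in for trained Word2Vec output.
matrix = stack_byte_vectors({"00": [1.0] * 256, "ff": [2.0] * 256})
# matrix has 256 rows of 256 values each.
```

In a fuller pipeline, the token sentences would be passed to a Word2Vec implementation to obtain the per-byte vectors, and the resulting matrix would be flattened and fed to an MLP classifier. Both steps are omitted here because the abstract gives no training settings for them.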