Malware Detection Using Gradient Boosting Decision Trees with Customized Log Loss Function
Authors: Yun Gao, Hirokazu Hasegawa, Yukiko Yamaguchi, Hajime Shimada
Published in: 2021 International Conference on Information Networking (ICOIN), pp. 273-278, 2021-01-13
DOI: 10.1109/ICOIN50884.2021.9333999
The growing volume of malicious software spread through the Internet has become a serious threat. Malware authors use obfuscation and deformation techniques to generate new types of malware that evade traditional detection methods, so there are high expectations for machine learning methods that classify malware and cleanware based on the characteristics of the samples. The current research trend is to use machine learning, especially decision tree-based techniques, to identify new malware quickly and accurately. This paper investigates malware classification accuracy using the latest decision tree-based algorithms with a custom log loss function. We use the FFRI Dataset 2019 to construct baseline malware detection models from surface analysis logs and PE header dumps. We then customize the classification log loss function, which reduces false positives by 82% at the cost of doubling false negatives. To maintain malware detection coverage while enabling quick countermeasures against true positives, we propose a hybrid use of a model trained with the normal log loss function and a model trained with the custom log loss function, giving additional priority to positive results.
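The abstract does not give the form of the customized loss, so the following is only a minimal sketch of one common way to bias a log loss against false positives: weighting the cleanware term more heavily. It computes the per-sample gradient and Hessian with respect to the raw score, which is what gradient boosting libraries generally require from a custom objective. The names `fp_weight`, `weighted_logloss_grad_hess`, and `triage`, the weight value, the threshold, and the priority labels are all assumptions made for illustration, not details taken from the paper. The `triage` function is one possible reading of the hybrid usage: a sample flagged by the low-false-positive custom model is treated as higher priority than one flagged only by the normal model.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_logloss_grad_hess(raw_scores, labels, fp_weight=2.0):
    """Gradient and Hessian of a class-weighted log loss per sample.

    With z the raw score, p = sigmoid(z) and y in {0, 1} (1 = malware):
        L = -[y * log(p) + fp_weight * (1 - y) * log(1 - p)]
    fp_weight > 1 makes high malware scores on cleanware more costly.
    fp_weight is a hypothetical knob, not a value from the paper.
    """
    grads, hessians = [], []
    for z, y in zip(raw_scores, labels):
        p = sigmoid(z)
        w = y + fp_weight * (1 - y)         # 1 for malware, fp_weight for cleanware
        grads.append(w * (p - y))           # first derivative dL/dz
        hessians.append(w * p * (1.0 - p))  # second derivative d2L/dz2
    return grads, hessians


def triage(p_normal: float, p_custom: float, threshold: float = 0.5) -> str:
    """Combine two models' malware probabilities (assumed hybrid rule).

    p_normal: output of the model trained with the normal log loss.
    p_custom: output of the model trained with the custom log loss.
    """
    if p_custom >= threshold:
        return "high-priority"  # flagged by the low-false-positive model
    if p_normal >= threshold:
        return "review"         # flagged only by the normal model
    return "clean"


if __name__ == "__main__":
    g, h = weighted_logloss_grad_hess([0.0, 0.0], [1, 0], fp_weight=2.0)
    print(g, h)
    print(triage(0.9, 0.8), triage(0.9, 0.2), triage(0.1, 0.1))
```

In this sketch, the normal model keeps detection coverage because anything it flags is still surfaced for review, while the custom model's positives can trigger countermeasures sooner because they are less likely to be false alarms.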