A Hypercuboid-Based Machine Learning Algorithm for Malware Classification
Thi Thu Trang Nguyen, Dai Tho Nguyen, Duy Loi Vu
2021 RIVF International Conference on Computing and Communication Technologies (RIVF), pp. 1-6. Published 2021-08-19. DOI: 10.1109/RIVF51545.2021.9642093
Citations: 1
Abstract
Malware attacks have been among the most serious threats to cyber security over the last decade. Anti-malware software can help safeguard information systems and minimize their exposure to malware. Most anti-malware programs detect malware instances through signature or pattern matching. Data mining and machine learning techniques can be used to automatically discover the models and patterns behind different malware variants. However, traditional machine learning techniques such as SVM, decision trees, and naive Bayes seem suitable only for detecting malicious code and are not effective enough for more complex problems such as classifying malware into families. In this article, we propose a new prototype extraction method for prototype-based classification, a non-traditional machine learning approach. The prototypes are extracted using hypercuboids: each hypercuboid covers all training data points of one malware family, and the data points nearest to its bounding hyperplanes are chosen as prototypes. Malware samples are then classified based on their distances to the prototypes. Experimental results show that the proposed method achieves an F1 score of 96.5% for classifying known malware and 97.7% for classifying unknown malware, both better than the original prototype-based classification method.
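The abstract describes the method only at a high level, so the following Python sketch illustrates one plausible reading of it rather than the authors' implementation. It assumes that samples are fixed-length numeric feature vectors, that each family's hypercuboid is the axis-aligned bounding box of its training points, that "the data points nearest to the hyperplanes" means the single training point closest to each of the 2·D bounding hyperplanes, and that a sample is assigned the family of its nearest prototype under Euclidean distance. The function names (`hypercuboid`, `extract_prototypes`, `train`, `classify`) and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def hypercuboid(points: Sequence[Point]) -> Tuple[Point, Point]:
    """Axis-aligned hypercuboid covering all points: per-dimension min and max.
    (Assumption: the paper's hypercuboid is the tight bounding box.)"""
    dims = range(len(points[0]))
    lo = tuple(min(p[d] for p in points) for d in dims)
    hi = tuple(max(p[d] for p in points) for d in dims)
    return lo, hi


def extract_prototypes(points: Sequence[Point]) -> List[Point]:
    """For each bounding hyperplane x_d = lo_d and x_d = hi_d, keep the training
    point closest to it (distance |x_d - c|). Assumed selection rule: one point
    per hyperplane, duplicates removed, original order preserved."""
    lo, hi = hypercuboid(points)
    prototypes: List[Point] = []
    for d in range(len(lo)):
        for plane in (lo[d], hi[d]):
            nearest = min(points, key=lambda p: abs(p[d] - plane))
            if nearest not in prototypes:
                prototypes.append(nearest)
    return prototypes


def train(data: Dict[str, Sequence[Point]]) -> List[Tuple[Point, str]]:
    """Build a labelled prototype set from {family: training points}."""
    return [(p, family) for family, pts in data.items() for p in extract_prototypes(pts)]


def classify(x: Point, prototypes: List[Tuple[Point, str]]) -> str:
    """Assign the family of the nearest prototype (Euclidean distance assumed)."""
    _, family = min(prototypes, key=lambda pl: math.dist(x, pl[0]))
    return family


# Toy 2-D data for two hypothetical families.
training = {
    "familyA": [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (0.4, 0.4)],
    "familyB": [(5.0, 5.0), (6.0, 5.5), (5.5, 6.0), (5.6, 5.4)],
}
protos = train(training)
print(classify((0.8, 0.7), protos))  # familyA
print(classify((5.2, 5.9), protos))  # familyB
```

Under this reading, the point nearest to each face lies on that face (distance 0), so the prototypes reduce to the per-dimension extreme points of each family, and interior points such as (0.4, 0.4) are discarded. The paper may select several points per hyperplane or use a different distance measure, and this sketch omits any mechanism for handling samples from unknown families.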