Evolutionary feature selection for machine learning based malware classification
Gülsade Kale, Gazi Erkan Bostancı, Fatih Vehbi Çelebi
Engineering Science and Technology, an International Journal (JESTECH), vol. 56, Article 101762, published 2024-07-20. DOI: 10.1016/j.jestch.2024.101762
Citations: 0
Abstract
Thorough research, analysis, and detection of cyber-threatening malware using the right parameters is crucial for safeguarding a country's security and economy. Increasingly sophisticated cyber-attacks directly affect individual welfare, social dynamics, and political stability. Because malware continuously evolves to evade detection, it is all the more essential to select effective, decisive parameters that account for the interactions among malware features. As malware adopts new technologies and techniques, signature-based detection systems are becoming inadequate. Instead of relying on these still widely used but insufficient systems, this study establishes a new system that focuses on malware behavior and on the relationships between the malware features that this behavior produces. Rather than applying a single uniform approach, the system employs multi-objective genetic algorithms (MOGAs) to select the features that are most critical and decisive for malware detection. Machine learning (ML) algorithms within the implemented hybrid framework then use these selected features to detect and classify malware accurately.
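The abstract does not specify the authors' objectives, genetic operators, or parameters. As a rough illustration of multi-objective feature selection in general, the sketch below evolves binary feature masks under two objectives to be minimized (classification error and number of selected features) and keeps the Pareto non-dominated masks as the parents of each generation. Everything here is an assumption for illustration: the toy data, the nearest-centroid classifier standing in for the classifiers the study evaluates, and all names and parameters (`make_toy_data`, `moga_select`, `pop_size`, `p_mut`, and so on) are hypothetical. This is a simplified Pareto-elitist GA, not NSGA-II and not the paper's method.

```python
import random
from typing import List, Tuple

Mask = Tuple[int, ...]


def make_toy_data(n: int = 120, n_feat: int = 12, n_informative: int = 3, seed: int = 0):
    """Hypothetical toy data: only the first n_informative features depend on the label."""
    rng = random.Random(seed)
    X: List[List[float]] = []
    y: List[int] = []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0) for j in range(n_feat)])
        y.append(label)
    return X, y


def accuracy(mask: Mask, X, y) -> float:
    """Nearest-centroid accuracy on the selected columns (resubstitution, for brevity only)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    cents = {}
    for c in set(y):
        rows = [x for x, lab in zip(X, y) if lab == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, lab in zip(X, y):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, cent)) for c, cent in cents.items()}
        correct += min(dist, key=dist.get) == lab
    return correct / len(y)


def objectives(mask: Mask, X, y) -> Tuple[float, int]:
    # Both objectives are minimized: classification error and feature count.
    return (1.0 - accuracy(mask, X, y), sum(mask))


def dominates(a, b) -> bool:
    # a Pareto-dominates b: no worse in every objective, strictly better in at least one.
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))


def pareto_front(scored):
    return [(m, f) for m, f in scored if not any(dominates(g, f) for _, g in scored)]


def moga_select(X, y, pop_size: int = 30, generations: int = 40, p_mut: float = 0.1, seed: int = 1):
    rng = random.Random(seed)
    n = len(X[0])
    pop = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(m, objectives(m, X, y)) for m in set(pop)]
        parents = [m for m, _ in pareto_front(scored)]  # elitism: keep the non-dominated masks
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.choice(parents), rng.choice(parents)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = tuple(bit ^ int(rng.random() < p_mut) for bit in child)  # bit-flip mutation
            children.append(child)
        pop = parents + children
    scored = [(m, objectives(m, X, y)) for m in set(pop)]
    # Final trade-off set, ordered from fewest to most selected features.
    return sorted(pareto_front(scored), key=lambda mf: mf[1][1])
```

For example, `for mask, (err, k) in moga_select(*make_toy_data()): print(k, round(1 - err, 3), mask)` lists one accuracy/feature-count trade-off per line. Returning the whole front rather than a single "best" mask leaves the final choice of subset size to the analyst.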
The aim of this paper is to identify the feature selection and classification methods that yield the highest accuracy within the Cuckoo Sandbox environment. Specifically, the J48 Decision Tree (J48), Reduced Error Pruning Tree (REP Tree), Adaptive Boosting (AdaBoostM1), Multilayer Perceptron (MLP), and Naive Bayes (NB) classifiers were assessed. Taking inter-feature relationships into account, our analysis reduced the feature set from 335 to 200 features, yielding a peak accuracy of 93.33% and a 40% performance improvement attributable to the smaller feature set. The resulting metrics were compared and evaluated across the algorithms and methods employed. In addition, McNemar's test was used to compare the malware detection classifiers on the basis of their correct and incorrect classifications. Notably, McNemar's test revealed significant improvements.
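The abstract does not state which form of McNemar's test was applied. A minimal sketch, assuming the common chi-square version with Edwards' continuity correction: only the discordant cases matter, namely the samples classifier A gets right and B gets wrong (`n_a_only`) and vice versa (`n_b_only`). The statistic is (|n_a_only − n_b_only| − 1)² / (n_a_only + n_b_only), compared against a chi-square distribution with one degree of freedom, whose upper tail equals `erfc(sqrt(stat / 2))`. The function name `mcnemar` and its return layout are hypothetical.

```python
import math
from typing import Sequence, Tuple


def mcnemar(y_true: Sequence, pred_a: Sequence, pred_b: Sequence,
            correction: bool = True) -> Tuple[float, float, int, int]:
    """Return (statistic, p_value, n_a_only, n_b_only) for two classifiers on the same samples."""
    n_a_only = n_b_only = 0
    for t, pa, pb in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = pa == t, pb == t
        if a_ok and not b_ok:
            n_a_only += 1  # A correct, B wrong
        elif b_ok and not a_ok:
            n_b_only += 1  # B correct, A wrong
    discordant = n_a_only + n_b_only
    if discordant == 0:
        return 0.0, 1.0, n_a_only, n_b_only  # classifiers never disagree on correctness
    diff = abs(n_a_only - n_b_only) - (1 if correction else 0)
    stat = max(diff, 0) ** 2 / discordant
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, n_a_only, n_b_only
```

With a small number of discordant pairs (roughly below 25), an exact binomial version of the test is often preferred over this chi-square approximation.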
About the journal:
Engineering Science and Technology, an International Journal (JESTECH) (formerly Technology) is a peer-reviewed quarterly engineering journal that publishes high-quality theoretical and experimental papers of permanent interest, not previously published in journals, in engineering and applied science, with the aim of promoting the theory and practice of technology and engineering. In addition to peer-reviewed original research papers, the Editorial Board welcomes original research reports, state-of-the-art reviews, and communications in the broadly defined field of engineering science and technology.
The scope of JESTECH includes a wide spectrum of subjects including:
-Electrical/Electronics and Computer Engineering (Biomedical Engineering and Instrumentation; Coding, Cryptography, and Information Protection; Communications, Networks, Mobile Computing and Distributed Systems; Compilers and Operating Systems; Computer Architecture, Parallel Processing, and Dependability; Computer Vision and Robotics; Control Theory; Electromagnetic Waves, Microwave Techniques and Antennas; Embedded Systems; Integrated Circuits, VLSI Design, Testing, and CAD; Microelectromechanical Systems; Microelectronics, and Electronic Devices and Circuits; Power, Energy and Energy Conversion Systems; Signal, Image, and Speech Processing)
-Mechanical and Civil Engineering (Automotive Technologies; Biomechanics; Construction Materials; Design and Manufacturing; Dynamics and Control; Energy Generation, Utilization, Conversion, and Storage; Fluid Mechanics and Hydraulics; Heat and Mass Transfer; Micro-Nano Sciences; Renewable and Sustainable Energy Technologies; Robotics and Mechatronics; Solid Mechanics and Structure; Thermal Sciences)
-Metallurgical and Materials Engineering (Advanced Materials Science; Biomaterials; Ceramic and Inorganic Materials; Electronic-Magnetic Materials; Energy and Environment; Materials Characterization; Metallurgy; Polymers and Nanocomposites)