Title: A Machine Learning Classification Approach to Detect TLS-Based Malware Using Entropy-Based Flow Set Features
Authors: Kinan Keshkeh, A. Jantan, Kamal Alieyan
Journal: International Journal of Information and Communication Technology
DOI: 10.32890/jict2022.21.3.1 (https://doi.org/10.32890/jict2022.21.3.1)
Publication date: 2022-07-17
Publication type: Journal Article
JCR: Q4 (Computer Science)
Citation count: 0
Abstract
Transport Layer Security (TLS) based malware is one of the most hazardous malware types because it relies on encryption to conceal its connections. Because decrypting TLS traffic is complex, several anomaly-based studies have sought to detect TLS-based malware using different features and machine learning (ML) algorithms. However, most of these studies used flow features with no feature transformation or relied on inefficient flow feature transformations such as frequency-based periodicity analysis and outlier percentage. This paper introduces TLSMalDetect, a TLS-based malware detection approach that integrates periodicity-independent entropy-based flow set (EFS) features, generated by a flow feature transformation technique, to address the flow feature utilization issues in related research. The effectiveness of the EFS features was evaluated in two ways: (1) by comparing them with the corresponding outlier percentage and flow features using four feature importance methods, and (2) by analyzing classification performance with and without EFS features. In addition, new Transmission Control Protocol (TCP) features not explored in the literature were incorporated into TLSMalDetect, and their contribution was assessed. The results showed that the EFS features of the number of packets sent and received were superior to the related outlier percentage and flow features and could increase performance by up to ~42% in the case of Support Vector Machine (SVM) accuracy. Furthermore, using the basic features, TLSMalDetect achieved its highest accuracy of 93.69% with Naïve Bayes (NB) among the ML algorithms applied. In comparison with prior work, TLSMalDetect's Random Forest precision of 98.99% and NB recall of 92.91% exceeded the best relevant findings of previous studies. These comparative results indicate that TLSMalDetect detects a larger share of the total malicious flows than existing works (higher recall) and that a larger share of its alerts are true alerts (higher precision).
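The abstract does not give the exact EFS formula, so the following is only a minimal sketch of the general idea, assuming that an EFS feature is the Shannon entropy of one flow feature (for example, packets sent) taken over a set of flows. The function names and the dictionary keys `pkts_sent` and `pkts_recv` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    if not values:
        return 0.0
    n = len(values)
    counts = Counter(values)
    # Sum of p * log2(1/p) over the distinct values observed in the flow set.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def efs_features(flow_set):
    """Hypothetical EFS-style features for one flow set.

    `flow_set` is a list of dicts with the assumed keys 'pkts_sent' and
    'pkts_recv'; the paper's real feature set and grouping may differ.
    """
    return {
        "pkts_sent_entropy": shannon_entropy([f["pkts_sent"] for f in flow_set]),
        "pkts_recv_entropy": shannon_entropy([f["pkts_recv"] for f in flow_set]),
    }


# A flow set whose flows all carry the same packet count has zero entropy,
# while four flows with four distinct counts reach log2(4) = 2 bits.
uniform = shannon_entropy([5, 5, 5, 5])
varied = shannon_entropy([3, 10, 7, 25])
print(uniform, varied)
```

Under this assumption the feature depends only on how the values are distributed within the flow set, not on when the flows occur, which is one way to read "periodicity-independent".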
About the journal:
IJICT is a refereed journal in the field of information and communication technology (ICT), providing an international forum for professionals, engineers and researchers. IJICT reports new paradigms in this emerging field of technology and envisions future developments in its frontier areas. The journal addresses issues for the vertical and horizontal applications in this area. Topics covered include:
- Information theory/coding
- Information/IT/network security, standards, applications
- Internet/web based systems/products
- Data mining/warehousing
- Network planning, design, administration
- Sensor/ad hoc networks
- Human-computer intelligent interaction, AI
- Computational linguistics, digital speech
- Distributed/cooperative media
- Interactive communication media/content
- Social interaction, mobile communications
- Signal representation/processing, image processing
- Virtual reality, cyber law, e-governance
- Microprocessor interfacing, hardware design
- Control of industrial processes, ERP/CRM/SCM