From Real Malicious Domains to Possible False Positives in DGA Domain Detection
Haleh Shahzad, A. Sattar, Janahan Skandaraniyam
2021 IEEE 13th International Conference on Computer Research and Development (ICCRD), 2021-01-05
DOI: 10.1109/ICCRD51685.2021.9386658 (https://doi.org/10.1109/ICCRD51685.2021.9386658)
Abstract
Various families of malware use domain generation algorithms (DGAs) to generate large numbers of pseudo-random domain names for connecting to malicious command and control servers (C&Cs). These domain names are used to evade domain-based security detection and mitigation controls such as firewall controls. Prevalent existing techniques for detecting DGA domains, such as reverse engineering malware samples and statistical analysis, are time-consuming, can be easily circumvented by attackers, and need contextual information that is not easily or feasibly obtained. As a result, the use of machine learning and deep learning techniques to detect DGA domains has attracted significant interest in the cyber security and analytics communities. The ultimate goal is to detect DGA domains on a per-domain basis using the domain name only, with no additional information. As with all techniques, there is the possibility of false positives: valid domains being detected as DGA domains. This paper explores the possible use cases that can result in false positives for DGA domain detection using machine learning and deep learning techniques, and how such use cases, if not specifically addressed within an automated system, model, or technique, can also be exploited as attack vectors by attackers using DGA domains.
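
To make the name-only detection setting and the false-positive risk concrete, the following is a minimal Python sketch and not the paper's method. It assumes a hypothetical seeded generator (`toy_dga`) and substitutes a simple character-entropy threshold (`looks_generated`) for the machine learning and deep learning classifiers the abstract refers to. All function names, the threshold value, and the example hostnames are illustrative assumptions.

```python
import hashlib
import math
from collections import Counter


def toy_dga(seed: str, count: int, length: int = 12, tld: str = ".com") -> list[str]:
    """Hypothetical DGA: derive deterministic pseudo-random labels from a seed."""
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}:{i}".encode()).hexdigest()
        # Map successive pairs of hex digits onto lowercase letters.
        label = "".join(
            chr(ord("a") + int(digest[j:j + 2], 16) % 26)
            for j in range(0, 2 * length, 2)
        )
        domains.append(label + tld)
    return domains


def label_entropy(domain: str) -> float:
    """Shannon entropy (bits per character) of the leftmost label."""
    label = domain.split(".")[0]
    counts = Counter(label)
    total = len(label)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_generated(domain: str, threshold: float = 3.0) -> bool:
    """Name-only heuristic; the threshold is an arbitrary illustrative value."""
    return label_entropy(domain) >= threshold


if __name__ == "__main__":
    # Score a few generated names alongside two benign, made-up hostnames.
    candidates = toy_dga("2021-01-05", 3) + [
        "example.com",
        "cdn-a1b2c3d4e5.example.com",
    ]
    for name in candidates:
        print(f"{name}: entropy={label_entropy(name):.2f} flagged={looks_generated(name)}")
```

Under these assumptions, the benign but machine-named hostname `cdn-a1b2c3d4e5.example.com` crosses the threshold while `example.com` does not. This is the kind of false positive the abstract describes: a valid domain that resembles generated output when only the name is inspected.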