Botnet DGA Domain Name Classification Using Transformer Network with Hybrid Embedding

IF 3.5 | Region 3: Computer Science | JCR Q2: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Big Data Research | Pub Date: 2023-08-28 | DOI: 10.1016/j.bdr.2023.100395

Ling Ding, Peng Du, Haiwei Hou, Jian Zhang, Di Jin, Shifei Ding

Journal article. Big Data Research, volume 33, Article 100395. Full text: https://www.sciencedirect.com/science/article/pii/S221457962300028X

Citations: 0

Abstract

Botnets are among the most severe threats to cyber security. They typically use domain names generated by Domain Generation Algorithms (DGAs) to communicate with their Command and Control (C&C) infrastructure. DGA detection and classification play an important role in helping cyber security researchers detect botnet C&C servers. However, many existing DGA detection models rely on a single-scale word embedding method, and very few are specifically designed to extract more effective features for DGA detection from multi-scale word embeddings. To address these problems, we first propose a hybrid word embedding method that combines character-level and bigram-level embeddings to make full use of the information in domain names. We then design a deep neural network that uses this hybrid embedding to distinguish DGA domains from known legitimate domains. Finally, we evaluate the hybrid embedding method and the proposed model on the ONIST dataset and compare them with several state-of-the-art DGA classification methods.
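To make the idea of a hybrid embedding concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not say how the character-level and bigram-level embeddings are combined, so this example assumes per-position concatenation. The alphabet, the fixed length `max_len`, the embedding dimension, the random initialisation, and all function names (`encode`, `make_table`, `hybrid_embed`) are illustrative assumptions.

```python
import random
import string
from itertools import product

# Assumed alphabet for domain names: lowercase letters, digits, hyphen, dot.
ALPHABET = string.ascii_lowercase + string.digits + "-."
PAD = 0  # id 0 is reserved for padding (and, in this sketch, unknown symbols)

# Vocabularies map each character / bigram to a positive integer id.
CHAR_VOCAB = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
BIGRAM_VOCAB = {a + b: i + 1
                for i, (a, b) in enumerate(product(ALPHABET, repeat=2))}


def _pad(ids, max_len):
    """Right-pad with PAD, or truncate, to exactly max_len ids."""
    return (ids + [PAD] * max_len)[:max_len]


def encode(domain, max_len=16):
    """Return (char_ids, bigram_ids) for a domain, both of length max_len."""
    domain = domain.lower()
    chars = [CHAR_VOCAB.get(ch, PAD) for ch in domain]
    bigrams = [BIGRAM_VOCAB.get(domain[i:i + 2], PAD)
               for i in range(len(domain) - 1)]
    return _pad(chars, max_len), _pad(bigrams, max_len)


def make_table(vocab_size, dim, rng):
    """Randomly initialised embedding table; the padding row is all zeros."""
    table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
             for _ in range(vocab_size + 1)]
    table[PAD] = [0.0] * dim
    return table


def hybrid_embed(domain, char_table, bigram_table, max_len=16):
    """Concatenate the character and bigram vectors at each position.

    Concatenation is an assumption of this sketch, not a detail taken
    from the paper.
    """
    char_ids, bigram_ids = encode(domain, max_len)
    return [char_table[c] + bigram_table[b]
            for c, b in zip(char_ids, bigram_ids)]


rng = random.Random(0)
char_table = make_table(len(CHAR_VOCAB), dim=4, rng=rng)
bigram_table = make_table(len(BIGRAM_VOCAB), dim=4, rng=rng)
matrix = hybrid_embed("example.com", char_table, bigram_table)
print(len(matrix), len(matrix[0]))  # sequence length, hybrid vector width
```

In a trained model the two tables would be learned parameters, and the resulting sequence of hybrid vectors would be passed to the downstream network (a Transformer, according to the paper's title) for classification.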

Source journal

Big Data Research (Computer Science: Computer Science Applications)

CiteScore

8.40

Self-citation rate

3.00%

Publication volume

Journal description: The journal aims to promote and communicate advances in big data research by providing a fast, high-quality forum for researchers, practitioners and policy makers from the many different communities working on, and with, this topic. The journal will accept papers on foundational aspects of dealing with big data, as well as papers on specific Platforms and Technologies used to deal with big data. To promote Data Science and interdisciplinary collaboration between fields, and to showcase the benefits of data-driven research, papers demonstrating applications of big data in domains as diverse as Geoscience, Social Web, Finance, e-Commerce, Health Care, Environment and Climate, Physics and Astronomy, Chemistry, life sciences and drug discovery, digital libraries and scientific publications, security and government will also be considered. Occasionally the journal may publish whitepapers on policies, standards and best practices.

Latest articles from this journal

Modeling meaningful volatility events to classify monetary policy announcements
Predicting option prices: From the Black-Scholes model to machine learning methods
Editorial Board
Efficient training: Federated learning cost analysis
Improved Tesseract optical character recognition performance on Thai document datasets