Malicious PDF detection Based on Machine Learning with Enhanced Feature Set

S. Yerima, A. Bashar, Ghazanfar Latif
{"title":"Malicious PDF detection Based on Machine Learning with Enhanced Feature Set","authors":"S. Yerima, A. Bashar, Ghazanfar Latif","doi":"10.1109/CICN56167.2022.10008374","DOIUrl":null,"url":null,"abstract":"PDF is one of the most popular document file formats due to its flexibility, platform independence and ability to embed different types of content. Over the years, PDF has become a popular attack vector for spreading malware and compromising computer systems. Existing signature-based defense systems have extremely high recall rates, but quickly become obsolete and ineffective against zero-day attacks, which makes them easy to circumvent by malicious PDF files. Recently, Machine Learning (ML) has emerged as a viable tool to improve discovery of previously unseen attacks. Hence, in this paper we present enhanced ML-based models for the detection of malicious PDF documents. We develop an approach for ML-based detection with static features derived from PDF documents leveraging existing tools and propose new, previously unused features to enhance the performance of the ML-based classifiers. Our investigative study is conducted on the recently published Evasive-PDFMal2022 dataset, which was used to evaluate seven ML classifiers based on our proposed method. The EvasivePDFMal2022 dataset consists of 4,468 benign samples and 5,557 malicious PDF samples. The results of the experiments show that our proposed approach with the enhanced features enabled improved accuracies in five out of seven of the classifiers that were evaluated. The results demonstrate the potential of the new features to increase the robustness of feature-based PDF malware detection.","PeriodicalId":287589,"journal":{"name":"2022 14th International Conference on Computational Intelligence and Communication Networks (CICN)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 14th International Conference on Computational Intelligence and Communication Networks (CICN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CICN56167.2022.10008374","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

PDF is one of the most popular document file formats due to its flexibility, platform independence and ability to embed different types of content. Over the years, PDF has become a popular attack vector for spreading malware and compromising computer systems. Existing signature-based defense systems have extremely high recall rates, but quickly become obsolete and ineffective against zero-day attacks, which makes them easy to circumvent by malicious PDF files. Recently, Machine Learning (ML) has emerged as a viable tool to improve discovery of previously unseen attacks. Hence, in this paper we present enhanced ML-based models for the detection of malicious PDF documents. We develop an approach for ML-based detection with static features derived from PDF documents leveraging existing tools and propose new, previously unused features to enhance the performance of the ML-based classifiers. Our investigative study is conducted on the recently published Evasive-PDFMal2022 dataset, which was used to evaluate seven ML classifiers based on our proposed method. The EvasivePDFMal2022 dataset consists of 4,468 benign samples and 5,557 malicious PDF samples. The results of the experiments show that our proposed approach with the enhanced features enabled improved accuracies in five out of seven of the classifiers that were evaluated. The results demonstrate the potential of the new features to increase the robustness of feature-based PDF malware detection.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于增强特征集的机器学习恶意PDF检测
由于其灵活性、平台独立性和嵌入不同类型内容的能力,PDF是最流行的文档文件格式之一。多年来,PDF已成为传播恶意软件和危及计算机系统的流行攻击媒介。现有的基于签名的防御系统具有极高的召回率,但对于零日攻击很快就会过时和无效,这使得它们很容易被恶意PDF文件绕过。最近,机器学习(ML)已经成为一种可行的工具,可以改进以前看不见的攻击的发现。因此,在本文中,我们提出了用于检测恶意PDF文档的增强的基于ml的模型。我们利用现有工具开发了一种基于ml的检测方法,使用源自PDF文档的静态特征,并提出了以前未使用的新特征,以增强基于ml的分类器的性能。我们的调查研究是在最近发表的Evasive-PDFMal2022数据集上进行的,该数据集用于评估基于我们提出的方法的七个ML分类器。EvasivePDFMal2022数据集由4,468个良性样本和5,557个恶意PDF样本组成。实验结果表明,我们提出的具有增强特征的方法在七个被评估的分类器中有五个提高了准确性。结果证明了新特征在提高基于特征的PDF恶意软件检测的鲁棒性方面的潜力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Prediction of Downhole Pressure while Tripping A Parallelized Genetic Algorithms approach to Community Energy Systems Planning Application of Artificial Neural Network to Estimate Students Performance in Scholastic Assessment Test A New Intelligent System for Evaluating and Assisting Students in Laboratory Learning Management System Performance Evaluation of Machine Learning Models on Apache Spark: An Empirical Study
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1