Spam Email Detection with Continuous Bag-of-Words and Random Forest

Michiavelly Rustam, Agung Brotokuncoro, Rusdianto Roestam
Original title: Deteksi Email Spam dengan Continuous Bag-Of-Words dan Random Forest
Journal: Ranah Research : Journal of Multidisciplinary Research and Development
Published: 2024-06-05
DOI: 10.38035/rrj.v6i4.873 (https://doi.org/10.38035/rrj.v6i4.873)
Citations: 0

Abstract

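Spam email poses a significant cyber threat: scammers use a range of tactics to trick recipients into divulging sensitive information or downloading harmful content. In June 2023, for instance, Indonesia encountered approximately 6.51 thousand spam attacks, which underscores how widespread the problem is. These attacks frequently rely on deceptive strategies, such as impersonation or false promises of rewards, to ensnare unsuspecting victims, and falling for them can lead to financial losses and other serious consequences. This research addresses the problem by classifying email content to detect phishing attempts. The proposed solution runs on a hosted runtime platform such as Google Colab and combines Continuous Bag of Words (CBOW) analysis with the Random Forest method. CBOW is selected for its effectiveness in capturing semantic relationships between words, which allows the model to extract meaningful features from email content. Random Forest is chosen for its ability to handle the imbalanced datasets commonly encountered in email classification, so that both spam and ham emails are fairly represented during model training. By combining the two techniques, we aim to develop a robust classification model that accurately distinguishes phishing (spam) emails from legitimate (ham) emails, thereby strengthening email security. With this approach we aim to classify the SpamAssassin dataset into ham and spam categories, with an anticipated precision of 0.98, demonstrating the model's effectiveness in identifying phishing emails.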
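The abstract names the building blocks but not their mechanics, so the following is only a minimal, standard-library sketch of three of them: how CBOW frames training examples (a target word predicted from its surrounding context words), one common way to turn word vectors into a fixed-length email feature (averaging, which is an assumption here and not something the abstract confirms), and how precision for the spam class is computed. The function names `cbow_pairs`, `document_vector`, and `precision`, the window size, and the toy embeddings are all hypothetical. The Random Forest stage itself is omitted; in practice the averaged vectors would be passed to a library implementation such as scikit-learn's `RandomForestClassifier`.

```python
def cbow_pairs(tokens, window=2):
    """Return (context_words, target_word) pairs as CBOW frames them.

    CBOW learns embeddings by predicting each target word from the
    words that surround it within `window` positions.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # a one-word email has no context to learn from
            pairs.append((context, target))
    return pairs


def document_vector(tokens, embeddings, dim):
    """Average the vectors of known words into one email-level feature.

    Assumption: averaging is used here for illustration only; the
    abstract does not say how email features are derived from CBOW.
    Words missing from `embeddings` are skipped.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


def precision(y_true, y_pred, positive="spam"):
    """Precision = true positives / (true positives + false positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


if __name__ == "__main__":
    email = "claim your free reward now".split()

    # Each pair is what a CBOW model would train on for this email.
    for context, target in cbow_pairs(email, window=1):
        print(context, "->", target)

    # Toy 2-dimensional embeddings (hypothetical values, not learned).
    toy_embeddings = {"free": [1.0, 0.0], "reward": [0.0, 1.0]}
    print(document_vector(email, toy_embeddings, dim=2))

    # Three emails flagged as spam, two of them correctly.
    truth = ["spam", "ham", "spam", "ham"]
    predicted = ["spam", "spam", "spam", "ham"]
    print(precision(truth, predicted))
```

Precision is the relevant metric when the cost of a false alarm is high: a precision of 0.98 would mean that 98 of every 100 emails flagged as spam really are spam, and it says nothing about how many spam emails slip through unflagged.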