Practical classification accuracy of sequential data using neural networks

IF 4.9; Machine learning with applications; Pub Date: 2025-03-01; Epub Date: 2024-12-18; DOI: 10.1016/j.mlwa.2024.100611
Mamoru Mimura
Machine learning with applications, Volume 19, Article 100611 (Journal Article)
Full text: https://www.sciencedirect.com/science/article/pii/S2666827024000872
Citations: 0

Abstract

Many existing studies on neural network accuracy utilize datasets that may not always reflect real-world conditions. While it has been demonstrated that accuracy tends to decrease as the number of benign samples increases, this effect has not been quantitatively assessed within neural networks. Moreover, its relevance to security tasks beyond malware classification remains unexplored. In this research, we refined the metric to evaluate the degradation of accuracy with an increased number of benign samples in test data. Utilizing both standard and specific neural network models, we conducted experiments to adapt this metric to neural networks and various feature extraction techniques. Using the FFRI dataset, comprising 150,000 malware and 400,000 benign samples, along with the URL dataset, containing 3,143 malicious and 106,545,781 benign samples, we increased benign samples in the test set while keeping the training set’s malicious and benign samples constant. Our findings indicate that neural networks can indeed overestimate their accuracy with a smaller count of benign samples. Importantly, our refined metric is not only applicable to neural networks but is also effective for other feature extraction methods and security tasks beyond malware detection.
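The effect described in the abstract can be illustrated with a small calculation. The sketch below assumes a classifier whose true-positive rate and false-positive rate stay fixed, and shows how precision and F1 on the malicious class change as benign test samples are added. The function names, rates, and sample counts are illustrative assumptions, not values from the paper, and this is not the refined metric the abstract refers to, which the abstract does not define.

```python
def malicious_class_scores(n_malicious, n_benign, tpr, fpr):
    """Expected precision, recall, and F1 for the malicious class.

    tpr and fpr are assumed to be fixed properties of the classifier
    (hypothetical values, not taken from the paper).
    """
    tp = tpr * n_malicious          # malicious samples correctly flagged
    fp = fpr * n_benign             # benign samples wrongly flagged
    fn = n_malicious - tp           # malicious samples missed
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


def sweep(n_malicious=1000, tpr=0.95, fpr=0.01, ratios=(1, 10, 100, 1000)):
    """Hold the malicious count fixed and grow the benign test set."""
    rows = []
    for ratio in ratios:
        n_benign = n_malicious * ratio
        precision, recall, f1 = malicious_class_scores(
            n_malicious, n_benign, tpr, fpr
        )
        rows.append((ratio, precision, recall, f1))
    return rows


if __name__ == "__main__":
    for ratio, precision, recall, f1 in sweep():
        print(
            f"benign:malicious = {ratio:>4}:1  "
            f"precision={precision:.3f}  recall={recall:.3f}  F1={f1:.3f}"
        )
```

Under these assumptions recall does not move, because it depends only on the malicious samples, while precision and F1 shrink as the benign share grows. A score measured on a test set with few benign samples can therefore look better than the same model would on traffic dominated by benign samples.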
Source journal
Machine learning with applications
Subject areas: Management Science and Operations Research, Artificial Intelligence, Computer Science Applications
Self-citation rate: 0.00%
Articles published: 0
Review time: 98 days