Practical classification accuracy of sequential data using neural networks

Machine learning with applications (IF 4.9), Volume 19, Article 100611 | Pub Date: 2025-03-01 | Epub Date: 2024-12-18 | DOI: 10.1016/j.mlwa.2024.100611

Mamoru Mimura


Citations: 0

Abstract

Many existing studies of neural network accuracy use datasets that may not reflect real-world conditions. Although accuracy has been shown to decrease as the number of benign samples increases, this effect has not been quantitatively assessed for neural networks, and its relevance to security tasks beyond malware classification remains unexplored. In this research, we refined the metric for evaluating how accuracy degrades as the number of benign samples in the test data increases. Using both standard and specific neural network models, we conducted experiments to adapt this metric to neural networks and various feature extraction techniques. Using the FFRI dataset, comprising 150,000 malware and 400,000 benign samples, and the URL dataset, containing 3,143 malicious and 106,545,781 benign samples, we increased the number of benign samples in the test set while keeping the training set’s malicious and benign samples constant. Our findings indicate that neural networks can indeed overestimate their accuracy when evaluated with a smaller number of benign samples. Importantly, our refined metric is applicable not only to neural networks but also to other feature extraction methods and to security tasks beyond malware detection.
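The effect described in the abstract can be illustrated with a small calculation. The sketch below is not the paper's refined metric, which the abstract does not define. It assumes a hypothetical detector whose true-positive rate (`tpr`) and false-positive rate (`fpr`) stay fixed, and it uses precision and F1 as stand-in measures. The function names and the rate values are assumptions chosen for illustration; only the sample counts 3,143 and 106,545,781 come from the abstract.

```python
def precision_f1(n_malicious, n_benign, tpr, fpr):
    """Expected precision and F1 for a detector with fixed per-class rates."""
    tp = tpr * n_malicious   # malicious samples correctly flagged
    fn = n_malicious - tp    # malicious samples missed
    fp = fpr * n_benign      # benign samples wrongly flagged
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return precision, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, f1


def sweep(n_malicious, benign_counts, tpr=0.95, fpr=0.001):
    """Evaluate the same (assumed) detector on test sets with more benign samples."""
    return [(n, *precision_f1(n_malicious, n, tpr, fpr)) for n in benign_counts]


if __name__ == "__main__":
    # Malicious count fixed; only the benign share of the test set grows.
    for n_benign, p, f1 in sweep(3143, [3143, 100_000, 10_000_000, 106_545_781]):
        print(f"benign={n_benign:>11,d}  precision={p:.4f}  F1={f1:.4f}")
```

Because the number of false positives scales with the benign count while the number of true positives does not, precision and F1 fall as benign samples are added, even though the detector itself is unchanged. A balanced test set therefore gives a more optimistic picture than a test set with a realistic benign majority.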


Source journal

Machine learning with applications
Subject areas: Management Science and Operations Research, Artificial Intelligence, Computer Science Applications

Self-citation rate

0.00%

Number of articles published

Review time

98 days