非平衡表安全数据的半监督方法

IF 0.9 Q4 COMPUTER SCIENCE, INFORMATION SYSTEMS Journal of Computer Security Pub Date : 2023-11-10 DOI:10.3233/jcs-220130
Xiaodi Li, Latifur Khan, Mahmoud Zamani, Shamila Wickramasuriya, Kevin Hamlen, Bhavani Thuraisingham
{"title":"非平衡表安全数据的半监督方法","authors":"Xiaodi Li, Latifur Khan, Mahmoud Zamani, Shamila Wickramasuriya, Kevin Hamlen, Bhavani Thuraisingham","doi":"10.3233/jcs-220130","DOIUrl":null,"url":null,"abstract":"Con2Mix (Contrastive Double Mixup) is a new semi-supervised learning methodology that innovates a triplet mixup data augmentation approach for finding code vulnerabilities in imbalanced, tabular security data sets. Tabular data sets in cybersecurity domains are widely known to pose challenges for machine learning because of their heavily imbalanced data (e.g., a small number of labeled attack samples buried in a sea of mostly benign, unlabeled data). Semi-supervised learning leverages a small subset of labeled data and a large subset of unlabeled data to train a learning model. While semi-supervised methods have been well studied in image and language domains, in security domains they remain underutilized, especially on tabular security data sets which pose especially difficult contextual information loss and balance challenges for machine learning. Experiments applying Con2Mix to collected security data sets show promise for addressing these challenges, achieving state-of-the-art performance on two evaluated data sets compared with other methods.","PeriodicalId":46074,"journal":{"name":"Journal of Computer Security","volume":null,"pages":null},"PeriodicalIF":0.9000,"publicationDate":"2023-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Con2Mix: A semi-supervised method for imbalanced tabular security data1\",\"authors\":\"Xiaodi Li, Latifur Khan, Mahmoud Zamani, Shamila Wickramasuriya, Kevin Hamlen, Bhavani Thuraisingham\",\"doi\":\"10.3233/jcs-220130\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Con2Mix (Contrastive Double Mixup) is a new semi-supervised learning methodology that innovates a triplet mixup data augmentation approach for finding code vulnerabilities in imbalanced, tabular security data sets. Tabular data sets in cybersecurity domains are widely known to pose challenges for machine learning because of their heavily imbalanced data (e.g., a small number of labeled attack samples buried in a sea of mostly benign, unlabeled data). Semi-supervised learning leverages a small subset of labeled data and a large subset of unlabeled data to train a learning model. While semi-supervised methods have been well studied in image and language domains, in security domains they remain underutilized, especially on tabular security data sets which pose especially difficult contextual information loss and balance challenges for machine learning. Experiments applying Con2Mix to collected security data sets show promise for addressing these challenges, achieving state-of-the-art performance on two evaluated data sets compared with other methods.\",\"PeriodicalId\":46074,\"journal\":{\"name\":\"Journal of Computer Security\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.9000,\"publicationDate\":\"2023-11-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Computer Security\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3233/jcs-220130\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Computer Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3233/jcs-220130","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

摘要

Con2Mix(对比双重混合)是一种新的半监督学习方法,它创新了一种三重混合数据增强方法,用于在不平衡的表格安全数据集中发现代码漏洞。众所周知,网络安全领域的表格数据集对机器学习构成挑战,因为它们的数据严重不平衡(例如,少量标记的攻击样本被埋在大多数良性的、未标记的数据中)。半监督学习利用一小部分标记数据和大量未标记数据来训练学习模型。虽然半监督方法已经在图像和语言领域得到了很好的研究,但在安全领域,它们仍然没有得到充分利用,特别是在表格安全数据集上,这给机器学习带来了特别困难的上下文信息丢失和平衡挑战。将Con2Mix应用于收集的安全数据集的实验表明,与其他方法相比,Con2Mix在两个评估数据集上实现了最先进的性能,有望解决这些挑战。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Con2Mix: A semi-supervised method for imbalanced tabular security data1
Con2Mix (Contrastive Double Mixup) is a new semi-supervised learning methodology that innovates a triplet mixup data augmentation approach for finding code vulnerabilities in imbalanced, tabular security data sets. Tabular data sets in cybersecurity domains are widely known to pose challenges for machine learning because of their heavily imbalanced data (e.g., a small number of labeled attack samples buried in a sea of mostly benign, unlabeled data). Semi-supervised learning leverages a small subset of labeled data and a large subset of unlabeled data to train a learning model. While semi-supervised methods have been well studied in image and language domains, in security domains they remain underutilized, especially on tabular security data sets which pose especially difficult contextual information loss and balance challenges for machine learning. Experiments applying Con2Mix to collected security data sets show promise for addressing these challenges, achieving state-of-the-art performance on two evaluated data sets compared with other methods.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Computer Security
Journal of Computer Security COMPUTER SCIENCE, INFORMATION SYSTEMS-
CiteScore
1.70
自引率
0.00%
发文量
35
期刊介绍: The Journal of Computer Security presents research and development results of lasting significance in the theory, design, implementation, analysis, and application of secure computer systems and networks. It will also provide a forum for ideas about the meaning and implications of security and privacy, particularly those with important consequences for the technical community. The Journal provides an opportunity to publish articles of greater depth and length than is possible in the proceedings of various existing conferences, while addressing an audience of researchers in computer security who can be assumed to have a more specialized background than the readership of other archival publications.
期刊最新文献
Adaptive multi-cascaded ResNet-based efficient multimedia steganography framework using hybrid mouth brooding fish-emperor penguin optimization mechanism Securing Images using Bifid Cipher associated with Arnold Map Identity-based chameleon hash from lattices Practical multi-party private set intersection cardinality and intersection-sum protocols under arbitrary collusion1 MVDet: Encrypted malware traffic detection via multi-view analysis
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1