数据语义检测:一种防止数据泄漏的方法

Sultan Alneyadi, E. Sithirasenan, V. Muthukkumarasamy
{"title":"数据语义检测:一种防止数据泄漏的方法","authors":"Sultan Alneyadi, E. Sithirasenan, V. Muthukkumarasamy","doi":"10.1109/Trustcom.2015.464","DOIUrl":null,"url":null,"abstract":"Data leakage prevention systems (DLPSs) are increasingly being implemented by organizations. Unlike standard security mechanisms such as firewalls and intrusion detection systems, DLPSs are designated systems used to protect in use, at rest and in transit data. DLPSs analytically use the content and surrounding context of confidential data to detect and prevent unauthorized access to confidential data. DLPSs that use content analysis techniques are largely dependent upon data fingerprinting, regular expressions, and statistical analysis to detect data leaks. Given that data is susceptible to change, data fingerprinting and regular expressions suffer from shortcomings in detecting the semantics of evolved confidential data. However, statistical analysis can manage any data that appears fuzzy in nature or has other variations. Thus, DLPSs with statistical analysis capabilities can approximate the presence of data semantics. In this paper, a statistical data leakage prevention (DLP) model is presented to classify data on the basis of semantics. This study contributes to the data leakage prevention field by using data statistical analysis to detect evolved confidential data. The approach was based on using the well-known information retrieval function Term Frequency-Inverse Document Frequency (TF-IDF) to classify documents under certain topics. A Singular Value Decomposition (SVD) matrix was also used to visualize the classification results. The results showed that the proposed statistical DLP approach could correctly classify documents even in cases of extreme modification. It also had a high level of precision and recall scores.","PeriodicalId":277092,"journal":{"name":"2015 IEEE Trustcom/BigDataSE/ISPA","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"41","resultStr":"{\"title\":\"Detecting Data Semantic: A Data Leakage Prevention Approach\",\"authors\":\"Sultan Alneyadi, E. Sithirasenan, V. Muthukkumarasamy\",\"doi\":\"10.1109/Trustcom.2015.464\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Data leakage prevention systems (DLPSs) are increasingly being implemented by organizations. Unlike standard security mechanisms such as firewalls and intrusion detection systems, DLPSs are designated systems used to protect in use, at rest and in transit data. DLPSs analytically use the content and surrounding context of confidential data to detect and prevent unauthorized access to confidential data. DLPSs that use content analysis techniques are largely dependent upon data fingerprinting, regular expressions, and statistical analysis to detect data leaks. Given that data is susceptible to change, data fingerprinting and regular expressions suffer from shortcomings in detecting the semantics of evolved confidential data. However, statistical analysis can manage any data that appears fuzzy in nature or has other variations. Thus, DLPSs with statistical analysis capabilities can approximate the presence of data semantics. In this paper, a statistical data leakage prevention (DLP) model is presented to classify data on the basis of semantics. This study contributes to the data leakage prevention field by using data statistical analysis to detect evolved confidential data. The approach was based on using the well-known information retrieval function Term Frequency-Inverse Document Frequency (TF-IDF) to classify documents under certain topics. A Singular Value Decomposition (SVD) matrix was also used to visualize the classification results. The results showed that the proposed statistical DLP approach could correctly classify documents even in cases of extreme modification. It also had a high level of precision and recall scores.\",\"PeriodicalId\":277092,\"journal\":{\"name\":\"2015 IEEE Trustcom/BigDataSE/ISPA\",\"volume\":\"35 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-08-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"41\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 IEEE Trustcom/BigDataSE/ISPA\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/Trustcom.2015.464\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE Trustcom/BigDataSE/ISPA","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/Trustcom.2015.464","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 41

摘要

数据泄漏预防系统(dlps)越来越多地被组织所采用。与防火墙和入侵检测系统等标准安全机制不同,dlps是用于保护使用中、静态和传输中的数据的指定系统。dlps分析地使用机密数据的内容和周围上下文来检测和防止对机密数据的未经授权的访问。使用内容分析技术的dlps在很大程度上依赖于数据指纹、正则表达式和统计分析来检测数据泄漏。由于数据容易发生变化,数据指纹和正则表达式在检测演化的机密数据的语义方面存在缺陷。然而,统计分析可以管理本质上看起来模糊或有其他变化的任何数据。因此,具有统计分析功能的dlps可以近似地表示数据语义的存在。本文提出了一种基于语义的统计数据泄漏预防(DLP)模型。本研究通过数据统计分析来检测演变的机密数据,为数据泄漏预防领域做出了贡献。该方法基于使用著名的信息检索函数术语频率-逆文档频率(TF-IDF)对特定主题下的文档进行分类。采用奇异值分解(SVD)矩阵对分类结果进行可视化处理。结果表明,即使在极端修改的情况下,统计DLP方法也能正确地对文档进行分类。它也有很高的精确度和回忆分数。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Detecting Data Semantic: A Data Leakage Prevention Approach
Data leakage prevention systems (DLPSs) are increasingly being implemented by organizations. Unlike standard security mechanisms such as firewalls and intrusion detection systems, DLPSs are designated systems used to protect in use, at rest and in transit data. DLPSs analytically use the content and surrounding context of confidential data to detect and prevent unauthorized access to confidential data. DLPSs that use content analysis techniques are largely dependent upon data fingerprinting, regular expressions, and statistical analysis to detect data leaks. Given that data is susceptible to change, data fingerprinting and regular expressions suffer from shortcomings in detecting the semantics of evolved confidential data. However, statistical analysis can manage any data that appears fuzzy in nature or has other variations. Thus, DLPSs with statistical analysis capabilities can approximate the presence of data semantics. In this paper, a statistical data leakage prevention (DLP) model is presented to classify data on the basis of semantics. This study contributes to the data leakage prevention field by using data statistical analysis to detect evolved confidential data. The approach was based on using the well-known information retrieval function Term Frequency-Inverse Document Frequency (TF-IDF) to classify documents under certain topics. A Singular Value Decomposition (SVD) matrix was also used to visualize the classification results. The results showed that the proposed statistical DLP approach could correctly classify documents even in cases of extreme modification. It also had a high level of precision and recall scores.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A Novel Sensor Deployment Approach Using Fruit Fly Optimization Algorithm in Wireless Sensor Networks Study on the Coverage of Adaptive Wireless Sensor Network Based on Trust A Security Topology Protocol of Wireless Sensor Networks Based on Community Detection and Energy Aware WAVE: Secure Wireless Pairing Exploiting Human Body Movements Quantitative Trustworthy Evaluation Scheme for Trust Routing Scheme in Wireless Sensor Networks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1