Applicability of Probabilistic Data Structures for Filtering Tasks in Data Loss Prevention Systems

Lu Shi, S. Butakov, Dale Lindskog, Ron Ruhl, Evgeny Storozhenko
Published in: 2015 IEEE 29th International Conference on Advanced Information Networking and Applications Workshops, pp. 582-586
Publication date: 2015-03-24
DOI: 10.1109/WAINA.2015.47 (https://doi.org/10.1109/WAINA.2015.47)
Citations: 1

Abstract

This paper studies the applicability of a probabilistic data structure, the Bloom filter (BF), in the content analysis component of Data Loss Prevention (DLP) systems. The study shows that BFs can serve as a preliminary selection mechanism in content analysis. The goal of this mechanism is to quickly pre-select documents that may be similar to the one being checked. Because BFs produce false positives, the pre-selection should be followed by a more detailed comparison. A specialized form of the filter, called Matrix BF, was found particularly helpful for content analysis: it localizes the search and allows the filter to grow along with the document database while maintaining linear search time. The paper outlines a theoretical threshold for false positives when comparing two rows of the Matrix BF, and experiments confirmed this threshold. The experiments also indicated acceptable computational performance and an acceptable level of false positives. Tests with obfuscated texts revealed some limitations of the proposed approach.
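
The pre-selection idea in the abstract can be illustrated with a small sketch. The abstract does not specify the filter layout, hash functions, text units, or the threshold value, so everything below is an assumption made for illustration: the class name `MatrixBloomFilter`, the word-shingle tokenization, the salted SHA-256 hashing, and the default parameters are all hypothetical and are not the authors' implementation. Each stored document gets its own Bloom filter row, and a query is scored against every row by the fraction of its set bits that the row also contains, which gives a scan that is linear in the number of rows.

```python
import hashlib


class MatrixBloomFilter:
    """One Bloom filter row per document (hypothetical layout, for illustration)."""

    def __init__(self, num_bits=1024, num_hashes=3, shingle_len=3):
        self.num_bits = num_bits        # width of every row in bits (assumed value)
        self.num_hashes = num_hashes    # hash functions per item (assumed value)
        self.shingle_len = shingle_len  # words per shingle (assumed value)
        self.rows = {}                  # doc_id -> row stored as an int bitmask

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def _shingles(self, text):
        # Overlapping word n-grams; short texts yield a single shingle.
        words = text.lower().split()
        n = self.shingle_len
        return {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}

    def _encode(self, text):
        # Set the bits for every shingle of the text in one row.
        row = 0
        for shingle in self._shingles(text):
            for pos in self._positions(shingle):
                row |= 1 << pos
        return row

    def add(self, doc_id, text):
        # Growing the document database only appends a row.
        self.rows[doc_id] = self._encode(text)

    def candidates(self, text, threshold=0.5):
        # Linear scan: score each row by the share of query bits it contains.
        query = self._encode(text)
        query_bits = bin(query).count("1")
        hits = []
        for doc_id, row in self.rows.items():
            shared = bin(query & row).count("1")
            score = shared / query_bits if query_bits else 0.0
            if score >= threshold:
                hits.append((doc_id, score))
        return sorted(hits, key=lambda hit: hit[1], reverse=True)


mbf = MatrixBloomFilter()
mbf.add("contract", "the supplier shall deliver all prototype units before the end of march")
mbf.add("menu", "fresh tomato soup with basil and warm bread served daily at noon")

# An excerpt copied from the first document is the text being checked.
hits = mbf.candidates("shall deliver all prototype units before")
print(hits)
```

Because the query is a verbatim excerpt of the "contract" text, every bit it sets is also set in that row, so that row scores 1.0. A row that clears the threshold is only a candidate: hash collisions can produce false positives, which is why the abstract calls for a more detailed comparison after pre-selection. The 0.5 threshold here is an arbitrary placeholder, not the theoretical threshold derived in the paper.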