Title: Semantic Similarity Detection For Data Leak Prevention
Authors: Dan Du, Lu Yu, R. Brooks
Venue: Proceedings of the 10th Annual Cyber and Information Security Research Conference
Published: 2015-04-07
DOI: 10.1145/2746266.2746270 (https://doi.org/10.1145/2746266.2746270)
Citations: 4
Abstract
To counter data breaches, we introduce a new data leak prevention (DLP) approach. Unlike regular-expression methods, our approach extracts a small number of critical semantic features and requires only a small training set. Existing tools concentrate mostly on data format, whereas most defense and industry applications would be better served by monitoring the semantics of information in the enterprise. We demonstrate our approach by comparing its performance with that of other state-of-the-art methods, such as latent Dirichlet allocation (LDA) and support vector machines (SVM). The experimental results suggest that the proposed approach has superior accuracy in terms of detection rate and false-positive (FP) rate.
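
The abstract reports results as detection rate and false-positive rate without defining them. The sketch below assumes the standard definitions (detection rate = TP / (TP + FN), FP rate = FP / (FP + TN)) and a binary labeling in which 1 marks a sensitive document and 0 a benign one. The function name and the sample labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def detection_and_fp_rate(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (detection rate, false-positive rate).

    Assumed convention: label 1 = sensitive document, 0 = benign document.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # leaks caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # leaks missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed

    # Guard against empty classes to avoid division by zero.
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, fp_rate


# Hypothetical ground truth and detector output for ten documents.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
dr, fpr = detection_and_fp_rate(y_true, y_pred)
print(f"detection rate = {dr:.2f}, FP rate = {fpr:.2f}")
```

A detector is better on these two metrics when its detection rate is higher and its FP rate is lower, and both figures need to be read together because either one can be improved trivially at the expense of the other.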