An Efficient Framework for String Similarity Continuous Query on Data Stream
Jia Cui, Lei Shi, Juan Li, Zhaohui Liu
2018 5th International Conference on Systems and Informatics (ICSAI), 2018-11-01
DOI: 10.1109/ICSAI.2018.8599504
Citations: 0
Abstract
With the rapid development of network technologies, the data access paradigm has shifted from disk-oriented storage to “on-the-fly” data streams. String similarity queries on data streams have broad application prospects, especially in information security and network monitoring. Because of the characteristics of streams and limited computing resources, current methods designed for static datasets cannot support streams efficiently. To address these challenges, a framework named F2SCQ (Framework of String Similarity Continuous Query), based on the filter-and-verify approach, is proposed. It adopts a basic window mechanism to update the sliding window and introduces an improved asymmetric signature (IAS) scheme for signature extraction. Moreover, two new filtering algorithms are proposed: Pre-Prune Filtering (PPF) and Count Filtering on Stream (CFS). Experiments show that F2SCQ achieves high performance over high-rate data streams. Compared with the q-gram and asymmetric signature schemes, IAS extracts signatures 50% and 20% faster, respectively, and requires 45% and 9% less storage. The proposed filtering algorithms also filter faster and generate fewer candidates. Overall, F2SCQ reduces both time and space costs.
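The abstract does not describe how IAS, PPF, or CFS work, so the sketch below only illustrates the general pattern it names: a sliding window maintained as a sequence of basic windows, with a filter-and-verify similarity query over its contents. In place of the paper's filters it uses the classic q-gram length and count filters for edit distance, followed by exact verification. Everything here (`SlidingWindow`, `basic_size`, `num_basic`, `query`, the choice of edit distance as the similarity measure) is a hypothetical assumption for illustration, not the authors' F2SCQ implementation.

```python
from collections import Counter, deque


def qgrams(s: str, q: int) -> Counter:
    # Multiset of overlapping q-grams; no padding characters, for simplicity.
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def edit_distance(a: str, b: str) -> int:
    # Levenshtein distance via the standard two-row dynamic program.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


class SlidingWindow:
    """Hypothetical count-based sliding window made of basic windows.

    The window keeps at most `num_basic` basic windows of `basic_size`
    strings each; the oldest basic window expires as a whole when a new
    one is opened, instead of expiring strings one at a time.
    """

    def __init__(self, q: int, basic_size: int, num_basic: int):
        self.q = q
        self.basic_size = basic_size
        self.num_basic = num_basic
        self.basics = deque()  # each entry: list of (string, q-gram multiset)

    def append(self, s: str) -> None:
        if not self.basics or len(self.basics[-1]) == self.basic_size:
            self.basics.append([])
            if len(self.basics) > self.num_basic:
                self.basics.popleft()  # expire the oldest basic window
        # Signatures are extracted once, on arrival, and reused by every query.
        self.basics[-1].append((s, qgrams(s, self.q)))

    def __len__(self) -> int:
        return sum(len(b) for b in self.basics)

    def query(self, r: str, tau: int) -> list:
        """Strings in the current window within edit distance tau of r."""
        r_grams = qgrams(r, self.q)
        results = []
        for basic in self.basics:
            for s, s_grams in basic:
                # Length filter: edit distance is at least the length gap.
                if abs(len(s) - len(r)) > tau:
                    continue
                # Count filter: each edit destroys at most q q-grams.
                bound = max(len(r), len(s)) - self.q + 1 - self.q * tau
                if sum((r_grams & s_grams).values()) < bound:
                    continue
                # Verification on the surviving candidate.
                if edit_distance(r, s) <= tau:
                    results.append(s)
        return results


w = SlidingWindow(q=2, basic_size=2, num_basic=2)
for s in ["kitten", "sitting", "mitten", "bitten", "written"]:
    w.append(s)
print(w.query("kitten", tau=1))  # ['mitten', 'bitten']; "kitten" has expired
```

Expiring whole basic windows trades exact per-item window boundaries for cheaper bookkeeping, which is the usual motivation for the basic window mechanism on high-rate streams. Extracting signatures at arrival time means each string's extraction cost is paid once, no matter how many queries later scan it; the abstract's storage and extraction-speed comparisons for IAS concern exactly this per-string signature.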