An Efficient Framework for String Similarity Continuous Query on Data Stream

Jia Cui, Lei Shi, Juan Li, Zhaohui Liu
Published in: 2018 5th International Conference on Systems and Informatics (ICSAI), 2018-11-01
DOI: 10.1109/ICSAI.2018.8599504 (https://doi.org/10.1109/ICSAI.2018.8599504)

Abstract

With the rapid development of network technologies, the data access paradigm has shifted from disk-oriented storage to “on-the-fly” data streams. String similarity queries on data streams have broad application prospects, especially in information security and network monitoring. Because of the characteristics of streams and the limits on computing resources, current methods designed for static datasets cannot support streams efficiently. To address these challenges, a framework named F2SCQ (framework of string similarity continuous query), based on the filter-and-verify approach, is proposed. It adopts a basic window mechanism to update the sliding window and proposes an improved asymmetric signature (IAS) scheme to extract signatures. Moreover, two new filtering algorithms are proposed: Pre-Prune Filtering (PPF) and Count Filtering on Stream (CFS). Experiments show that F2SCQ achieves high performance on high-rate data streams. Compared with the q-gram and asymmetric signature schemes, IAS achieves 50% and 20% faster extraction and 45% and 9% less storage overhead, respectively. The proposed filtering algorithms also filter faster and generate fewer candidates. Overall, F2SCQ reduces both time and space costs.
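The abstract does not give the details of IAS, PPF, or CFS, so the following is only a minimal sketch of the general filter-and-verify idea over a sliding window made of basic windows. It assumes an edit-distance threshold `k` and substitutes the classical q-gram count filter (two strings within edit distance `k` share at least `max(|s|, |t|) - q + 1 - k*q` q-grams) for the paper's own signature and filtering schemes. All names here (`SlidingWindowQuery`, `passes_count_filter`, `push_basic_window`) are hypothetical and chosen for illustration.

```python
from collections import Counter, deque


def qgrams(s, q=2):
    """Multiset of overlapping q-grams of s (no padding)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance (verification step)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def passes_count_filter(query, candidate, k, q=2):
    """Necessary (not sufficient) condition for edit_distance <= k."""
    if abs(len(query) - len(candidate)) > k:  # length filter
        return False
    needed = max(len(query), len(candidate)) - q + 1 - k * q
    if needed <= 0:
        return True  # bound is vacuous, so the candidate must be verified
    shared = sum((qgrams(query, q) & qgrams(candidate, q)).values())
    return shared >= needed


class SlidingWindowQuery:
    """Assumed design: the sliding window is a fixed number of basic windows."""

    def __init__(self, query, k, num_basic_windows, q=2):
        self.query, self.k, self.q = query, k, q
        # deque(maxlen=...) evicts the oldest basic window automatically
        self.window = deque(maxlen=num_basic_windows)

    def push_basic_window(self, strings):
        """Add the newest batch and return its strings that match the query."""
        batch = list(strings)
        self.window.append(batch)
        return [s for s in batch
                if passes_count_filter(self.query, s, self.k, self.q)
                and edit_distance(self.query, s) <= self.k]


monitor = SlidingWindowQuery("stream", k=1, num_basic_windows=2)
matches = monitor.push_basic_window(["stream", "steam", "string"])
```

In this sketch, "string" is discarded by the count filter without running the more expensive edit-distance computation, which is the point of filtering before verifying; only the newly arrived basic window is scanned on each update.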