Log Layering Based on Natural Language Processing

Hanji Shen, Chun Long, Wei Wan, Jun Li, Yakui Qin, Yuhao Fu, Xiaofan Song
Published in: 2019 21st International Conference on Advanced Communication Technology (ICACT), 2019-02-01
DOI: 10.23919/ICACT.2019.8702019 (https://doi.org/10.23919/ICACT.2019.8702019)
Citations: 2

Abstract

As logs grow in number and variety, the storage space they require is increasing rapidly. At the same time, the speed and accuracy of queries over massive logs are becoming increasingly important. Mature distributed storage techniques solve the problems of mass storage and fast querying, but at too high a cost. Because logs exist mainly to trace historical operations, they do not demand a high query rate. To balance storage cost against query rate, this paper proposes a real-time layered log storage technique based on natural language processing. The technique applies text-processing methods suited to the characteristics of log data, compressing real-time log data effectively while preserving query efficiency. First, the method extracts a feature from each incoming log, which serves as the log's type name. Next, it performs word segmentation on the log and encodes each word, storing the results as key-value pairs. Finally, the key-value pairs are kept in memory, and the code of each log is stored in the database. Experiments show that the method effectively preserves data integrity, with decompression time dropping to 40% and the compression rate to 35%.
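The three steps in the abstract (type-name extraction, word segmentation with per-word encoding, and split storage) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `LogEncoder`, the whitespace-based segmentation, the rule that masks any digit-bearing token as `<*>` to form the type name, and the plain list standing in for the database are all assumptions made for illustration, since the abstract does not specify these details.

```python
import re
from typing import Dict, List, Tuple


class LogEncoder:
    """Toy dictionary encoder for log lines (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # In-memory key-value pairs: word -> code, and code -> word.
        self.word_to_code: Dict[str, int] = {}
        self.code_to_word: List[str] = []
        # Stand-in for the database: one (type name, codes) row per log.
        self.database: List[Tuple[str, List[int]]] = []

    def _code_for(self, token: str) -> int:
        """Return the code for a token, assigning a new one on first sight."""
        code = self.word_to_code.get(token)
        if code is None:
            code = len(self.code_to_word)
            self.word_to_code[token] = code
            self.code_to_word.append(token)
        return code

    @staticmethod
    def type_name(line: str) -> str:
        """Assumed feature: the line with digit-bearing tokens masked as <*>."""
        return " ".join(
            "<*>" if any(ch.isdigit() for ch in tok) else tok
            for tok in line.split()
        )

    def encode(self, line: str) -> Tuple[str, List[int]]:
        """Segment a line, encode each token, and store the resulting row."""
        # Whitespace runs are kept as tokens so that decoding is lossless.
        tokens = re.findall(r"\s+|\S+", line)
        codes = [self._code_for(tok) for tok in tokens]
        row = (self.type_name(line), codes)
        self.database.append(row)
        return row

    def decode(self, codes: List[int]) -> str:
        """Rebuild the original line from its codes."""
        return "".join(self.code_to_word[c] for c in codes)


encoder = LogEncoder()
sample = "2019-02-01 12:00:01 ERROR disk  /dev/sda1 usage 91%"
sample_type, sample_codes = encoder.encode(sample)
restored = encoder.decode(sample_codes)
```

Keeping whitespace runs as tokens is one simple way to make the round trip exact, which matches the abstract's emphasis on data integrity. Repeated words across many logs share a single dictionary entry, which is where the space saving in such a scheme would come from.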