动态的基于单词的文本压缩

K. Ng, L. Cheng, C. H. Wong
{"title":"动态的基于单词的文本压缩","authors":"K. Ng, L. Cheng, C. H. Wong","doi":"10.1109/ICDAR.1997.619880","DOIUrl":null,"url":null,"abstract":"We propose a dynamic text compression technique with a back searching algorithm and a new storage protocol. Codes being encoded are divided into three types namely copy, literal and hybrid codes. Multiple dictionaries are adopted and each of them has a linked sub-dictionary. Each dictionary has a portion of pre-defined words i.e. the most frequent words and the rest of the entries will depend on the message. A hashing function developed by Pearson (1990) is adopted. It serves two purposes. Firstly, it is used to initialize the dictionary. Secondly, it is used as a quick search to a particular word. By using this scheme, the spaces between words do not need to be considered. At the decoding side, a space character will be appended after each word is decoded. Therefore, the redundancy of space can also be compressed. The result shows that the original message will not be expanded even if we have poor dictionary design.","PeriodicalId":435320,"journal":{"name":"Proceedings of the Fourth International Conference on Document Analysis and Recognition","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1997-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Dynamic word based text compression\",\"authors\":\"K. Ng, L. Cheng, C. H. Wong\",\"doi\":\"10.1109/ICDAR.1997.619880\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose a dynamic text compression technique with a back searching algorithm and a new storage protocol. Codes being encoded are divided into three types namely copy, literal and hybrid codes. Multiple dictionaries are adopted and each of them has a linked sub-dictionary. Each dictionary has a portion of pre-defined words i.e. the most frequent words and the rest of the entries will depend on the message. A hashing function developed by Pearson (1990) is adopted. It serves two purposes. Firstly, it is used to initialize the dictionary. Secondly, it is used as a quick search to a particular word. By using this scheme, the spaces between words do not need to be considered. At the decoding side, a space character will be appended after each word is decoded. Therefore, the redundancy of space can also be compressed. The result shows that the original message will not be expanded even if we have poor dictionary design.\",\"PeriodicalId\":435320,\"journal\":{\"name\":\"Proceedings of the Fourth International Conference on Document Analysis and Recognition\",\"volume\":\"42 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1997-08-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Fourth International Conference on Document Analysis and Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDAR.1997.619880\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Fourth International Conference on Document Analysis and Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDAR.1997.619880","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

摘要

提出了一种基于反向搜索算法和新的存储协议的动态文本压缩技术。编码的代码分为三种类型,即复制代码、文字代码和混合代码。采用多个字典,每个字典都有一个链接的子字典。每个字典都有一部分预定义的单词,即最常见的单词,其余的条目将取决于消息。采用Pearson(1990)开发的哈希函数。它有两个目的。首先,它用于初始化字典。其次,它被用作对特定单词的快速搜索。通过使用这种方案,不需要考虑单词之间的空格。在解码端,每个字被解码后会附加一个空格字符。因此,空间的冗余也可以被压缩。结果表明,即使我们的字典设计很差,原始信息也不会被扩展。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Dynamic word based text compression
We propose a dynamic text compression technique with a back searching algorithm and a new storage protocol. Codes being encoded are divided into three types namely copy, literal and hybrid codes. Multiple dictionaries are adopted and each of them has a linked sub-dictionary. Each dictionary has a portion of pre-defined words i.e. the most frequent words and the rest of the entries will depend on the message. A hashing function developed by Pearson (1990) is adopted. It serves two purposes. Firstly, it is used to initialize the dictionary. Secondly, it is used as a quick search to a particular word. By using this scheme, the spaces between words do not need to be considered. At the decoding side, a space character will be appended after each word is decoded. Therefore, the redundancy of space can also be compressed. The result shows that the original message will not be expanded even if we have poor dictionary design.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Document layout analysis based on emergent computation Offline handwritten Chinese character recognition via radical extraction and recognition Boundary normalization for recognition of non-touching non-degraded characters Words recognition using associative memory Image and text coupling for creating electronic books from manuscripts
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1