使用长公共字符串进行数据压缩

Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096) Pub Date : 1999-03-29 DOI:10.1109/DCC.1999.755678

J. Bentley

{"title":"使用长公共字符串进行数据压缩","authors":"J. Bentley","doi":"10.1109/DCC.1999.755678","DOIUrl":null,"url":null,"abstract":"We describe a precompression algorithm that effectively represents any long common strings that appear in a file. The algorithm interacts well with standard compression algorithms that represent shorter strings that are near in the input text. Our experiments show that some real data sets do indeed contain many long common strings. We extend the fingerprint mechanisms of our algorithm to a program that identifies long common strings in an input file. This program gives interesting insights into the structure of real data files that contain long common strings.","PeriodicalId":103598,"journal":{"name":"Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096)","volume":"356 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1999-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"72","resultStr":"{\"title\":\"Data compression using long common strings\",\"authors\":\"J. Bentley\",\"doi\":\"10.1109/DCC.1999.755678\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We describe a precompression algorithm that effectively represents any long common strings that appear in a file. The algorithm interacts well with standard compression algorithms that represent shorter strings that are near in the input text. Our experiments show that some real data sets do indeed contain many long common strings. We extend the fingerprint mechanisms of our algorithm to a program that identifies long common strings in an input file. This program gives interesting insights into the structure of real data files that contain long common strings.\",\"PeriodicalId\":103598,\"journal\":{\"name\":\"Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096)\",\"volume\":\"356 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1999-03-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"72\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DCC.1999.755678\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DCC.1999.755678","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 72

摘要

我们描述了一种预压缩算法，它可以有效地表示文件中出现的任何长公共字符串。该算法与表示接近输入文本的较短字符串的标准压缩算法很好地交互。我们的实验表明，一些真实的数据集确实包含许多长公共字符串。我们将算法的指纹机制扩展到一个程序，该程序可以识别输入文件中的长公共字符串。这个程序对包含长公共字符串的实际数据文件的结构提供了有趣的见解。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Data compression using long common strings

We describe a precompression algorithm that effectively represents any long common strings that appear in a file. The algorithm interacts well with standard compression algorithms that represent shorter strings that are near in the input text. Our experiments show that some real data sets do indeed contain many long common strings. We extend the fingerprint mechanisms of our algorithm to a program that identifies long common strings in an input file. This program gives interesting insights into the structure of real data files that contain long common strings.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096)

自引率

0.00%

发文量

期刊最新文献

Real-time VBR rate control of MPEG video based upon lexicographic bit allocation Performance of quantizers on noisy channels using structured families of codes SICLIC: a simple inter-color lossless image coder Protein is incompressible Encoding time reduction in fractal image compression