Towards a calibrated corpus for compression testing

M. Titchener, P. Fenwick, M. C. Chen
Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096), published 1999-03-29
DOI: 10.1109/DCC.1999.785711 (https://doi.org/10.1109/DCC.1999.785711)
Citations: 6

Abstract

Summary form only given. A mini-corpus of twelve 'calibrated' binary-data files has been produced for systematic evaluation of compression algorithms. These are generated within the framework of a deterministic theory of string complexity. Here the T-complexity of a string $x$ (measured in taugs) is defined as $C_T(x) = \sum_i \log_2(k_i + 1)$, where the positive integers $k_i$ are the T-expansion parameters for the corresponding string production process. $C_T(x)$ is observed to be the logarithmic integral of the total information content $I_x$ of $x$ (measured in nats), i.e., $C_T(x) = \mathrm{li}(I_x)$. The average entropy is $\tilde{H}_x = I_x / |x|$, i.e., the total information content divided by the length of $x$. Thus $C_T(x) = \mathrm{li}(\tilde{H}_x \times |x|)$. Alternatively, the information rate along a string may be described by an entropy function $H_x(n)$, $0 \le n \le |x|$, for the string. Assuming that $H_x(n)$ is continuously integrable along the length of $x$, then $I_x = \int_0^{|x|} H_x(n)\,\delta n$. Thus $C_T(x) = \mathrm{li}\left(\int_0^{|x|} H_x(n)\,\delta n\right)$. Solving for $H_x(n)$, that is, differentiating both sides and rearranging, we get: $H_x(n) = \frac{\delta C_T(x|_n)}{\delta n} \times \log_e\left(\mathrm{li}^{-1}(C_T(x|_n))\right)$. With $x$ being in fact discrete, and the T-complexity function being computed in terms of the discrete T-augmentation steps, we may accordingly re-express the equation in terms of the T-prefix increments: $\delta n \approx \Delta_i |x| = k_i |p_i|$; and from the definition of $C_T(x)$, $\delta C_T(x)$ is replaced by $\Delta_i C_T(x) = \log_2(k_i + 1)$. The average slope over the $i$-th T-prefix $p_i$ increment is then simply $\frac{\Delta_i C_T(x)}{\Delta_i |x|} = \frac{\log_2(k_i + 1)}{k_i |p_i|}$. The entropy function is now replaced by a discrete approximation.
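The quantities in the abstract can be sketched numerically. The Python below is a minimal illustration, not the authors' implementation. It assumes the T-expansion parameters and T-prefix lengths are already available as (k_i, |p_i|) pairs, since the abstract does not describe the decomposition that produces them. It evaluates li(x) as Ei(ln x) by power series, and forms the discrete entropy estimate by substituting the average slope into the continuous formula, using the cumulative T-complexity after each step. That substitution, and all function names, are assumptions made for illustration, because the summary ends before stating the discrete formula explicitly.

```python
import math

EULER_GAMMA = 0.5772156649015329


def ei(u: float) -> float:
    """Exponential integral Ei(u) for u > 0, via its power series.

    Ei(u) = gamma + ln(u) + sum_{k>=1} u**k / (k * k!).
    The fixed iteration cap is adequate for moderate u only.
    """
    total = EULER_GAMMA + math.log(u)
    term = 1.0
    for k in range(1, 500):
        term *= u / k  # term now equals u**k / k!
        total += term / k
        if term / k < 1e-16 * abs(total):
            break
    return total


def li(x: float) -> float:
    """Logarithmic integral li(x) = Ei(ln x), for x > 1."""
    return ei(math.log(x))


def log_li_inverse(c: float) -> float:
    """Return ln(li^{-1}(c)) by bisection, i.e. the u > 0 with Ei(u) = c."""
    lo, hi = 1e-12, 1.0
    while ei(hi) < c:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ei(mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def t_complexity(ks):
    """C_T(x) = sum_i log2(k_i + 1), in taugs."""
    return sum(math.log2(k + 1) for k in ks)


def discrete_entropy(steps):
    """Estimate the entropy over each T-prefix increment.

    steps: iterable of (k_i, |p_i|) pairs (hypothetical input format).
    Returns a list of (delta_n_i, H_i), where delta_n_i = k_i * |p_i| and
    H_i = slope_i * ln(li^{-1}(C_T after step i))  (assumed discretisation).
    """
    c_t = 0.0
    result = []
    for k, p_len in steps:
        delta_c = math.log2(k + 1)  # increment in T-complexity
        delta_n = k * p_len         # increment in string length
        c_t += delta_c
        slope = delta_c / delta_n   # average slope over this increment
        result.append((delta_n, slope * log_li_inverse(c_t)))
    return result


if __name__ == "__main__":
    # Made-up (k_i, |p_i|) values, purely for illustration.
    example_steps = [(1, 1), (1, 2), (3, 4)]
    print(t_complexity(k for k, _ in example_steps))
    for delta_n, h in discrete_entropy(example_steps):
        print(delta_n, h)
```

Working in u = ln x keeps the inversion simple: the bisection returns ln(li^{-1}(C_T)) directly, which is exactly the factor that multiplies the slope in the entropy expression.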