Optimal Coding and the Origins of Zipfian Laws

Journal of Quantitative Linguistics (LANGUAGE & LINGUISTICS); IF 0.7; Zone 2, Literature; Pub Date: 2019-06-04; DOI: 10.1080/09296174.2020.1778387
R. Ferrer-i-Cancho, C. Bentz
Citations: 35

Abstract

The problem of compression in standard information theory consists of assigning codes as short as possible to numbers. Here we consider the problem of optimal coding, under an arbitrary coding scheme, and show that it predicts Zipf’s law of abbreviation, namely a tendency in natural languages for more frequent words to be shorter. We apply this result to investigate optimal coding also under so-called non-singular coding, a scheme where unique segmentation is not warranted but codes stand for a distinct number. Optimal non-singular coding predicts that the length of a word should grow approximately as the logarithm of its frequency rank, which is again consistent with Zipf’s law of abbreviation. Optimal non-singular coding in combination with the maximum entropy principle also predicts Zipf’s rank-frequency distribution. Furthermore, our findings on optimal non-singular coding challenge common beliefs about random typing. It turns out that random typing is in fact an optimal coding process, in stark contrast with the common assumption that it is detached from cost-cutting considerations. Finally, we discuss the implications of optimal coding for the construction of a compact theory of Zipfian laws more generally as well as other linguistic laws.
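
The claim about optimal non-singular coding can be illustrated with a small Python sketch. It is not the authors' code: the function names (`nonsingular_codes`, `mean_length`), the two-letter alphabet, and the toy probabilities are assumptions made for this example. The idea is simply that, when codes only need to be distinct, the shortest unused strings go to the most frequent items, so code length never decreases with frequency rank.

```python
from itertools import count, product


def nonsingular_codes(n_items, alphabet="ab"):
    """Return n_items distinct strings, shortest first (hypothetical helper)."""
    if n_items < 1:
        return []
    codes = []
    for length in count(1):  # lengths 1, 2, 3, ...
        for symbols in product(alphabet, repeat=length):
            codes.append("".join(symbols))
            if len(codes) == n_items:
                return codes


def mean_length(probs, codes):
    """Expected code length when item i has probability probs[i]."""
    return sum(p * len(c) for p, c in zip(probs, codes))


# Toy probabilities, already sorted from most to least frequent (assumed data).
probs = [0.4, 0.3, 0.15, 0.1, 0.05]
codes = nonsingular_codes(len(probs))

print(codes)                            # ['a', 'b', 'aa', 'ab', 'ba']
print(mean_length(probs, codes))        # rank-ordered assignment
print(mean_length(probs, codes[::-1]))  # reversed assignment, longer on average
```

With an alphabet of N symbols there are N^l strings of length l, so the number of strings of length at most l grows exponentially in l. Under this assignment the length given to rank i therefore grows roughly as the logarithm (base N) of i, which matches the logarithmic growth stated in the abstract.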
Source journal: CiteScore 2.90; self-citation rate 7.10%; articles published: 7
About the journal: The Journal of Quantitative Linguistics is an international forum for the publication and discussion of research on the quantitative characteristics of language and text in an exact mathematical form. This approach, which is of growing interest, opens up important and exciting theoretical perspectives, as well as solutions for a wide range of practical problems such as machine learning or statistical parsing, by introducing into linguistics the methods and models of advanced scientific disciplines such as the natural sciences, economics, and psychology.