On the physical origin of linguistic laws and lognormality in speech.

Royal Society Open Science (IF 2.9; Zone 3, Comprehensive Journals; Q1, Multidisciplinary Sciences). Pub Date: 2019-08-21. eCollection Date: 2019-08-01. DOI: 10.1098/rsos.191023
Iván G Torre, Bartolo Luque, Lucas Lacasa, Christopher T Kello, Antoni Hernández-Fernández
Royal Society Open Science, vol. 6, no. 8, article 191023. Publication type: Journal Article.
Citations: 37

Abstract

Physical manifestations of linguistic units include sources of variability due to factors of speech production which are by definition excluded from counts of linguistic symbols. In this work, we examine whether linguistic laws hold with respect to the physical manifestations of linguistic units in spoken English. The data we analyse come from the Buckeye Speech corpus, a phonetically transcribed database of acoustic recordings of spontaneous speech. First, we verify with unprecedented accuracy that acoustically transcribed durations of linguistic units at several scales comply with a lognormal distribution, and we quantitatively justify this 'lognormality law' using a stochastic generative model. Second, we explore the four classical linguistic laws (Zipf's Law, Herdan's Law, Brevity Law and Menzerath-Altmann's Law (MAL)) in oral communication, both in physical units and in symbolic units measured in the speech transcriptions, and find that these laws typically hold more strongly in physical units than in their symbolic counterparts. Additional results include (i) coining a Herdan's Law in physical units; (ii) a precise mathematical formulation of Brevity Law, which we show to be connected to optimal compression principles in information theory and which allows us to formulate and validate yet another law, which we call the size-rank law; and (iii) a mathematical derivation of MAL, which also highlights an additional regime where the law is inverted. Altogether, these results support the hypothesis that statistical laws in language have a physical origin.
