利用一般 Tau 理论进行基于优化的语音发音规划

IF 2.4 3区 计算机科学 Q2 ACOUSTICS Speech Communication Pub Date : 2024-05-01 DOI:10.1016/j.specom.2024.103083
Benjamin Elie , Juraj Šimko , Alice Turk
{"title":"利用一般 Tau 理论进行基于优化的语音发音规划","authors":"Benjamin Elie ,&nbsp;Juraj Šimko ,&nbsp;Alice Turk","doi":"10.1016/j.specom.2024.103083","DOIUrl":null,"url":null,"abstract":"<div><p>This paper presents a model of speech articulation planning and generation based on General Tau Theory and Optimal Control Theory. Because General Tau Theory assumes that articulatory targets are always reached, the model accounts for speech variation via context-dependent articulatory targets. Targets are chosen via the optimization of a composite objective function. This function models three different task requirements: maximal intelligibility, minimal articulatory effort and minimal utterance duration. The paper shows that systematic phonetic variability can be reproduced by adjusting the weights assigned to each task requirement. Weights can be adjusted globally to simulate different speech styles, and can be adjusted locally to simulate different levels of prosodic prominence. The solution of the optimization procedure contains Tau equation parameter values for each articulatory movement, namely position of the articulator at the movement offset, movement duration, and a parameter which relates to the shape of the movement’s velocity profile. The paper presents simulations which illustrate the ability of the model to predict or reproduce several well-known characteristics of speech. These phenomena include close-to-symmetric velocity profiles for articulatory movement, variation related to speech rate, centralization of unstressed vowels, lengthening of stressed vowels, lenition of unstressed lingual stop consonants, and coarticulation of stop consonants.</p></div>","PeriodicalId":49485,"journal":{"name":"Speech Communication","volume":"160 ","pages":"Article 103083"},"PeriodicalIF":2.4000,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167639324000554/pdfft?md5=9244f2762d9cdb76bf74cf04a57a092e&pid=1-s2.0-S0167639324000554-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Optimization-based planning of speech articulation using general Tau Theory\",\"authors\":\"Benjamin Elie ,&nbsp;Juraj Šimko ,&nbsp;Alice Turk\",\"doi\":\"10.1016/j.specom.2024.103083\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>This paper presents a model of speech articulation planning and generation based on General Tau Theory and Optimal Control Theory. Because General Tau Theory assumes that articulatory targets are always reached, the model accounts for speech variation via context-dependent articulatory targets. Targets are chosen via the optimization of a composite objective function. This function models three different task requirements: maximal intelligibility, minimal articulatory effort and minimal utterance duration. The paper shows that systematic phonetic variability can be reproduced by adjusting the weights assigned to each task requirement. Weights can be adjusted globally to simulate different speech styles, and can be adjusted locally to simulate different levels of prosodic prominence. The solution of the optimization procedure contains Tau equation parameter values for each articulatory movement, namely position of the articulator at the movement offset, movement duration, and a parameter which relates to the shape of the movement’s velocity profile. The paper presents simulations which illustrate the ability of the model to predict or reproduce several well-known characteristics of speech. These phenomena include close-to-symmetric velocity profiles for articulatory movement, variation related to speech rate, centralization of unstressed vowels, lengthening of stressed vowels, lenition of unstressed lingual stop consonants, and coarticulation of stop consonants.</p></div>\",\"PeriodicalId\":49485,\"journal\":{\"name\":\"Speech Communication\",\"volume\":\"160 \",\"pages\":\"Article 103083\"},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2024-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S0167639324000554/pdfft?md5=9244f2762d9cdb76bf74cf04a57a092e&pid=1-s2.0-S0167639324000554-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Speech Communication\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167639324000554\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ACOUSTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Speech Communication","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167639324000554","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ACOUSTICS","Score":null,"Total":0}
引用次数: 0

摘要

本文介绍了一种基于通用 Tau 理论和最优控制理论的语音发音规划和生成模型。由于通用 Tau 理论假设发音目标总是可以达到,因此该模型通过与语境相关的发音目标来考虑语音的变化。目标是通过优化综合目标函数来选择的。该函数模拟了三种不同的任务要求:最大可懂度、最小发音努力和最短语篇持续时间。论文表明,通过调整分配给每个任务要求的权重,可以再现系统的语音变异性。权重可以全局调整,以模拟不同的语音风格,也可以局部调整,以模拟不同的前音突出程度。优化程序的解决方案包含每个发音动作的 Tau 方程参数值,即发音器在动作偏移时的位置、动作持续时间以及与动作速度曲线形状有关的参数。论文中的模拟结果表明,该模型能够预测或再现几种众所周知的语音特征。这些现象包括近乎对称的发音运动速度曲线、与语速有关的变化、非重读元音的集中、重读元音的延长、非重读舌尖停止辅音的变长以及停止辅音的共同发音。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Optimization-based planning of speech articulation using general Tau Theory

This paper presents a model of speech articulation planning and generation based on General Tau Theory and Optimal Control Theory. Because General Tau Theory assumes that articulatory targets are always reached, the model accounts for speech variation via context-dependent articulatory targets. Targets are chosen via the optimization of a composite objective function. This function models three different task requirements: maximal intelligibility, minimal articulatory effort and minimal utterance duration. The paper shows that systematic phonetic variability can be reproduced by adjusting the weights assigned to each task requirement. Weights can be adjusted globally to simulate different speech styles, and can be adjusted locally to simulate different levels of prosodic prominence. The solution of the optimization procedure contains Tau equation parameter values for each articulatory movement, namely position of the articulator at the movement offset, movement duration, and a parameter which relates to the shape of the movement’s velocity profile. The paper presents simulations which illustrate the ability of the model to predict or reproduce several well-known characteristics of speech. These phenomena include close-to-symmetric velocity profiles for articulatory movement, variation related to speech rate, centralization of unstressed vowels, lengthening of stressed vowels, lenition of unstressed lingual stop consonants, and coarticulation of stop consonants.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Speech Communication
Speech Communication 工程技术-计算机:跨学科应用
CiteScore
6.80
自引率
6.20%
发文量
94
审稿时长
19.2 weeks
期刊介绍: Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results. The journal''s primary objectives are: • to present a forum for the advancement of human and human-machine speech communication science; • to stimulate cross-fertilization between different fields of this domain; • to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.
期刊最新文献
Fixed frequency range empirical wavelet transform based acoustic and entropy features for speech emotion recognition AFP-Conformer: Asymptotic feature pyramid conformer for spoofing speech detection A robust temporal map of speech monitoring from planning to articulation The combined effects of bilingualism and musicianship on listeners’ perception of non-native lexical tones Evaluating the effects of continuous pitch and speech tempo modifications on perceptual speaker verification performance by familiar and unfamiliar listeners
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1