Speech synthesis techniques. A survey

Youcef Tabet, M. Boughazi
DOI: 10.1109/WOSSPA.2011.5931414
International Workshop on Systems, Signal Processing and their Applications (WOSSPA), 9 May 2011
Citations: 71

Abstract

The goal of this paper is to provide a short but comprehensive overview of Text-To-Speech (TTS) synthesis, highlighting its digital signal processing component. First, two rule-based synthesis techniques (formant synthesis and articulatory synthesis) are explained; then concatenative synthesis is explored. Concatenative synthesis is simpler than rule-based synthesis because there is no need to determine speech production rules. However, it introduces two challenges: applying prosodic modifications to speech units and resolving discontinuities at unit boundaries. Prosodic modification introduces artifacts that make the speech sound unnatural. Unit selection synthesis, a kind of concatenative synthesis, addresses this problem by storing many instances of each unit with varying prosody; the unit that best matches the target prosody is selected and concatenated. To resolve the remaining mismatches, the unit-selection method is combined with the Harmonic plus Noise Model (HNM). This model represents the speech signal as the sum of a harmonic part and a noise part, and this decomposition enables more natural-sounding modifications of the signal. Finally, Hidden Markov Model (HMM) synthesis combined with an HNM model is introduced in order to obtain a Text-To-Speech system that requires less development time and cost.
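Formant synthesis is named above as one of the two rule-based techniques. As a minimal sketch (not the authors' implementation), a formant synthesizer can be approximated by passing an impulse train at the pitch period through cascaded second-order digital resonators, one per formant. The resonator below follows the widely used Klatt-style difference equation; the function names, sampling rate, and the formant frequencies and bandwidths in the usage note are illustrative assumptions.

```python
import math


def klatt_resonator(x, freq_hz, bw_hz, fs=10000):
    """Second-order digital resonator used as a single formant filter.

    y[n] = A*x[n] + B*y[n-1] + C*y[n-2], with coefficients derived from the
    formant centre frequency and bandwidth (Klatt-style formulation).
    """
    T = 1.0 / fs
    C = -math.exp(-2 * math.pi * bw_hz * T)
    B = 2 * math.exp(-math.pi * bw_hz * T) * math.cos(2 * math.pi * freq_hz * T)
    A = 1.0 - B - C  # normalizes the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = A * sample + B * y1 + C * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def formant_vowel(f0, formants, fs=10000, n=2000):
    """Impulse-train voicing source filtered by cascaded resonators.

    `formants` is a list of (frequency_hz, bandwidth_hz) pairs; these values
    stand in for the output of the synthesis rules, which are not modelled here.
    """
    period = int(round(fs / f0))
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bw in formants:
        signal = klatt_resonator(signal, freq, bw, fs)
    return signal
```

For example, `formant_vowel(100, [(730, 90), (1090, 110), (2440, 170)])` produces 2000 samples of an /a/-like buzz at 100 Hz; in a full rule-based system the formant values would change over time according to the production rules rather than stay fixed.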
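The abstract identifies discontinuities at unit boundaries as one challenge of concatenative synthesis. One simple and common way to soften such a seam (not necessarily a method covered in the paper) is to overlap the end of one unit with the start of the next and crossfade linearly. The sketch below works on plain lists of samples; the function name and the linear ramp are illustrative choices.

```python
def crossfade_concat(a, b, overlap):
    """Join two sample lists, linearly crossfading `overlap` samples at the seam."""
    if overlap <= 0:
        return list(a) + list(b)
    if overlap > min(len(a), len(b)):
        raise ValueError("overlap longer than one of the units")
    head, tail_a = a[:-overlap], a[-overlap:]
    head_b, rest = b[:overlap], b[overlap:]
    mixed = []
    for i in range(overlap):
        # Weight of the incoming unit ramps from near 0 to near 1.
        w = (i + 1) / (overlap + 1)
        mixed.append((1 - w) * tail_a[i] + w * head_b[i])
    return list(head) + mixed + list(rest)
```

Longer overlaps give smoother joins but blur the unit boundary. A crossfade also does nothing about pitch or spectral mismatches between units, which the abstract addresses with unit selection and HNM instead.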
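The abstract states that unit selection picks, from many stored instances, the unit that best matches the target prosody. A common way to formalize this (an assumption here, since the abstract does not give the cost functions) is to minimize the sum of a target cost, measuring mismatch with the requested prosody, and a join cost, measuring mismatch between consecutive units, using a Viterbi search. In this sketch each hypothetical `Unit` carries only a mean F0 and a duration, and the cost weights are arbitrary illustrative values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A hypothetical stored speech unit described only by its prosody."""
    name: str
    f0: float        # mean pitch in Hz (assumed feature)
    duration: float  # duration in ms (assumed feature)


def target_cost(unit, target_f0, target_dur, w_f0=1.0, w_dur=0.1):
    # Distance between the unit's prosody and the prosody requested for this slot.
    return w_f0 * abs(unit.f0 - target_f0) + w_dur * abs(unit.duration - target_dur)


def join_cost(left, right, w_join=0.5):
    # Pitch jump at the boundary as a crude proxy for an audible discontinuity.
    return w_join * abs(left.f0 - right.f0)


def select_units(candidates, targets):
    """Viterbi search over candidate lists; returns (best unit sequence, total cost).

    candidates[i] lists the stored instances for position i, and
    targets[i] is the (f0, duration) pair requested for that position.
    """
    # best[i][j] = (cost of the cheapest path ending in candidates[i][j], backpointer)
    best = [[(target_cost(u, *targets[0]), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, u), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(u, *targets[i]), back))
        best.append(row)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return list(reversed(path)), total
```

Raising `w_join` makes the search favor smoother joins over closer prosodic matches; raising the target weights does the opposite. Either way, the chosen units are concatenated without prosodic modification, which is the point of storing many instances per unit.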
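The abstract describes HNM as representing speech as the sum of a harmonic part and a noise part. Below is a minimal synthesis-only sketch under stated assumptions: harmonics of f0 are generated up to an assumed maximum voiced frequency with amplitudes read from a caller-supplied spectral-envelope function, all phases are set to zero, and the noise part is crudely high-passed white noise. A real HNM system estimates amplitudes, phases, and a time-varying noise filter from recorded speech; none of that analysis is shown, and the function and parameter names are hypothetical.

```python
import math
import random


def hnm_frame(f0, envelope, max_voiced_hz, fs=16000, n=400, noise_gain=0.01, seed=0):
    """Synthesize one frame as harmonic part + noise part (simplified HNM sketch).

    envelope(freq_hz) -> amplitude is an assumed spectral-envelope function.
    Harmonics are placed at k*f0 only up to max_voiced_hz; the band above it
    is left to the noise part. Returns (samples, number_of_harmonics).
    """
    rng = random.Random(seed)
    k_max = int(max_voiced_hz // f0)
    # Harmonic part: zero-phase sum of sinusoids at multiples of f0.
    harmonic = [
        sum(envelope(k * f0) * math.cos(2 * math.pi * k * f0 * t / fs)
            for k in range(1, k_max + 1))
        for t in range(n)
    ]
    # Noise part: first difference of white Gaussian noise, a crude high-pass
    # stand-in for the shaped noise of a real HNM system.
    white = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    noise = [noise_gain * (white[t + 1] - white[t]) for t in range(n)]
    return [h + e for h, e in zip(harmonic, noise)], k_max
```

Calling `hnm_frame` again with a different `f0` but the same `envelope` changes the pitch while keeping the amplitude pattern across frequency (the formant structure), which is one way to see why the harmonic/noise decomposition allows more natural prosodic modification than editing the raw waveform.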