Speech synthesis techniques. A survey

International Workshop on Systems, Signal Processing and their Applications (WOSSPA). Pub Date: 2011-05-09. DOI: 10.1109/WOSSPA.2011.5931414

Youcef Tabet, M. Boughazi


Citations: 71



Speech synthesis techniques. A survey

This paper gives a short but comprehensive overview of text-to-speech (TTS) synthesis, with emphasis on its digital signal processing component. Two rule-based techniques, formant synthesis and articulatory synthesis, are explained first, and concatenative synthesis is then explored. Concatenative synthesis is simpler than rule-based synthesis because no speech production rules need to be determined. However, it introduces two challenges: modifying the prosody of speech units and resolving discontinuities at unit boundaries. Prosodic modification produces artifacts that make the speech sound unnatural. Unit selection synthesis, a kind of concatenative synthesis, addresses this problem by storing numerous instances of each unit with varying prosodies; the unit that best matches the target prosody is selected and concatenated. To resolve mismatches, a speech synthesis system can combine the unit-selection method with the Harmonic plus Noise Model (HNM). This model represents the speech signal as the sum of a harmonic part and a noise part, and decomposing the signal into these two parts enables more natural-sounding modifications. Finally, hidden Markov model (HMM) synthesis combined with an HNM is introduced in order to obtain a text-to-speech system that requires less development time and cost.
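The unit-selection step described in the abstract (store several instances of each unit, then pick the one closest to the target prosody) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Unit` structure, the choice of mean pitch and duration as the prosodic features, the equally weighted relative-distance cost, and the toy inventory values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One stored instance of a speech unit (hypothetical structure)."""
    label: str          # unit identity, e.g. a diphone name
    pitch_hz: float     # mean fundamental frequency of the recording
    duration_ms: float  # length of the recording


def target_cost(unit: Unit, pitch_hz: float, duration_ms: float) -> float:
    """Relative prosodic distance between a stored unit and the target.

    Equal weighting of pitch and duration is an assumption of this sketch.
    """
    pitch_term = abs(unit.pitch_hz - pitch_hz) / pitch_hz
    duration_term = abs(unit.duration_ms - duration_ms) / duration_ms
    return pitch_term + duration_term


def select_unit(inventory, label, pitch_hz, duration_ms):
    """Return the stored instance of `label` closest to the target prosody."""
    candidates = [u for u in inventory if u.label == label]
    if not candidates:
        raise KeyError(f"no stored instance of unit {label!r}")
    return min(candidates, key=lambda u: target_cost(u, pitch_hz, duration_ms))


# Toy inventory: two instances of the same unit with different prosody.
inventory = [
    Unit("a-b", pitch_hz=110.0, duration_ms=90.0),
    Unit("a-b", pitch_hz=180.0, duration_ms=120.0),
    Unit("b-a", pitch_hz=175.0, duration_ms=110.0),
]
best = select_unit(inventory, "a-b", pitch_hz=170.0, duration_ms=115.0)
print(best)
```

The sketch scores each candidate in isolation. It does not model the discontinuities at unit boundaries that the abstract mentions; a fuller system would also need some cost for how well neighbouring units join, which is omitted here.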

