Prosodic vs. segmental contributions to naturalness in a diphone synthesizer

5th International Conference on Spoken Language Processing (ICSLP 1998). Pub Date: 1998-11-30. DOI: 10.21437/ICSLP.1998-15

H. Bunnell, S. Hoskins, Debra Yarrington

Citations: 7

Abstract

The relative contributions of segmental versus prosodic factors to the perceived naturalness of synthetic speech were measured by transplanting prosody between natural speech and the output of a diphone synthesizer. A small corpus was created containing matched sentence pairs in which one member of the pair was a natural utterance and the other was a synthetic utterance generated with diphone data from the same talker. Two additional sentences were formed from each sentence pair by transplanting the prosodic structure between the natural and synthetic members of the pair. In two listening experiments, subjects were asked to (a) classify each sentence as “natural” or “synthetic,” or (b) rate the naturalness of each sentence. Results showed that prosodic information was more important than segmental information in both classification and ratings of naturalness.
