NSV-TTS: Non-Speech Vocalization Modeling And Transfer In Emotional Text-To-Speech

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Pub Date: 2023-06-04. DOI: 10.1109/ICASSP49357.2023.10096033

Haitong Zhang, Xinyuan Yu, Yue Lin

Citations: 1

Abstract

This paper addresses the problem of non-speech vocalization (NSV) modeling and transfer in emotional TTS. We propose an emotional TTS system (NSV-TTS) to model NSV and emotional speech. The model utilizes self-supervised learning to extract unsupervised linguistic units (ULUs) for NSV labeling and zero-shot NSV transfer. Furthermore, we propose token mixing and random masking to boost performance. We evaluate the proposed method on various NSV types and emotion classes. The experimental results reveal that the proposed method performs well in the zero-shot NSV transfer task. Lastly, we conduct ablation studies to investigate the proposed method further.
