Flexible parametric implantation of voicing in whispered speech under scarce training data

2020 28th European Signal Processing Conference (EUSIPCO). Pub Date: 2021-01-24. DOI: 10.23919/Eusipco47968.2020.9287684

João Silva, Marco Oliveira, Aníbal J. S. Ferreira

Citations: 2

Abstract

Whispered-to-normal voice conversion is typically achieved through codec-based analysis and re-synthesis, through statistical conversion of important spectral and prosodic features, or through data-driven end-to-end signal conversion. These approaches, however, are highly constrained by the architecture of the codec, the statistical projection, or the size and quality of the training data. In this paper, we assume direct implantation of voiced phonemes in whispered speech and focus on fully flexible parametric models that i) can be independently controlled, ii) synthesize natural and linguistically correct voiced phonemes, iii) preserve idiosyncratic characteristics of a given speaker, and iv) are amenable to co-articulation effects through simple model interpolation. We use natural spoken and sung vowels to illustrate these capabilities in a signal modeling and re-synthesis process in which spectral magnitude, phase structure, F0 contour, and sound morphing can each be controlled independently and arbitrarily.
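To make the idea of independently controllable parameters concrete, the following is a minimal sketch of a harmonic (sum-of-sinusoids) vowel synthesizer. It is not the authors' model, which the abstract does not specify. The function names `synthesize_harmonics` and `interpolate_models`, the 16 kHz sample rate, and all numeric values are hypothetical placeholders chosen for illustration. Linear interpolation of phase offsets is also a simplifying assumption that ignores phase wrapping.

```python
import numpy as np

def synthesize_harmonics(f0_contour, magnitudes, phase_offsets, sample_rate=16000):
    """Sum of harmonics driven by a per-sample F0 contour (Hz).

    magnitudes[k-1] and phase_offsets[k-1] belong to harmonic k and are
    set independently of the F0 contour (illustrative model only).
    """
    f0_contour = np.asarray(f0_contour, dtype=float)
    # Integrate the instantaneous frequency to get the fundamental's phase.
    fundamental_phase = 2.0 * np.pi * np.cumsum(f0_contour) / sample_rate
    signal = np.zeros_like(f0_contour)
    for k, (mag, phi) in enumerate(zip(magnitudes, phase_offsets), start=1):
        # Mute a harmonic wherever it would exceed the Nyquist frequency.
        below_nyquist = (k * f0_contour) < (sample_rate / 2.0)
        signal += below_nyquist * mag * np.cos(k * fundamental_phase + phi)
    return signal

def interpolate_models(model_a, model_b, alpha):
    """Linearly interpolate two (magnitudes, phase_offsets) models.

    alpha = 0 returns model_a, alpha = 1 returns model_b. Naive linear
    phase interpolation is an assumption made for brevity.
    """
    mags = (1.0 - alpha) * np.asarray(model_a[0]) + alpha * np.asarray(model_b[0])
    phases = (1.0 - alpha) * np.asarray(model_a[1]) + alpha * np.asarray(model_b[1])
    return mags, phases

# Made-up harmonic models standing in for two vowels (not measured data).
vowel_a = (np.array([1.0, 0.6, 0.3, 0.1]), np.zeros(4))
vowel_b = (np.array([0.8, 0.2, 0.5, 0.4]), np.array([0.0, 0.5, 1.0, 1.5]))

# Half a second with F0 gliding from 120 Hz to 180 Hz, midway between the vowels.
f0 = np.linspace(120.0, 180.0, 8000)
mags, phases = interpolate_models(vowel_a, vowel_b, alpha=0.5)
morphed = synthesize_harmonics(f0, mags, phases)
```

In this sketch, changing `f0` alters only the pitch contour, changing `mags` or `phases` alters only the spectral envelope or phase structure, and sweeping `alpha` over time would give a crude morph between the two models.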


