GIO: A Timbre-informed Approach for Pitch Tracking in Highly Noisy Environments

Proceedings of the 2022 International Conference on Multimedia Retrieval. Pub Date: 2022-06-27. DOI: 10.1145/3512527.3531393

Xiaoheng Sun, Xia Liang, Qiqi He, Bilei Zhu, Zejun Ma

Citations: 3

Abstract

As one of the fundamental tasks in music and speech signal processing, pitch tracking has attracted attention for decades. While a human can focus on the voiced pitch even in highly noisy environments, most existing automatic pitch tracking systems perform unsatisfactorily when they encounter noise. To mimic human auditory perception, this paper proposes a data-driven model named GIO, in which timbre information is introduced to guide pitch tracking. The proposed model takes two inputs: a short audio segment from which pitch is extracted, and a timbre embedding derived from the speaker's or singer's voice. In the experiments, a music artist classification model is used to extract the timbre embedding vectors. A dual-branch structure and a two-step training method are designed to enable the model to predict voice presence. The experimental results show that the proposed model achieves a significant improvement in noise robustness and outperforms existing state-of-the-art methods with fewer parameters.
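The abstract gives only the model's interface: two inputs (an audio segment and a timbre embedding) and, through the dual-branch structure, a voice-presence prediction alongside the pitch estimate. The following is a minimal sketch of that interface only, not the GIO architecture. Everything in it is an assumption made for illustration: the class name `DualBranchSketch`, conditioning by simple concatenation, the single shared hidden layer, the layer sizes, the number of pitch bins, and the random untrained weights. The two-step training method is not represented.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (untrained, illustrative)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class DualBranchSketch:
    """Hypothetical two-input, two-output model; not the paper's network."""

    def __init__(self, segment_len, timbre_dim, hidden=16,
                 n_pitch_bins=8, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(hidden, segment_len + timbre_dim, rng)
        self.pitch_head = init_layer(n_pitch_bins, hidden, rng)
        self.voicing_head = init_layer(1, hidden, rng)

    def forward(self, segment, timbre):
        # Assumption: the timbre embedding conditions the model by
        # concatenation with the audio segment.
        x = list(segment) + list(timbre)
        h = [math.tanh(v) for v in linear(x, *self.shared)]
        # Branch 1: distribution over (hypothetical) pitch bins.
        pitch_probs = softmax(linear(h, *self.pitch_head))
        # Branch 2: probability that voice is present in the segment.
        voiced_prob = sigmoid(linear(h, *self.voicing_head)[0])
        return pitch_probs, voiced_prob


# Usage with a toy 32-sample sinusoid and a made-up 4-dim timbre vector.
model = DualBranchSketch(segment_len=32, timbre_dim=4)
segment = [math.sin(2 * math.pi * 5 * n / 32) for n in range(32)]
timbre = [0.2, -0.1, 0.4, 0.0]
pitch_probs, voiced_prob = model.forward(segment, timbre)
```

Because the weights are random, the outputs carry no meaning; the sketch only shows how a single forward pass can take both inputs and return a pitch distribution together with a voice-presence probability.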


