GIO: A Timbre-informed Approach for Pitch Tracking in Highly Noisy Environments

Proceedings of the 2022 International Conference on Multimedia Retrieval. Pub Date: 2022-06-27. DOI: 10.1145/3512527.3531393

Xiaoheng Sun, Xia Liang, Qiqi He, Bilei Zhu, Zejun Ma


Citations: 3

Abstract



Pitch tracking is one of the fundamental tasks in music and speech signal processing and has attracted attention for decades. While a human can focus on the voiced pitch even in highly noisy environments, most existing automatic pitch tracking systems perform poorly in the presence of noise. To mimic human auditory perception, this paper proposes a data-driven model named GIO, in which timbre information is introduced to guide pitch tracking. The model takes two inputs: a short audio segment from which pitch is extracted, and a timbre embedding derived from the speaker's or singer's voice. In the experiments, a music artist classification model is used to extract the timbre embedding vectors. A dual-branch structure and a two-step training method are designed to enable the model to predict voice presence. The experimental results show that the proposed model significantly improves noise robustness and outperforms existing state-of-the-art methods while using fewer parameters.
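The two-input, dual-branch interface described in the abstract can be pictured with a small sketch. This is not the authors' architecture: the class name `DualBranchSketch`, the layer sizes, the number of pitch bins, and the fusion of the two inputs by plain concatenation are all assumptions made for illustration, and the weights are random rather than trained. It only shows the shape of the idea: one branch outputs a distribution over pitch bins, the other a voice-presence probability.

```python
import math
import random

def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def init_layer(rng, n_in, n_out):
    """Random, untrained weights; a real model would learn these from data."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

class DualBranchSketch:
    """Toy stand-in for a timbre-conditioned pitch tracker (hypothetical layout)."""

    def __init__(self, n_audio, n_timbre, n_hidden=16, n_pitch_bins=8, seed=0):
        rng = random.Random(seed)
        # Assumption: the two inputs are fused by plain concatenation.
        self.shared = init_layer(rng, n_audio + n_timbre, n_hidden)
        self.pitch_head = init_layer(rng, n_hidden, n_pitch_bins)  # pitch branch
        self.voicing_head = init_layer(rng, n_hidden, 1)           # voice-presence branch

    def forward(self, audio_segment, timbre_embedding):
        fused = list(audio_segment) + list(timbre_embedding)
        hidden = [math.tanh(v) for v in linear(fused, *self.shared)]
        pitch_probs = softmax(linear(hidden, *self.pitch_head))
        voiced_prob = sigmoid(linear(hidden, *self.voicing_head)[0])
        return pitch_probs, voiced_prob

# Example call with made-up inputs: a 32-sample segment and a 4-dimensional embedding.
model = DualBranchSketch(n_audio=32, n_timbre=4)
segment = [math.sin(2 * math.pi * 5 * t / 32) for t in range(32)]
embedding = [0.1, -0.3, 0.7, 0.2]
pitch_probs, voiced_prob = model.forward(segment, embedding)
print(len(pitch_probs), voiced_prob)
```

Keeping the voice-presence output in its own head means that output can be trained or thresholded separately from the pitch estimate. The abstract mentions a two-step training method for this purpose but does not describe it, so the sketch leaves training out.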

