Few-Shot Keyword Spotting from Mixed Speech

arXiv - CS - Sound | Pub Date: 2024-07-05 | DOI: arxiv-2407.06078

Junming Yuan, Ying Shi, LanTian Li, Dong Wang, Askar Hamdulla


Citations: 0

Abstract




Few-shot keyword spotting (KWS) aims to detect unknown keywords from only a limited number of training samples. A commonly used approach is the pre-training and fine-tuning framework. While effective in clean conditions, this approach struggles with mixed keyword spotting, that is, simultaneously detecting multiple keywords blended in a single utterance, which is crucial in real-world applications. Previous research proposed a Mix-Training (MT) approach to address this problem; however, it has never been tested in the few-shot scenario. In this paper, we investigate whether MT and other relevant methods can address the two practical challenges together: few-shot learning and mixed speech. Experiments on the LibriSpeech and Google Speech Commands corpora demonstrate that MT is highly effective on this task when employed in either the pre-training phase or the fine-tuning phase. Moreover, combining SSL-based large-scale pre-training (HuBERT) with MT fine-tuning yields very strong results in all test conditions.
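The abstract does not spell out how Mix-Training constructs its training examples. The following is only a minimal sketch, assuming that a mixed example is formed by overlaying two keyword utterances sample by sample and that the model is trained against a multi-hot target marking every keyword present. The function names `mix_utterances` and `make_mixed_batch`, the `gain_b` parameter, and the toy waveforms are all hypothetical and are not taken from the paper.

```python
import random


def mix_utterances(wave_a, wave_b, label_a, label_b, num_keywords, gain_b=1.0):
    """Overlay two waveforms and build a multi-hot keyword target.

    Assumption: mixing is a plain sample-wise sum; the paper may use a
    different mixing rule or loss target.
    """
    length = max(len(wave_a), len(wave_b))
    # Zero-pad the shorter waveform so both cover the same number of samples.
    padded_a = list(wave_a) + [0.0] * (length - len(wave_a))
    padded_b = list(wave_b) + [0.0] * (length - len(wave_b))
    mixed = [a + gain_b * b for a, b in zip(padded_a, padded_b)]
    # Multi-hot target: every keyword present in the mixture is set to 1.
    target = [0.0] * num_keywords
    target[label_a] = 1.0
    target[label_b] = 1.0
    return mixed, target


def make_mixed_batch(dataset, num_keywords, rng):
    """Pair each (waveform, label) sample with a randomly drawn partner."""
    batch = []
    for wave_a, label_a in dataset:
        wave_b, label_b = rng.choice(dataset)
        batch.append(
            mix_utterances(wave_a, wave_b, label_a, label_b, num_keywords)
        )
    return batch


# Toy data: two very short "waveforms" labelled with keyword indices 0 and 2.
toy_dataset = [([0.25, 0.5, 0.75], 0), ([0.5, -0.5], 2)]
mixed, target = mix_utterances(
    toy_dataset[0][0], toy_dataset[1][0],
    toy_dataset[0][1], toy_dataset[1][1],
    num_keywords=4,
)
batch = make_mixed_batch(toy_dataset, num_keywords=4, rng=random.Random(0))
```

In a few-shot setting, a routine like this could in principle be applied either to the pre-training data or to the handful of fine-tuning samples, which mirrors the two placements of MT that the abstract compares.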
