Battling with the low-resource condition for snore sound recognition: introducing a meta-learning strategy

IF 2.4, Zone 3 (Computer Science). Journal on Audio Speech and Music Processing. Pub Date: 2023-10-13. DOI: 10.1186/s13636-023-00309-3
Jingtan Li, Mengkai Sun, Zhonghao Zhao, Xingcan Li, Gaigai Li, Chen Wu, Kun Qian, Bin Hu, Yoshiharu Yamamoto, Björn W. Schuller
Citations: 0

Abstract

Snoring affects 57 % of men, 40 % of women, and 27 % of children in the USA. Moreover, snoring is highly correlated with obstructive sleep apnoea (OSA), which is characterised by loud and frequent snoring. OSA is also closely associated with various life-threatening conditions, such as sudden cardiac arrest, and is regarded as a serious medical condition. Preliminary studies have shown that, in the USA, OSA affects over 34 % of men and 14 % of women. In recent years, polysomnography has increasingly been used to diagnose OSA. However, because polysomnography is time-consuming and costly, intelligent audio analysis of snoring has emerged as an alternative method. Given the clinical demand for identifying the excitation location of snoring, we utilised the Munich-Passau Snore Sound Corpus (MPSSC), a snoring database that classifies the snoring excitation location into four categories. Nonetheless, the MPSSC database remains small owing to factors such as privacy concerns and difficulties in accurate labelling. In fact, accurately labelled medical data that can be used for machine learning is often scarce, especially for rare diseases. In view of this, Model-Agnostic Meta-Learning (MAML), a small-sample method based on meta-learning, is used in this work to classify snore signals with fewer resources. The experimental results indicate that, even when using only the ESC-50 dataset (non-snoring sound signals) for meta-training, we achieve an unweighted average recall of 60.2 % on the test dataset after fine-tuning on just 36 instances of snoring from the development part of the MPSSC dataset. Although our results exceed the baseline by only 4.4 %, they demonstrate that our model can outperform the baseline even when fine-tuned on only a few instances of snoring. This implies that the MAML algorithm can effectively tackle the low-resource problem even with limited data resources.
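The abstract reports its result as unweighted average recall (UAR), the mean of the per-class recalls, so that every class counts equally however few instances it has. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name and the integer labels 0 to 3 are hypothetical placeholders for a four-class problem.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class counts equally regardless of size."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = defaultdict(int)    # number of instances per true class
    correct = defaultdict(int)  # correctly predicted instances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    if not total:
        raise ValueError("no labels given")
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical, deliberately imbalanced labels (not data from the paper).
y_true = [0, 0, 0, 0, 1, 1, 2, 3]
y_pred = [0, 0, 0, 1, 1, 0, 2, 2]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.4f}, accuracy = {accuracy:.4f}")
```

On these toy labels the per-class recalls are 0.75, 0.5, 1.0, and 0.0, so the UAR (0.5625) is lower than the plain accuracy (0.625). The gap comes from the small class that is missed entirely, which is why UAR is a common choice when classes are imbalanced.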
Source journal
Journal on Audio Speech and Music Processing (Engineering: Electrical and Electronic Engineering)
CiteScore: 4.10
Self-citation rate: 4.20%
Articles published: 28
Journal description: The aim of “EURASIP Journal on Audio, Speech, and Music Processing” is to bring together researchers, scientists and engineers working on the theory and applications of the processing of various audio signals, with a specific focus on speech and music. EURASIP Journal on Audio, Speech, and Music Processing will be an interdisciplinary journal for the dissemination of all basic and applied aspects of speech communication and audio processes.