SALT: Standardized Audio event Label Taxonomy

Paraskevas Stamatiadis, Michel Olvera, Slim Essid (IDS, S2A, LTCI)
arXiv:2409.11746 | arXiv - EE - Audio and Speech Processing | Published 2024-09-18

Abstract

Machine listening systems often rely on fixed taxonomies to organize and label audio data, which is key to training and evaluating deep neural networks (DNNs) and other supervised algorithms. However, such taxonomies face significant constraints. They are composed of application-dependent predefined categories, which hinders the integration of new or varied sounds, and they exhibit limited cross-dataset compatibility because labeling standards are inconsistent. To overcome these limitations, we introduce SALT: Standardized Audio event Label Taxonomy. Building on the hierarchical structure of AudioSet's ontology, our taxonomy extends and standardizes labels across 24 publicly available environmental sound datasets, so that class labels from diverse datasets can be mapped to a unified system. Our proposal comes with a new Python package for navigating and using this taxonomy, which eases cross-dataset label searching and hierarchical exploration. Notably, the package makes it easy to aggregate data from diverse sources, and therefore to experiment with combined datasets.
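The core idea of mapping dataset-specific labels onto one shared hierarchy and then querying it can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the SALT package's API. The toy hierarchy, the dataset names `dataset_a` and `dataset_b`, their label mappings, and the helpers `standardize`, `ancestors`, `descendants` and `aggregate` are all hypothetical. For simplicity, the sketch also assumes that each label has a single parent.

```python
from collections import defaultdict

# Hypothetical toy hierarchy (label -> parent), loosely inspired by AudioSet-style
# categories. This is NOT the actual SALT taxonomy; single parent per label assumed.
PARENT = {
    "Animal": None,
    "Dog": "Animal",
    "Bark": "Dog",
    "Cat": "Animal",
    "Vehicle": None,
    "Car": "Vehicle",
    "Car horn": "Car",
}

# Hypothetical per-dataset mappings: original dataset label -> standardized label.
LABEL_MAP = {
    "dataset_a": {"dog": "Dog", "cat": "Cat", "car_horn": "Car horn"},
    "dataset_b": {"dog_bark": "Bark", "car": "Car"},
}


def standardize(dataset, label):
    """Map a dataset-specific label to its standardized label."""
    return LABEL_MAP[dataset][label]


def ancestors(label):
    """Return the path from `label` up to its root, starting with `label`."""
    path = []
    while label is not None:
        path.append(label)
        label = PARENT[label]
    return path


def descendants(label):
    """Return `label` together with every label below it in the hierarchy."""
    children = defaultdict(list)
    for child, parent in PARENT.items():
        if parent is not None:
            children[parent].append(child)
    found, stack = [], [label]
    while stack:  # iterative depth-first traversal
        node = stack.pop()
        found.append(node)
        stack.extend(children[node])
    return found


def aggregate(samples, target):
    """Keep (dataset, clip, label) samples whose standardized label is under `target`.

    Returned tuples carry the standardized label instead of the original one.
    """
    allowed = set(descendants(target))
    merged = []
    for dataset, clip, label in samples:
        std = standardize(dataset, label)
        if std in allowed:
            merged.append((dataset, clip, std))
    return merged


if __name__ == "__main__":
    samples = [
        ("dataset_a", "a1.wav", "dog"),
        ("dataset_b", "b7.wav", "dog_bark"),
        ("dataset_b", "b9.wav", "car"),
    ]
    print(ancestors("Bark"))          # ['Bark', 'Dog', 'Animal']
    print(aggregate(samples, "Dog"))  # [('dataset_a', 'a1.wav', 'Dog'), ('dataset_b', 'b7.wav', 'Bark')]
```

Querying at a parent node such as `Dog` pulls in clips labeled with its children, such as `Bark`, from a different dataset. This is the kind of cross-dataset aggregation that a shared hierarchy enables.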