{"title":"Poster: SlideCNN: Deep Learning for Auditory Spatial Scenes with Limited Annotated Data","authors":"Wenkai Li, Theo Gueuret, Beiyu Lin","doi":"10.1109/SEC54971.2022.00044","DOIUrl":null,"url":null,"abstract":"Sound is an important modality to perceive and understand the spatial environment. With the development of digital technology, massive amounts of smart devices in use around the world can collect sound data. Auditory spatial scenes, a spatial environment to understand and distinguish sound, are important to be detected by analyzing sounds collected via those devices. Given limited annotated auditory spatial samples, the current best-performing model can predict an auditory scene with an accuracy of 73%. We propose a novel yet simple Sliding Window based Convolutional Neural Network, SlideCNN, without manually designing features. SlideCNN leverages windowing operation to increase samples for limited annotation problems and improves the prediction accuracy by over 12% compared to the current best-performing models. It can detect real-life indoor and outdoor scenes with a 85% accuracy. The results will enhance practical applications of ML to analyze auditory scenes with limited annotated samples. 
It will further improve the recognition of environments that may potentially influence the safety of people, especially people with hearing aids and cochlear implant processors.","PeriodicalId":364062,"journal":{"name":"2022 IEEE/ACM 7th Symposium on Edge Computing (SEC)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM 7th Symposium on Edge Computing (SEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SEC54971.2022.00044","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Sound is an important modality for perceiving and understanding the spatial environment. With the spread of digital technology, the massive number of smart devices in use around the world can collect sound data, and detecting auditory spatial scenes, the spatial environments that sounds characterize and distinguish, by analyzing the audio these devices collect is an important task. Given limited annotated auditory spatial samples, the current best-performing model predicts an auditory scene with 73% accuracy. We propose a novel yet simple Sliding Window based Convolutional Neural Network, SlideCNN, that requires no manually designed features. SlideCNN leverages a windowing operation to increase the number of samples in limited-annotation settings and improves prediction accuracy by over 12% compared with the current best-performing models, detecting real-life indoor and outdoor scenes with 85% accuracy. The results will enhance practical applications of machine learning for analyzing auditory scenes with limited annotated samples. They will further improve the recognition of environments that may influence people's safety, especially for users of hearing aids and cochlear implant processors.
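The core windowing idea described in the abstract can be sketched as follows: an annotated audio clip is sliced into overlapping windows, each of which inherits the clip's scene label and serves as an independent training sample. This is a minimal illustrative sketch only; the window length, hop size, and sample rate below are assumptions, not the paper's actual SlideCNN settings.

```python
import numpy as np

def sliding_windows(signal, window_len, hop):
    """Slice a 1-D signal into overlapping fixed-length windows.

    Each window becomes an independent training sample, multiplying the
    effective size of a small annotated dataset (every window inherits
    the scene label of the clip it was cut from).
    """
    n = (len(signal) - window_len) // hop + 1
    return np.stack(
        [signal[i * hop : i * hop + window_len] for i in range(n)]
    )

# A 10-second clip at an assumed 16 kHz sample rate, sliced into
# 2-second windows with 50% overlap.
clip = np.zeros(160_000)  # placeholder waveform
windows = sliding_windows(clip, window_len=32_000, hop=16_000)
print(windows.shape)  # (9, 32000): one clip yields nine samples
```

With 50% overlap, a single labeled clip yields nine training windows here, which is the sample-multiplication effect the abstract credits for SlideCNN's gains under limited annotation.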