{"title":"Poster: SlideCNN: Deep Learning for Auditory Spatial Scenes with Limited Annotated Data","authors":"Wenkai Li, Theo Gueuret, Beiyu Lin","doi":"10.1109/SEC54971.2022.00044","DOIUrl":null,"url":null,"abstract":"Sound is an important modality to perceive and understand the spatial environment. With the development of digital technology, massive amounts of smart devices in use around the world can collect sound data. Auditory spatial scenes, a spatial environment to understand and distinguish sound, are important to be detected by analyzing sounds collected via those devices. Given limited annotated auditory spatial samples, the current best-performing model can predict an auditory scene with an accuracy of 73%. We propose a novel yet simple Sliding Window based Convolutional Neural Network, SlideCNN, without manually designing features. SlideCNN leverages windowing operation to increase samples for limited annotation problems and improves the prediction accuracy by over 12% compared to the current best-performing models. It can detect real-life indoor and outdoor scenes with a 85% accuracy. The results will enhance practical applications of ML to analyze auditory scenes with limited annotated samples. 
It will further improve the recognition of environments that may potentially influence the safety of people, especially people with hearing aids and cochlear implant processors.","PeriodicalId":364062,"journal":{"name":"2022 IEEE/ACM 7th Symposium on Edge Computing (SEC)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM 7th Symposium on Edge Computing (SEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SEC54971.2022.00044","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Sound is an important modality for perceiving and understanding the spatial environment. With the spread of digital technology, the massive number of smart devices in use around the world can collect sound data, and detecting auditory spatial scenes, the spatial environments that sounds characterize and distinguish, by analyzing the audio these devices collect is an important task. Given limited annotated auditory spatial samples, the current best-performing model predicts an auditory scene with 73% accuracy. We propose a novel yet simple Sliding Window based Convolutional Neural Network, SlideCNN, that requires no manually designed features. SlideCNN leverages a windowing operation to increase the number of samples in limited-annotation settings and improves prediction accuracy by over 12% compared with the current best-performing models, detecting real-life indoor and outdoor scenes with 85% accuracy. The results will enhance practical applications of machine learning for analyzing auditory scenes with limited annotated samples. They will further improve the recognition of environments that may influence people's safety, especially for users of hearing aids and cochlear implant processors.
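The core windowing idea described in the abstract can be sketched as follows: an annotated audio clip is sliced into overlapping windows, each of which inherits the clip's scene label and serves as an independent training sample. This is a minimal illustrative sketch only; the window length, hop size, and sample rate below are assumptions, not the paper's actual SlideCNN settings.

```python
import numpy as np

def sliding_windows(signal, window_len, hop):
    """Slice a 1-D signal into overlapping fixed-length windows.

    Each window becomes an independent training sample, multiplying the
    effective size of a small annotated dataset (every window inherits
    the scene label of the clip it was cut from).
    """
    n = (len(signal) - window_len) // hop + 1
    return np.stack(
        [signal[i * hop : i * hop + window_len] for i in range(n)]
    )

# A 10-second clip at an assumed 16 kHz sample rate, sliced into
# 2-second windows with 50% overlap.
clip = np.zeros(160_000)  # placeholder waveform
windows = sliding_windows(clip, window_len=32_000, hop=16_000)
print(windows.shape)  # (9, 32000): one clip yields nine samples
```

With 50% overlap, a single labeled clip yields nine training windows here, which is the sample-multiplication effect the abstract credits for SlideCNN's gains under limited annotation.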