Time-frequency Performance Study on Urban Sound Classification with Convolutional Neural Network
H. Shu, Ying Song, Huan Zhou
TENCON 2018 - 2018 IEEE Region 10 Conference, 2018-10-01
DOI: 10.1109/TENCON.2018.8650428 (https://doi.org/10.1109/TENCON.2018.8650428)
Citations: 5
Abstract
Convolutional neural networks (ConvNets) are a class of deep feed-forward neural networks that exploit the strong spatially local correlation in natural images, and they have achieved strong performance in visual analysis. Recently, ConvNets have been employed in acoustic processing and have been shown to learn the spectro-temporal patterns of sounds and to differentiate them for classification. In this manuscript, the time-frequency resolution of the input sound is studied for its effect on classification accuracy when a ConvNet is adopted. Simulation results show that the data augmentation method called multi-width frequency-delta contributes little to performance when the network is carefully designed. In addition, a suitable temporal resolution in sound segmentation can achieve good classification performance.
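To make the two quantities discussed in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a short-time Fourier transform front end without padding, and it assumes that a "frequency-delta" is the standard regression delta taken along the frequency axis with edge clamping; the abstract does not define either detail. The function names, the sample rate, window, hop, and the widths `(1, 2, 3)` are all hypothetical values chosen for illustration.

```python
from typing import List, Sequence, Tuple

Spectrogram = List[List[float]]  # spec[f][t]: frequency bin f, time frame t


def stft_resolution(sample_rate: int, window: int, hop: int,
                    segment_s: float) -> Tuple[float, float, int]:
    """Return (Hz per frequency bin, seconds per hop, frames per segment).

    Assumes an unpadded STFT: only full windows inside the segment count.
    """
    hz_per_bin = sample_rate / window
    s_per_hop = hop / sample_rate
    n_samples = int(segment_s * sample_rate)
    n_frames = 1 + max(0, n_samples - window) // hop
    return hz_per_bin, s_per_hop, n_frames


def frequency_delta(spec: Spectrogram, width: int) -> Spectrogram:
    """Regression delta along the frequency axis with half-width `width`.

    Assumed definition (not taken from the paper):
    d[f] = sum_n n * (x[f+n] - x[f-n]) / (2 * sum_n n^2), n = 1..width,
    with bin indices clamped at the spectrogram edges.
    """
    n_bins = len(spec)
    norm = 2 * sum(n * n for n in range(1, width + 1))
    out: Spectrogram = []
    for f in range(n_bins):
        row = []
        for t in range(len(spec[f])):
            acc = 0.0
            for n in range(1, width + 1):
                upper = spec[min(f + n, n_bins - 1)][t]
                lower = spec[max(f - n, 0)][t]
                acc += n * (upper - lower)
            row.append(acc / norm)
        out.append(row)
    return out


def multi_width_deltas(spec: Spectrogram,
                       widths: Sequence[int] = (1, 2, 3)) -> List[Spectrogram]:
    """Stack the input with one frequency-delta map per (hypothetical) width."""
    return [spec] + [frequency_delta(spec, w) for w in widths]
```

Under these assumptions, a longer window gives finer frequency bins but fewer frames per segment, which is the trade-off the study varies. The delta maps could be fed to a ConvNet as extra input channels alongside the spectrogram.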