{"title":"Multi-LCNN: A Hybrid Neural Network Based on Integrated Time-Frequency Characteristics for Acoustic Scene Classification","authors":"Jin Lei, Changjian Wang, Boqing Zhu, Q. Lv, Zhen Huang, Yuxing Peng","doi":"10.1109/ICTAI.2018.00019","DOIUrl":null,"url":null,"abstract":"Acoustic scene classification (ASC) is an important task in audio signal processing and can be useful in many real-world applications. Recently, several deep neural network models have been proposed for ASC, such as LSTMs based on temporal analysis and CNNs based on frequency spectrum, as well as hybrid models of LSTM and CNN to further improve classification performance. However, existing hybrid models fail to properly preserve the temporal information when transferring data between different models. In this work, we first analyze the cause of such temporal information loss. We then propose Multi-LCNN, a new hybrid model with two important mechanisms: (1) a LCNN architecture to effectively preserve temporal information; and (2) a multi-channel feature fusion mechanism (MCFF) that combines enhanced temporal information and frequency spectrogram information to learn highly integrated and discriminative features for ASC. Evaluations on the TUT ASC 2016 dataset show that our model can achieve an improvement of 10.23% over the baseline method, and is currently the best-performing end-to-end model on this dataset.","PeriodicalId":254686,"journal":{"name":"2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICTAI.2018.00019","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
Acoustic scene classification (ASC) is an important task in audio signal processing and is useful in many real-world applications. Recently, several deep neural network models have been proposed for ASC, such as LSTMs based on temporal analysis and CNNs based on the frequency spectrum, as well as LSTM-CNN hybrids that aim to further improve classification performance. However, existing hybrid models fail to properly preserve temporal information when transferring data between the component models. In this work, we first analyze the cause of this temporal information loss. We then propose Multi-LCNN, a new hybrid model with two key mechanisms: (1) an LCNN architecture that effectively preserves temporal information; and (2) a multi-channel feature fusion (MCFF) mechanism that combines the enhanced temporal information with frequency spectrogram information to learn highly integrated, discriminative features for ASC. Evaluations on the TUT ASC 2016 dataset show that our model achieves an improvement of 10.23% over the baseline method and is currently the best-performing end-to-end model on this dataset.
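The abstract does not detail the network configuration, so the following PyTorch sketch only illustrates the general idea it describes: an LSTM branch that keeps its full output sequence (preserving temporal information) and a spectrogram branch, fused as separate input channels of a 2-D CNN. All layer sizes, the projection step, and the channel-stacking fusion are illustrative assumptions, not the authors' Multi-LCNN/MCFF design; the class count of 15 matches the TUT ASC 2016 dataset.

```python
# Minimal sketch (not the authors' implementation) of a hybrid LSTM+CNN with
# multi-channel feature fusion for acoustic scene classification.
import torch
import torch.nn as nn


class HybridLstmCnn(nn.Module):
    def __init__(self, n_mels=40, n_classes=15, hidden=64):
        super().__init__()
        # Temporal branch: an LSTM over the frame sequence; keeping the full
        # output sequence (rather than only the last hidden state) is one way
        # to preserve temporal information for the downstream CNN.
        self.lstm = nn.LSTM(input_size=n_mels, hidden_size=hidden,
                            batch_first=True, bidirectional=True)
        # Align the LSTM feature dimension with the spectrogram so the two
        # representations can be stacked as channels (assumed fusion scheme).
        self.proj = nn.Linear(2 * hidden, n_mels)
        # Spectral/fusion branch: a small 2-D CNN over the stacked channels.
        self.cnn = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, spec):                          # spec: (batch, frames, n_mels)
        seq, _ = self.lstm(spec)                      # (batch, frames, 2*hidden)
        temporal = self.proj(seq)                     # (batch, frames, n_mels)
        fused = torch.stack([spec, temporal], dim=1)  # (batch, 2, frames, n_mels)
        feats = self.cnn(fused).flatten(1)            # (batch, 64)
        return self.classifier(feats)                 # (batch, n_classes)


if __name__ == "__main__":
    model = HybridLstmCnn()
    dummy = torch.randn(4, 500, 40)   # 4 clips, 500 frames, 40 mel bands
    print(model(dummy).shape)         # torch.Size([4, 15])
```

The design choice illustrated here is that the temporal branch hands the CNN a full sequence rather than a pooled summary, which is the kind of information loss the abstract says existing hybrids suffer from when passing data between models.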