Thanh Tran, Kien Bui Huy, Nhat Truong Pham, M. Carratù, C. Liguori, J. Lundgren
{"title":"将声音分成STFT帧,消除声音分类中的噪声帧","authors":"Thanh Tran, Kien Bui Huy, Nhat Truong Pham, M. Carratù, C. Liguori, J. Lundgren","doi":"10.1109/SSCI50451.2021.9660125","DOIUrl":null,"url":null,"abstract":"Sounds always contain acoustic noise and background noise that affects the accuracy of the sound classification system. Hence, suppression of noise in the sound can improve the robustness of the sound classification model. This paper investigated a sound separation technique that separates the input sound into many overlapped-content Short-Time Fourier Transform (STFT) frames. Our approach is different from the traditional STFT conversion method, which converts each sound into a single STFT image. Contradictory, separating the sound into many STFT frames improves model prediction accuracy by increasing variability in the data and therefore learning from that variability. These separated frames are saved as images and then labeled manually as clean and noisy frames which are then fed into transfer learning convolutional neural networks (CNNs) for the classification task. The pre-trained CNN architectures that learn from these frames become robust against the noise. The experimental results show that the proposed approach is robust against noise and achieves 94.14% in terms of classifying 21 classes including 20 classes of sound events and a noisy class. An open-source repository of the proposed method and results is available at https://github.com/nhattruongpham/soundSepsound.","PeriodicalId":255763,"journal":{"name":"2021 IEEE Symposium Series on Computational Intelligence (SSCI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Separate Sound into STFT Frames to Eliminate Sound Noise Frames in Sound Classification\",\"authors\":\"Thanh Tran, Kien Bui Huy, Nhat Truong Pham, M. Carratù, C. Liguori, J. Lundgren\",\"doi\":\"10.1109/SSCI50451.2021.9660125\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sounds always contain acoustic noise and background noise that affects the accuracy of the sound classification system. Hence, suppression of noise in the sound can improve the robustness of the sound classification model. This paper investigated a sound separation technique that separates the input sound into many overlapped-content Short-Time Fourier Transform (STFT) frames. Our approach is different from the traditional STFT conversion method, which converts each sound into a single STFT image. Contradictory, separating the sound into many STFT frames improves model prediction accuracy by increasing variability in the data and therefore learning from that variability. These separated frames are saved as images and then labeled manually as clean and noisy frames which are then fed into transfer learning convolutional neural networks (CNNs) for the classification task. The pre-trained CNN architectures that learn from these frames become robust against the noise. The experimental results show that the proposed approach is robust against noise and achieves 94.14% in terms of classifying 21 classes including 20 classes of sound events and a noisy class. An open-source repository of the proposed method and results is available at https://github.com/nhattruongpham/soundSepsound.\",\"PeriodicalId\":255763,\"journal\":{\"name\":\"2021 IEEE Symposium Series on Computational Intelligence (SSCI)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE Symposium Series on Computational Intelligence (SSCI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SSCI50451.2021.9660125\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE Symposium Series on Computational Intelligence (SSCI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SSCI50451.2021.9660125","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Separate Sound into STFT Frames to Eliminate Sound Noise Frames in Sound Classification
Sounds always contain acoustic noise and background noise that affects the accuracy of the sound classification system. Hence, suppression of noise in the sound can improve the robustness of the sound classification model. This paper investigated a sound separation technique that separates the input sound into many overlapped-content Short-Time Fourier Transform (STFT) frames. Our approach is different from the traditional STFT conversion method, which converts each sound into a single STFT image. Contradictory, separating the sound into many STFT frames improves model prediction accuracy by increasing variability in the data and therefore learning from that variability. These separated frames are saved as images and then labeled manually as clean and noisy frames which are then fed into transfer learning convolutional neural networks (CNNs) for the classification task. The pre-trained CNN architectures that learn from these frames become robust against the noise. The experimental results show that the proposed approach is robust against noise and achieves 94.14% in terms of classifying 21 classes including 20 classes of sound events and a noisy class. An open-source repository of the proposed method and results is available at https://github.com/nhattruongpham/soundSepsound.