Pengxu JIANG, Yang YANG, Yue XIE, Cairong ZOU, Qingyun WANG
{"title":"基于波段交互块的双路径卷积神经网络声学场景分类","authors":"Pengxu JIANG, Yang YANG, Yue XIE, Cairong ZOU, Qingyun WANG","doi":"10.1587/transfun.2023eal2056","DOIUrl":null,"url":null,"abstract":"Convolutional neural network (CNN) is widely used in acoustic scene classification (ASC) tasks. In most cases, local convolution is utilized to gather time-frequency information between spectrum nodes. It is challenging to adequately express the non-local link between frequency domains in a finite convolution region. In this paper, we propose a dual-path convolutional neural network based on band interaction block (DCNN-bi) for ASC, with mel-spectrogram as the model's input. We build two parallel CNN paths to learn the high-frequency and low-frequency components of the input feature. Additionally, we have created three band interaction blocks (bi-blocks) to explore the pertinent nodes between various frequency bands, which are connected between two paths. Combining the time-frequency information from two paths, the bi-blocks with three distinct designs acquire non-local information and send it back to the respective paths. The experimental results indicate that the utilization of the bi-block has the potential to improve the initial performance of the CNN substantially. Specifically, when applied to the DCASE 2018 and DCASE 2020 datasets, the CNN exhibited performance improvements of 1.79% and 3.06%, respectively.","PeriodicalId":55003,"journal":{"name":"Ieice Transactions on Fundamentals of Electronics Communications and Computer Sciences","volume":null,"pages":null},"PeriodicalIF":0.4000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Dual-path convolutional neural network based on band interaction block for acoustic scene classification\",\"authors\":\"Pengxu JIANG, Yang YANG, Yue XIE, Cairong ZOU, Qingyun WANG\",\"doi\":\"10.1587/transfun.2023eal2056\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Convolutional neural network (CNN) is widely used in acoustic scene classification (ASC) tasks. In most cases, local convolution is utilized to gather time-frequency information between spectrum nodes. It is challenging to adequately express the non-local link between frequency domains in a finite convolution region. In this paper, we propose a dual-path convolutional neural network based on band interaction block (DCNN-bi) for ASC, with mel-spectrogram as the model's input. We build two parallel CNN paths to learn the high-frequency and low-frequency components of the input feature. Additionally, we have created three band interaction blocks (bi-blocks) to explore the pertinent nodes between various frequency bands, which are connected between two paths. Combining the time-frequency information from two paths, the bi-blocks with three distinct designs acquire non-local information and send it back to the respective paths. The experimental results indicate that the utilization of the bi-block has the potential to improve the initial performance of the CNN substantially. Specifically, when applied to the DCASE 2018 and DCASE 2020 datasets, the CNN exhibited performance improvements of 1.79% and 3.06%, respectively.\",\"PeriodicalId\":55003,\"journal\":{\"name\":\"Ieice Transactions on Fundamentals of Electronics Communications and Computer Sciences\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.4000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Ieice Transactions on Fundamentals of Electronics Communications and Computer Sciences\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1587/transfun.2023eal2056\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ieice Transactions on Fundamentals of Electronics Communications and Computer Sciences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1587/transfun.2023eal2056","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Dual-path convolutional neural network based on band interaction block for acoustic scene classification
Convolutional neural network (CNN) is widely used in acoustic scene classification (ASC) tasks. In most cases, local convolution is utilized to gather time-frequency information between spectrum nodes. It is challenging to adequately express the non-local link between frequency domains in a finite convolution region. In this paper, we propose a dual-path convolutional neural network based on band interaction block (DCNN-bi) for ASC, with mel-spectrogram as the model's input. We build two parallel CNN paths to learn the high-frequency and low-frequency components of the input feature. Additionally, we have created three band interaction blocks (bi-blocks) to explore the pertinent nodes between various frequency bands, which are connected between two paths. Combining the time-frequency information from two paths, the bi-blocks with three distinct designs acquire non-local information and send it back to the respective paths. The experimental results indicate that the utilization of the bi-block has the potential to improve the initial performance of the CNN substantially. Specifically, when applied to the DCASE 2018 and DCASE 2020 datasets, the CNN exhibited performance improvements of 1.79% and 3.06%, respectively.
期刊介绍:
Includes reports on research, developments, and examinations performed by the Society''s members for the specific fields shown in the category list such as detailed below, the contents of which may advance the development of science and industry:
(1) Reports on new theories, experiments with new contents, or extensions of and supplements to conventional theories and experiments.
(2) Reports on development of measurement technology and various applied technologies.
(3) Reports on the planning, design, manufacture, testing, or operation of facilities, machinery, parts, materials, etc.
(4) Presentation of new methods, suggestion of new angles, ideas, systematization, software, or any new facts regarding the above.