{"title":"Investigating the Effective Dynamic Information of Spectral Shapes for Audio Classification","authors":"Liangwei Chen;Xiren Zhou;Qiuju Chen;Fang Xiong;Huanhuan Chen","doi":"10.1109/TMM.2024.3521837","DOIUrl":null,"url":null,"abstract":"The spectral shape holds crucial information for Audio Classification (AC), encompassing the spectrum's envelope, details, and dynamic changes over time. Conventional methods utilize cepstral coefficients for spectral shape description but overlook its variation details. Deep-learning approaches capture some dynamics but demand substantial training or fine-tuning resources. The Learning in the Model Space (LMS) framework precisely captures the dynamic information of temporal data by utilizing model fitting, even when computational resources and data are limited. However, applying LMS to audio faces challenges: 1) The high sampling rate of audio hinders efficient data fitting and capturing of dynamic information. 2) The Dynamic Information of Partial Spectral Shapes (DIPSS) may enhance classification, as only specific spectral shapes are relevant for AC. This paper extends an AC framework called Effective Dynamic Information Capture (EDIC) to tackle the above issues. EDIC constructs Mel-Frequency Cepstral Coefficients (MFCC) sequences within different dimensional intervals as the fitted data, which not only reduces the number of sequence sampling points but can also describe the change of the spectral shape in different parts over time. EDIC enables us to implement a topology-based selection algorithm in the model space, selecting effective DIPSS for the current AC task. The performance on three tasks confirms the effectiveness of EDIC.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"1114-1126"},"PeriodicalIF":8.4000,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10814068/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
The spectral shape holds crucial information for Audio Classification (AC), encompassing the spectrum's envelope, details, and dynamic changes over time. Conventional methods utilize cepstral coefficients for spectral shape description but overlook its variation details. Deep-learning approaches capture some dynamics but demand substantial training or fine-tuning resources. The Learning in the Model Space (LMS) framework precisely captures the dynamic information of temporal data by utilizing model fitting, even when computational resources and data are limited. However, applying LMS to audio faces challenges: 1) The high sampling rate of audio hinders efficient data fitting and capturing of dynamic information. 2) The Dynamic Information of Partial Spectral Shapes (DIPSS) may enhance classification, as only specific spectral shapes are relevant for AC. This paper extends an AC framework called Effective Dynamic Information Capture (EDIC) to tackle the above issues. EDIC constructs Mel-Frequency Cepstral Coefficients (MFCC) sequences within different dimensional intervals as the fitted data, which not only reduces the number of sequence sampling points but can also describe the change of the spectral shape in different parts over time. EDIC enables us to implement a topology-based selection algorithm in the model space, selecting effective DIPSS for the current AC task. The performance on three tasks confirms the effectiveness of EDIC.
期刊介绍:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.