Toward end-to-end interpretable convolutional neural networks for waveform signals
Linh Vu, Thu Tran, Wern-Han Lim, Raphael Phan
arXiv - CS - Sound · arXiv:2405.01815 · 2024-05-03
This paper introduces a novel convolutional neural network (CNN) framework tailored for end-to-end audio deep learning models, presenting advancements in efficiency and explainability. In benchmarking experiments on three standard speech emotion recognition datasets with five-fold cross-validation, our framework outperforms Mel spectrogram features by up to seven percent and can potentially replace Mel-frequency cepstral coefficients (MFCCs) while remaining lightweight. Furthermore, we demonstrate the efficiency and interpretability of the front-end layer on the PhysioNet Heart Sound Database, illustrating its ability to capture intricate long waveform patterns. Our contributions offer a portable solution for building efficient and interpretable models for raw waveform data.
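To make the idea of a learnable front-end concrete, the sketch below shows a strided 1-D convolutional layer applied directly to a raw waveform, the kind of operation that can stand in for fixed Mel-spectrogram or MFCC features. This is an illustrative NumPy sketch only; the filter count, kernel length, and stride are hypothetical choices, not the architecture from the paper.

```python
import numpy as np

def conv1d_frontend(waveform, kernels, stride):
    """Strided 1-D convolution of a raw waveform with a filter bank.

    waveform: (n_samples,) float array of raw audio
    kernels:  (n_filters, kernel_len) float array; learnable in a real model
    returns:  (n_filters, n_frames) time-frequency-like feature map
    """
    n_filters, k = kernels.shape
    n_frames = (len(waveform) - k) // stride + 1
    out = np.empty((n_filters, n_frames))
    for t in range(n_frames):
        frame = waveform[t * stride : t * stride + k]
        out[:, t] = kernels @ frame  # one dot product per filter
    return out

rng = np.random.default_rng(0)
wave = rng.standard_normal(16000)              # 1 s of synthetic audio at 16 kHz
bank = rng.standard_normal((40, 400)) * 0.01   # 40 filters, 25 ms kernels (hypothetical)
feats = conv1d_frontend(wave, bank, stride=160)  # 10 ms hop
print(feats.shape)  # (40, 98)
```

Because the kernels are ordinary weights rather than a fixed transform, they can be trained end-to-end with the rest of the network and inspected afterwards, which is what makes such front-ends candidates for interpretable replacements of MFCCs.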