{"title":"Toward end-to-end interpretable convolutional neural networks for waveform signals","authors":"Linh Vu, Thu Tran, Wern-Han Lim, Raphael Phan","doi":"arxiv-2405.01815","DOIUrl":null,"url":null,"abstract":"This paper introduces a novel convolutional neural networks (CNN) framework\ntailored for end-to-end audio deep learning models, presenting advancements in\nefficiency and explainability. By benchmarking experiments on three standard\nspeech emotion recognition datasets with five-fold cross-validation, our\nframework outperforms Mel spectrogram features by up to seven percent. It can\npotentially replace the Mel-Frequency Cepstral Coefficients (MFCC) while\nremaining lightweight. Furthermore, we demonstrate the efficiency and\ninterpretability of the front-end layer using the PhysioNet Heart Sound\nDatabase, illustrating its ability to handle and capture intricate long\nwaveform patterns. Our contributions offer a portable solution for building\nefficient and interpretable models for raw waveform data.","PeriodicalId":501178,"journal":{"name":"arXiv - CS - Sound","volume":"111 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Sound","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2405.01815","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
This paper introduces a novel convolutional neural network (CNN) framework tailored for end-to-end audio deep learning models, offering advances in both efficiency and explainability. In benchmarking experiments on three standard speech emotion recognition datasets with five-fold cross-validation, the framework outperforms Mel spectrogram features by up to seven percent, and it can potentially replace Mel-frequency cepstral coefficients (MFCCs) while remaining lightweight. Furthermore, we demonstrate the efficiency and interpretability of the front-end layer on the PhysioNet Heart Sound Database, illustrating its ability to capture intricate patterns in long waveforms. Our contributions offer a portable solution for building efficient and interpretable models for raw waveform data.
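To make the idea concrete, here is a minimal sketch of a learnable waveform front-end of the general kind the abstract describes: a bank of 1-D convolutions applied directly to raw audio, standing in for a fixed Mel-spectrogram or MFCC extractor. This is an assumption-laden illustration, not the authors' architecture; the class name, filter count, kernel size, and stride are all hypothetical choices.

```python
# Hypothetical sketch of a learnable front-end for raw waveforms (NOT the
# paper's exact design): a 1-D convolutional filter bank whose output plays
# the role of a spectrogram, but whose filters are learned end-to-end.
import torch
import torch.nn as nn


class ConvFrontEnd(nn.Module):
    def __init__(self, n_filters: int = 40, kernel_size: int = 401, stride: int = 160):
        super().__init__()
        # Each output channel learns a band-pass-like filter from data;
        # the stride acts like the hop length of a spectrogram.
        self.filters = nn.Conv1d(
            in_channels=1,
            out_channels=n_filters,
            kernel_size=kernel_size,
            stride=stride,
            padding=kernel_size // 2,
        )

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) of raw audio, e.g. at 16 kHz
        x = self.filters(waveform.unsqueeze(1))  # (batch, n_filters, frames)
        # Log compression of the rectified response, analogous to a
        # log-Mel spectrogram, keeps the features in a trainable range.
        return torch.log1p(x.abs())


if __name__ == "__main__":
    frontend = ConvFrontEnd()
    audio = torch.randn(2, 16000)  # two 1-second dummy clips
    features = frontend(audio)
    print(features.shape)  # torch.Size([2, 40, 100])
```

Because the filters are ordinary convolution weights, they can be inspected after training (e.g., by plotting each kernel's frequency response), which is one common route to the kind of front-end interpretability the paper targets.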