Fourier Hilbert: The input transformation to enhance CNN models for speech emotion recognition

Cognitive Robotics Pub Date : 2024-01-01 DOI:10.1016/j.cogr.2024.11.002

Bao Long Ly

Cognitive Robotics, volume 4, pages 228-236. Journal article. https://www.sciencedirect.com/science/article/pii/S2667241324000168


Abstract

Signal processing in general, and speech emotion recognition in particular, have long been familiar Artificial Intelligence (AI) tasks. With the rise of deep learning, CNN models are used more and more often, and many signal transformations have emerged alongside them. However, these methods often demand significant hardware resources and runtime. To address these issues, we analyze and learn from existing transformations and propose a new method: the Fourier Hilbert Transformation (FHT). In short, this method applies the Hilbert curve to Fourier images. The resulting images are small and dense, a shape well suited to the CNN architecture. In addition, the better distribution of information across the image allows the convolutional filters to be used to their full capacity. These points support the argument that FHT provides an optimal input for CNNs. Experiments on popular datasets yielded promising results: FHT substantially reduces hardware usage and runtime while maintaining high performance, and it is even more stable than existing methods. This opens up opportunities for deploying signal processing tasks on real-time systems with limited hardware.
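The abstract describes FHT only as applying the Hilbert curve to Fourier images, so the following is a minimal illustrative sketch rather than the author's implementation. It assumes a single FFT magnitude spectrum per signal, log compression, and a square image whose side is a power of two; the function names `hilbert_d2xy` and `fourier_hilbert_image` and the `order` parameter are hypothetical and do not come from the paper.

```python
import numpy as np


def hilbert_d2xy(side, d):
    """Map index d along a Hilbert curve to (x, y) on a side x side grid.

    `side` must be a power of two. This is the standard iterative
    distance-to-coordinate conversion for the Hilbert curve.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so consecutive indices stay adjacent.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def fourier_hilbert_image(signal, order=5):
    """Fold an FFT magnitude spectrum into a dense 2D image (assumed design).

    Assumption: one real FFT over the whole signal, zero-padded or cropped
    to 2 * side**2 samples so that side**2 frequency bins are kept.
    """
    side = 1 << order
    n_bins = side * side
    spectrum = np.abs(np.fft.rfft(signal, n=2 * n_bins))[:n_bins]
    spectrum = np.log1p(spectrum)  # compress dynamic range (assumption)
    image = np.zeros((side, side), dtype=np.float32)
    for d in range(n_bins):
        x, y = hilbert_d2xy(side, d)
        image[y, x] = spectrum[d]
    return image
```

Because the Hilbert curve keeps consecutive indices adjacent on the grid, neighbouring frequency bins end up in neighbouring pixels, which is one plausible reading of why the resulting small, dense image suits convolutional filters. Calling `fourier_hilbert_image(signal, order=5)` on a one-dimensional array returns a 32 x 32 array that could be passed to a CNN as a single-channel input.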


