Fourier Hilbert: The input transformation to enhance CNN models for speech emotion recognition

Bao Long Ly
{"title":"Fourier Hilbert: The input transformation to enhance CNN models for speech emotion recognition","authors":"Bao Long Ly","doi":"10.1016/j.cogr.2024.11.002","DOIUrl":null,"url":null,"abstract":"<div><div>Signal processing in general, and speech emotion recognition in particular, have long been familiar Artificial Intelligence (AI) tasks. With the explosion of deep learning, CNN models are used more frequently, accompanied by the emergence of many signal transformations. However, these methods often require significant hardware and runtime. In an effort to address these issues, we analyze and learn from existing transformations, leading us to propose a new method: Fourier Hilbert Transformation (FHT). In general, this method applies the Hilbert curve to Fourier images. The resulting images are small and dense, which is a shape well-suited to the CNN architecture. Additionally, the better distribution of information on the image allows the filters to fully utilize their power. These points support the argument that FHT provides an optimal input for CNN. Experiments conducted on popular datasets yielded promising results. FHT saves a large amount of hardware usage and runtime while maintaining high performance, even offers greater stability compared to existing methods. This opens up opportunities for deploying signal processing tasks on real-time systems with limited hardware.</div></div>","PeriodicalId":100288,"journal":{"name":"Cognitive Robotics","volume":"4 ","pages":"Pages 228-236"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cognitive Robotics","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2667241324000168","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Signal processing in general, and speech emotion recognition in particular, have long been familiar Artificial Intelligence (AI) tasks. With the explosion of deep learning, CNN models are used more frequently, accompanied by the emergence of many signal transformations. However, these methods often require significant hardware and runtime. In an effort to address these issues, we analyze and learn from existing transformations, leading us to propose a new method: Fourier Hilbert Transformation (FHT). In general, this method applies the Hilbert curve to Fourier images. The resulting images are small and dense, which is a shape well-suited to the CNN architecture. Additionally, the better distribution of information on the image allows the filters to fully utilize their power. These points support the argument that FHT provides an optimal input for CNN. Experiments conducted on popular datasets yielded promising results. FHT saves a large amount of hardware usage and runtime while maintaining high performance, even offers greater stability compared to existing methods. This opens up opportunities for deploying signal processing tasks on real-time systems with limited hardware.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
傅里叶·希尔伯特:输入变换增强CNN模型的语音情感识别
一般来说,信号处理,特别是语音情感识别,一直是人们熟悉的人工智能(AI)任务。随着深度学习的爆炸式发展,CNN模型的使用越来越频繁,伴随着许多信号变换的出现。然而,这些方法通常需要大量的硬件和运行时。为了解决这些问题,我们分析并学习了现有的变换,从而提出了一种新的方法:傅里叶希尔伯特变换(FHT)。一般来说,这种方法将希尔伯特曲线应用于傅里叶图像。生成的图像小而密集,这是一种非常适合CNN架构的形状。此外,图像上信息的更好分布允许滤波器充分利用它们的功率。这些观点支持了FHT为CNN提供最佳输入的论点。在流行的数据集上进行的实验产生了令人鼓舞的结果。FHT在保持高性能的同时节省了大量的硬件使用和运行时间,甚至比现有方法提供了更高的稳定性。这为在硬件有限的实时系统上部署信号处理任务提供了机会。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
8.40
自引率
0.00%
发文量
0
期刊最新文献
Optimizing Food Sample Handling and Placement Pattern Recognition with YOLO: Advanced Techniques in Robotic Object Detection Intelligent path planning for cognitive mobile robot based on Dhouib-Matrix-SPP method YOLOT: Multi-scale and diverse tire sidewall text region detection based on You-Only-Look-Once(YOLOv5) Scalable and cohesive swarm control based on reinforcement learning POMDP-based probabilistic decision making for path planning in wheeled mobile robot
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1