Fourier Hilbert: The input transformation to enhance CNN models for speech emotion recognition
Bao Long Ly
Cognitive Robotics, Volume 4, Pages 228-236, 2024
DOI: 10.1016/j.cogr.2024.11.002
https://www.sciencedirect.com/science/article/pii/S2667241324000168
Citations: 0
Abstract
Signal processing in general, and speech emotion recognition in particular, have long been familiar Artificial Intelligence (AI) tasks. With the explosion of deep learning, CNN models are used ever more frequently, accompanied by the emergence of many signal transformations. However, these methods often require significant hardware resources and runtime. To address these issues, we analyze and learn from existing transformations and propose a new method: the Fourier Hilbert Transformation (FHT). In essence, this method applies the Hilbert curve to Fourier images. The resulting images are small and dense, a shape well suited to the CNN architecture. Additionally, the better distribution of information across the image allows the filters to be used to their full capacity. These points support the argument that FHT provides an optimal input for CNNs. Experiments conducted on popular datasets yielded promising results: FHT greatly reduces hardware usage and runtime while maintaining high performance, and even offers greater stability than existing methods. This opens up opportunities for deploying signal processing tasks on real-time systems with limited hardware.
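The abstract describes FHT only at a high level ("applies the Hilbert curve to Fourier images"), so the following is a minimal sketch of that general idea, not the paper's actual pipeline. It assumes the Fourier magnitudes of a single frame are laid along a Hilbert curve so that neighbouring frequency bins land in neighbouring pixels of a small square image. The function names (`dft_magnitudes`, `hilbert_d2xy`, `fourier_hilbert_image`), the naive DFT standing in for a real FFT routine, and the 8x8 image size are all illustrative assumptions.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Naive O(N^2) DFT (stand-in for an FFT library call).

    Returns the magnitudes of the non-negative-frequency bins.
    """
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def hilbert_d2xy(side, d):
    """Map index d along a Hilbert curve to (x, y) on a side x side grid.

    side must be a power of two. This is the standard iterative
    distance-to-coordinate conversion.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the quadrant before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def fourier_hilbert_image(signal, side):
    """Place Fourier magnitudes along a Hilbert curve (assumed layout).

    Bins beyond side * side are dropped; unused pixels stay at 0.0.
    """
    mags = dft_magnitudes(signal)
    image = [[0.0] * side for _ in range(side)]
    for d in range(min(side * side, len(mags))):
        x, y = hilbert_d2xy(side, d)
        image[y][x] = mags[d]
    return image


# Example: a pure tone at bin 5 of a 128-sample frame, mapped to an 8x8 image.
signal = [math.sin(2 * math.pi * 5 * t / 128) for t in range(128)]
image = fourier_hilbert_image(signal, side=8)
```

Because consecutive positions on a Hilbert curve are always adjacent pixels, nearby frequency bins stay spatially close in the resulting image, which is one plausible reading of why such images would suit small convolution filters.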