Cochleogram-Based Speech Emotion Recognition with the Cascade of Asymmetric Resonators with Fast-Acting Compression Using Time-Distributed Convolutional Long Short-Term Memory and Support Vector Machines.

Biomimetics (IF 3.9; Zone 3, Medicine; JCR Q1, ENGINEERING, MULTIDISCIPLINARY) Pub Date: 2025-03-10 DOI: 10.3390/biomimetics10030167
Cevahir Parlak

Abstract

Feature extraction is a crucial stage in speech emotion recognition, and filter banks with their related statistical functions are widely used for this purpose. Although Mel filters and MFCCs achieve outstanding results, they use a simplified mechanism to simulate the human cochlea and therefore do not model the structure of the human ear faithfully. The Mel filter system is not a perfect representation of human hearing but an engineering shortcut that suppresses pitch and low-frequency components, which are of little use in traditional speech recognition. Speech emotion recognition, however, relies heavily on pitch and low-frequency features. The newly tailored CARFAC 24 model is a sophisticated system for analyzing human speech, designed to simulate the functions of the human cochlea as closely as possible. In this study, we use CARFAC 24 for speech emotion recognition and compare it with state-of-the-art systems in speaker-independent experiments on the ASED and NEMO emotional speech datasets, using Time-Distributed Convolutional LSTM networks and Support Vector Machines. The results demonstrate that CARFAC 24 is a valuable alternative to Mel and MFCC features in speech emotion recognition applications.
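
For context on the "simplified mechanism" mentioned in the abstract, a Mel filter bank typically places its filters at centre frequencies spaced evenly on the Mel scale. The sketch below computes such centres with one widely used Mel formula, 2595 * log10(1 + f/700). The abstract does not say which Mel variant or settings the study used, so the function names, the 40-filter count, and the 0 to 8000 Hz range are illustrative assumptions, not the author's configuration.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert Hz to Mel with the common 2595*log10(1 + f/700) variant."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters: int, f_min: float, f_max: float) -> list:
    """Return n_filters centre frequencies (Hz), evenly spaced on the Mel scale.

    The two band edges (f_min, f_max) are excluded, as in a triangular
    filter bank where they only bound the first and last filters.
    """
    mel_min, mel_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_max - mel_min) / (n_filters + 1)
    return [mel_to_hz(mel_min + k * step) for k in range(1, n_filters + 1)]


if __name__ == "__main__":
    # Assumed, illustrative settings: 40 filters over 0-8000 Hz.
    centers = mel_center_frequencies(40, 0.0, 8000.0)
    for idx in (0, 1, 2, 37, 38, 39):
        print(f"filter {idx + 1:2d}: centre = {centers[idx]:8.1f} Hz")
```

The gap between neighbouring centres grows with frequency, which reflects the roughly logarithmic warping of the Mel scale. A cochlear model such as CARFAC replaces this fixed warping with a cascade of resonators, which this sketch does not attempt to reproduce.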
