Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data

IF 3.4 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | IEEE Access | Pub Date: 2025-01-27 | DOI: 10.1109/ACCESS.2025.3534176

A-Hyeon Jo; Keun-Chang Kwak

IEEE Access, vol. 13, pp. 19947-19963, 2025. https://ieeexplore.ieee.org/document/10854478/

Citations: 0

Abstract

In this paper, we propose a method for designing a speech emotional state classification model based on feature-map fusion of a temporal convolutional network (TCN) and a pretrained convolutional neural network (CNN), using a Korean speech database. The proposed approach comprises four main stages. In the first stage, we extract Mel-frequency cepstral coefficient (MFCC) and gammatone cepstral coefficient (GFCC) features in the frequency domain, as well as log-Mel spectrograms in the time-frequency domain. In the second stage, these features are used to train a TCN and Yet Another Audio MobileNet Network (YAMNet), respectively. In the third stage, we perform feature-map fusion using canonical correlation analysis (CCA), stationary wavelet transform (SWT), and fuzzy c-means-based principal component averaging (FCMPCA), respectively. Through these steps, the speech emotion recognition model is designed by combining the fusion model of TCN and YAMNet with the feature-map fusion methods. Finally, we compare performance on five databases: the AI-Hub speech emotion dataset built in Korea, the Korean speech emotional state classification dataset built at Chosun University, and the Emo-DB, RAVDESS, and TESS datasets. The experimental results show that the proposed model performs well compared with previous works on most datasets.
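As an illustration of the CCA-based fusion named in the third stage, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each network's feature map has already been flattened into one vector per utterance. The names `cca_fuse`, `inv_sqrt`, `tcn_feats`, and `cnn_feats`, the regularization value, the synthetic data, and the choice to concatenate the projected features are all assumptions made for this example; the abstract does not specify these details.

```python
import numpy as np


def inv_sqrt(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca_fuse(x, y, k, reg=1e-6):
    """Project two feature sets onto their top-k canonical directions
    and concatenate the projections (one possible fusion rule).

    x: (n_samples, p) features from the first model (hypothetical).
    y: (n_samples, q) features from the second model (hypothetical).
    Returns the fused (n_samples, 2k) matrix and the k canonical correlations.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    # Regularized covariance and cross-covariance estimates.
    sxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    syy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    sxy = x.T @ y / (n - 1)
    sxx_is, syy_is = inv_sqrt(sxx), inv_sqrt(syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    u, s, vt = np.linalg.svd(sxx_is @ sxy @ syy_is)
    wx = sxx_is @ u[:, :k]
    wy = syy_is @ vt.T[:, :k]
    fused = np.hstack([x @ wx, y @ wy])
    return fused, s[:k]


# Synthetic stand-ins for the two models' features, sharing a 2-D latent signal.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
tcn_feats = latent @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(200, 8))
cnn_feats = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))

fused, corr = cca_fuse(tcn_feats, cnn_feats, k=2)
print(fused.shape, corr)
```

In a real pipeline the fused matrix would feed a classifier. Summing the two projections instead of concatenating them is another common choice and halves the fused dimension.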

Source journal

IEEE Access (COMPUTER SCIENCE, INFORMATION SYSTEMS; ENGINEERING, ELECTRICAL & ELECTRONIC)

CiteScore: 9.80

Self-citation rate: 7.70%

Articles published: 6673

Review time: 6 weeks

Journal description: IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest. IEEE Access will publish articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary", in that reviewers will either Accept or Reject an article in the form it is submitted in order to achieve rapid turnaround. Especially encouraged are submissions on: Multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals. Practical articles discussing new experiments or measurement techniques, interesting solutions to engineering. Development of new or improved fabrication or manufacturing techniques. Reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.