Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data
Authors: A-Hyeon Jo; Keun-Chang Kwak
Journal: IEEE Access, vol. 13, pp. 19947-19963 (published 2025-01-27)
DOI: 10.1109/ACCESS.2025.3534176
URL: https://ieeexplore.ieee.org/document/10854478/
Impact factor: 3.4; JCR: Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS)
Citations: 0
Abstract
In this paper, we propose a method for designing a speech emotional state classification model based on feature-map fusion of a temporal convolutional network (TCN) and a pretrained convolutional neural network (CNN), using Korean speech databases. The proposed approach comprises four main stages. In the first stage, we extract Mel-frequency cepstral coefficient (MFCC) and gammatone cepstral coefficient (GFCC) features in the frequency domain, as well as the log-Mel spectrogram in the time-frequency domain. From these features, the second stage trains a TCN and the Yet Another Audio MobileNet Network (YAMNet), respectively. In the third stage, we perform feature-map fusion using canonical correlation analysis (CCA), stationary wavelet transform (SWT), and fuzzy c-means-based principal component averaging (FCMPCA), respectively. Through these steps, the speech emotion recognition model is designed by combining TCN and YAMNet with the feature-map fusion methods. Finally, we compare performance on five databases: the AI-Hub speech emotion dataset built in Korea, the Korean speech emotional state classification dataset built at Chosun University, and the Emo-DB, RAVDESS, and TESS datasets. The experimental results show that the proposed model performs well compared with previous works on most datasets.
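As a rough illustration of the CCA-based fusion step named in the abstract, the following is a minimal NumPy sketch. The abstract gives no implementation details, so everything here is an assumption: the function name `cca_fuse`, the regularization term `reg`, the choice to fuse by concatenating the projected features, and the random arrays that stand in for TCN and YAMNet feature maps.

```python
import numpy as np


def _inv_sqrt(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca_fuse(x, y, k, reg=1e-6):
    """Project two feature sets onto their top-k canonical directions
    and concatenate the projections (hypothetical fusion rule).

    x: (n_samples, p) features from one model
    y: (n_samples, q) features from the other model
    Returns the fused (n_samples, 2k) matrix and the k canonical correlations.
    """
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)

    # Regularized covariance and cross-covariance estimates.
    sxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])
    syy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    sxy = xc.T @ yc / (n - 1)

    # Whiten both views, then take the SVD of the whitened cross-covariance.
    sxx_is = _inv_sqrt(sxx)
    syy_is = _inv_sqrt(syy)
    u, s, vt = np.linalg.svd(sxx_is @ sxy @ syy_is, full_matrices=False)

    wx = sxx_is @ u[:, :k]      # canonical weights for x
    wy = syy_is @ vt.T[:, :k]   # canonical weights for y
    fused = np.hstack([xc @ wx, yc @ wy])
    return fused, s[:k]


# Synthetic stand-ins for the two models' feature maps (not real data).
rng = np.random.default_rng(0)
tcn_feats = rng.normal(size=(200, 16))
shared = tcn_feats[:, :4] @ rng.normal(size=(4, 12))
yamnet_feats = shared + 0.1 * rng.normal(size=(200, 12))

fused, corrs = cca_fuse(tcn_feats, yamnet_feats, k=4)
print(fused.shape)  # (200, 8)
```

Concatenation is only one common way to combine the canonical projections; summing them is another, and the paper may use a different rule altogether.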
IEEE Access | COMPUTER SCIENCE, INFORMATION SYSTEMS | ENGINEERING, ELECTRICAL & ELECTRONIC
CiteScore: 9.80
Self-citation rate: 7.70%
Articles published: 6673
Review time: 6 weeks
About the journal:
IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest.
IEEE Access will publish articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary", in that reviewers will either Accept or Reject an article in the form it is submitted in order to achieve rapid turnaround. Especially encouraged are submissions on:
Multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals.
Practical articles discussing new experiments or measurement techniques, interesting solutions to engineering.
Development of new or improved fabrication or manufacturing techniques.
Reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.