Using the Bag-of-Audio-Words approach for emotion recognition

Acta Universitatis Sapientiae Informatica (JCR Q4, Computer Science, Theory & Methods; IF 0.3), pp. 1-21. Published 2022-08-01. DOI: 10.2478/ausi-2022-0001
Mercedes Vetráb, G. Gosztolya

Abstract

The problem of varying-length recordings is a well-known issue in paralinguistics. We investigated how to resolve this problem using the bag-of-audio-words (BoAW) feature extraction approach, whose steps are preprocessing, clustering, quantization and normalization. The bag-of-audio-words technique is competitive in speech emotion recognition, but it has several parameters that must be tuned carefully to perform well. The main aim of our study was to analyse the effectiveness of the bag-of-audio-words method and to find the best parameter values for emotion recognition. We optimized the parameters one at a time, with each step building on the results of the previous ones. We performed feature extraction using openSMILE, then transformed the features into fixed-size vectors with openXBOW, and finally trained SVM models and evaluated them with 10-fold cross-validation, using Unweighted Average Recall (UAR) as the metric. In our experiments, we worked with a Hungarian emotion database. According to our results, the bag-of-audio-words feature representation improves emotion classification performance. Although not every BoAW parameter has a single optimal setting, we can make clear recommendations on how to set the bag-of-audio-words parameters for emotion detection tasks.
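The abstract names the BoAW steps without showing them. As a rough illustration only, the Python sketch below (standard library only) standardizes frame-level descriptors, draws a codebook, assigns each frame to its nearest codeword(s), and normalizes the counts, so that recordings of any length map to vectors of the codebook's size; it also computes UAR as the mean of per-class recalls. It does not reproduce openSMILE or openXBOW: all function names are hypothetical, the codebook is assumed to be drawn by random sampling of training frames (k-means clustering is the usual alternative), and dividing by the frame count is just one simple normalization, not necessarily the setting used in the paper.

```python
import heapq
import math
import random

# Hypothetical helper names; this is a sketch, not the openXBOW implementation.


def fit_standardizer(frames):
    """Per-dimension mean and std of training frames (preprocessing step)."""
    n, dims = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    # Fall back to 1.0 for constant dimensions to avoid division by zero.
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n) or 1.0
            for d in range(dims)]
    return means, stds


def standardize(frames, means, stds):
    """Apply z-score standardization with previously fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(f, means, stds)] for f in frames]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_codebook(frames, size, seed=0):
    """Clustering step, assumed here to be random sampling of training frames."""
    return random.Random(seed).sample(frames, size)


def boaw_histogram(frames, codebook, num_assignments=1):
    """Quantization + normalization: one fixed-size vector per recording."""
    counts = [0.0] * len(codebook)
    for f in frames:
        # Assign the frame to its num_assignments nearest codewords.
        nearest = heapq.nsmallest(num_assignments, range(len(codebook)),
                                  key=lambda k: squared_distance(f, codebook[k]))
        for k in nearest:
            counts[k] += 1.0
    # Divide by the frame count so recordings of different length are comparable.
    return [c / len(frames) for c in counts]


def unweighted_average_recall(y_true, y_pred):
    """UAR: mean of per-class recalls, so every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic 2-D "descriptors" for two recordings of different length.
    rec_a = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
    rec_b = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(120)]
    means, stds = fit_standardizer(rec_a + rec_b)
    a, b = standardize(rec_a, means, stds), standardize(rec_b, means, stds)
    codebook = build_codebook(a + b, size=8)
    print(len(boaw_histogram(a, codebook)), len(boaw_histogram(b, codebook)))  # 8 8
```

Codebook size and the number of assignments per frame are typical examples of the BoAW parameters that need tuning. Training the SVM and running 10-fold cross-validation are left out here; with scikit-learn, `SVC`, `StratifiedKFold` and `recall_score(..., average="macro")` would cover those steps, the last being equivalent to UAR.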