Classifying emotional states using pitch and formants in vowel regions

Abhijit Mohanta, V. K. Mittal
{"title":"利用元音区域的音高和共振峰对情绪状态进行分类","authors":"Abhijit Mohanta, V. K. Mittal","doi":"10.1109/ICSPCOM.2016.7980624","DOIUrl":null,"url":null,"abstract":"In the field of Human Computer Interaction (HCI), human emotion recognition from speech signal is evolving as a recent research area. Speech is the most common way for communication among human beings. Speech consists of sentences, which can be further segregated into words. Words consist of phonemes which are considered to be the primary voice construction elements. This paper presents a classification of four basic emotional states, namely anger, happy, sad, and neutral by extracting acoustic features from the speech signal. Production features mainly F0, i.e., pitch and formants F1, F2, and F3 are derived from the speech signal using only the vowel parts of English language i.e., /a/, /e/, /i/, /o/, and /u/, without requiring to process the speech signal of entire utterances or sentences. Using the pitch and formants feature vectors, the emotion classification has been carried out using a Support Vector Machine (SVM) classifier. In this preliminary investigation, the vowel regions have been separated manually, so as to assess their efficacy in classifying the emotions. The approach has been validated using an emotional speech dataset in English language, collected especially for this study. The performance evaluation results obtained are encouraging. This approach can be further refined for wider applications.","PeriodicalId":213713,"journal":{"name":"2016 International Conference on Signal Processing and Communication (ICSC)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Classifying emotional states using pitch and formants in vowel regions\",\"authors\":\"Abhijit Mohanta, V. K. Mittal\",\"doi\":\"10.1109/ICSPCOM.2016.7980624\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the field of Human Computer Interaction (HCI), human emotion recognition from speech signal is evolving as a recent research area. Speech is the most common way for communication among human beings. Speech consists of sentences, which can be further segregated into words. Words consist of phonemes which are considered to be the primary voice construction elements. This paper presents a classification of four basic emotional states, namely anger, happy, sad, and neutral by extracting acoustic features from the speech signal. Production features mainly F0, i.e., pitch and formants F1, F2, and F3 are derived from the speech signal using only the vowel parts of English language i.e., /a/, /e/, /i/, /o/, and /u/, without requiring to process the speech signal of entire utterances or sentences. Using the pitch and formants feature vectors, the emotion classification has been carried out using a Support Vector Machine (SVM) classifier. In this preliminary investigation, the vowel regions have been separated manually, so as to assess their efficacy in classifying the emotions. The approach has been validated using an emotional speech dataset in English language, collected especially for this study. The performance evaluation results obtained are encouraging. 
This approach can be further refined for wider applications.\",\"PeriodicalId\":213713,\"journal\":{\"name\":\"2016 International Conference on Signal Processing and Communication (ICSC)\",\"volume\":\"55 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 International Conference on Signal Processing and Communication (ICSC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSPCOM.2016.7980624\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 International Conference on Signal Processing and Communication (ICSC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSPCOM.2016.7980624","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 7

Abstract

In the field of Human-Computer Interaction (HCI), recognizing human emotion from the speech signal has recently emerged as an active research area. Speech is the most common means of communication among human beings. Speech consists of sentences, which can be further segmented into words, and words consist of phonemes, the primary building blocks of the voice. This paper presents a classification of four basic emotional states, namely anger, happiness, sadness, and neutral, by extracting acoustic features from the speech signal. Production features, mainly the pitch (F0) and the formants F1, F2, and F3, are derived from only the vowel regions of English speech, i.e., /a/, /e/, /i/, /o/, and /u/, without processing the entire utterance or sentence. Using the pitch and formant feature vectors, emotion classification is carried out with a Support Vector Machine (SVM) classifier. In this preliminary investigation, the vowel regions were segmented manually, so as to assess their efficacy in classifying the emotions. The approach has been validated on an English-language emotional speech dataset collected specifically for this study. The performance evaluation results obtained are encouraging, and the approach can be further refined for wider applications.
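
The pipeline the abstract describes — cut out a vowel region, measure F0 and the first three formants, and feed those four numbers to an SVM — is compact enough to sketch. The snippet below is a minimal illustration using librosa and scikit-learn, not the authors' implementation: the YIN pitch tracker, the LPC order of 12, the bandwidth threshold, the file names, and the whole-clip (rather than frame-wise) LPC fit are all assumptions made here for brevity.

```python
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def vowel_features(path, sr=16000, lpc_order=12):
    """Return [F0, F1, F2, F3] in Hz for one manually cut vowel-region clip."""
    y, sr = librosa.load(path, sr=sr)

    # F0: median of the YIN pitch track across the vowel.
    f0 = float(np.median(librosa.yin(y, fmin=60, fmax=400, sr=sr)))

    # Formants: roots of an LPC polynomial fitted to the pre-emphasized
    # vowel; keep resonances with plausible frequency and bandwidth.
    a = librosa.lpc(librosa.effects.preemphasis(y), order=lpc_order)
    roots = [r for r in np.roots(a) if np.imag(r) > 0]
    freqs = np.angle(roots) * sr / (2 * np.pi)   # pole angle -> frequency (Hz)
    bws = -np.log(np.abs(roots)) * sr / np.pi    # pole radius -> bandwidth (Hz)
    formants = sorted(f for f, b in zip(freqs, bws) if f > 90 and b < 400)
    f1, f2, f3 = (formants + [0.0, 0.0, 0.0])[:3]  # pad if fewer than 3 found
    return [f0, f1, f2, f3]

# Hypothetical training data: one file per manually segmented vowel
# region, labelled with the speaker's emotional state.
clips = ["anger_a.wav", "happy_e.wav", "sad_i.wav", "neutral_o.wav"]
labels = ["anger", "happy", "sad", "neutral"]

X = np.array([vowel_features(c) for c in clips])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, labels)
print(clf.predict(X[:1]))  # predicted emotion for the first vowel clip
```

Standardizing the features before the RBF-kernel SVM matters here because F0 (roughly 100-300 Hz) and F3 (roughly 2500-3500 Hz) live on very different scales; without scaling, the kernel distance would be dominated by the higher formants.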