Salience based lexical features for emotion recognition
Kalani Wataraka Gamage, V. Sethu, E. Ambikairajah
2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/ICASSP.2017.7953274 (https://doi.org/10.1109/ICASSP.2017.7953274)
Publication date: 2017-08-03
Citations: 16
Abstract
This paper examines the usefulness of verbal events for speech-based emotion recognition. First, we propose using phoneme sequences to encode verbal cues related to the expression of emotions, and we introduce lexical features based on these phoneme sequences for automatic emotion recognition systems where manual transcripts are not available. Second, we present a novel estimate of the emotional salience of verbal cues, applicable to both phoneme sequences and words. Experimental results on the IEMOCAP database show that the proposed automatic phoneme-sequence-based features achieve an Unweighted Average Recall (UAR) of 49% with the proposed salience measure. Further, the proposed salience measure yields a UAR of 64% when manual word transcriptions are used. Both of these are the highest UARs reported on the IEMOCAP database for systems using lexical features extracted from automatic and manual transcripts, respectively.
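
For readers unfamiliar with the metric, UAR is the mean of the per-class recalls, so each emotion class counts equally regardless of how many utterances it contains. The following is a minimal sketch of that standard definition only. The function name and the toy labels are illustrative assumptions and are not taken from the paper or from IEMOCAP.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (every class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    total = defaultdict(int)    # utterances per true class
    correct = defaultdict(int)  # correctly recognised utterances per true class
    for truth, predicted in zip(y_true, y_pred):
        total[truth] += 1
        if truth == predicted:
            correct[truth] += 1

    # Recall is computed per class, then averaged without class-size weights.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not data from the paper).
y_true = ["angry", "angry", "angry", "angry", "happy", "sad"]
y_pred = ["angry", "angry", "angry", "happy", "happy", "angry"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case the per-class recalls are 0.75, 1.0, and 0.0, so the UAR falls below the plain accuracy because the missed minority class is not outvoted by the majority class.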