A Commonsense Knowledge Enhanced Network with Retrospective Loss for Emotion Recognition in Spoken Dialog
Authors: Yunhe Xie, Chengjie Sun, Zhenzhou Ji
Venue: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Published: 2022-05-23
DOI: https://doi.org/10.1109/icassp43922.2022.9746909
The recent surge in open conversational data has drawn considerable attention to Emotion Recognition in Spoken Dialog (ERSD). However, the limited scale of existing ERSD datasets restricts a model's ability to reason completely. Moreover, an artificial dialogue agent should ideally be able to draw on past dialogue experiences. This paper proposes CKE-Net, a Commonsense Knowledge Enhanced Network with a retrospective loss, which hierarchically performs dialog modeling, external knowledge integration, and historical state retrospection. Specifically, we first adopt a transformer-based encoder that models context from multiple views by designing different mask matrices. Then, a graph attention network introduces commonsense knowledge, which benefits complex emotional reasoning. Finally, a retrospective loss is added so that training can exploit the model's prior experience. Experiments on the IEMOCAP and MELD datasets demonstrate that every designed module consistently improves performance, and that our model outperforms the state-of-the-art models on both benchmarks.
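The abstract does not say which mask matrices the encoder uses, so the following is only a minimal sketch of how multi-view context masks could be built for one dialog. The three views (global, intra-speaker, inter-speaker), the causal restriction to earlier utterances, and the name `build_view_masks` are all illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List

Mask = List[List[int]]


def build_view_masks(speakers: List[str]) -> Dict[str, Mask]:
    """Build three hypothetical utterance-level attention masks.

    speakers[i] is the speaker of utterance i. In every mask,
    mask[i][j] == 1 means utterance i may attend to utterance j.
    All views are assumed to be causal (no attention to later utterances).
    """
    n = len(speakers)

    # Global view: each utterance sees itself and every earlier utterance.
    global_mask = [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

    # Intra-speaker view: itself and earlier utterances by the same speaker.
    intra_mask = [
        [1 if j <= i and speakers[j] == speakers[i] else 0 for j in range(n)]
        for i in range(n)
    ]

    # Inter-speaker view: itself and earlier utterances by other speakers.
    inter_mask = [
        [
            1 if j == i or (j < i and speakers[j] != speakers[i]) else 0
            for j in range(n)
        ]
        for i in range(n)
    ]

    return {"global": global_mask, "intra": intra_mask, "inter": inter_mask}


# Example: a three-utterance dialog between speakers A and B.
masks = build_view_masks(["A", "B", "A"])
```

In a transformer encoder, each such matrix would typically be applied to the attention scores before the softmax so that masked positions receive zero weight; how CKE-Net combines the resulting views is not specified in the abstract.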