Devamanyu Hazarika, Soujanya Poria, Amir Zadeh, Erik Cambria, Louis-Philippe Morency, Roger Zimmermann
{"title":"二元对话视频中情感识别的会话记忆网络。","authors":"Devamanyu Hazarika, Soujanya Poria, Amir Zadeh, Erik Cambria, Louis-Philippe Morency, Roger Zimmermann","doi":"10.18653/v1/n18-1193","DOIUrl":null,"url":null,"abstract":"<p><p>Emotion recognition in conversations is crucial for the development of empathetic machines. Present methods mostly ignore the role of inter-speaker dependency relations while classifying emotions in conversations. In this paper, we address recognizing utterance-level emotions in dyadic conversational videos. We propose a deep neural framework, termed conversational memory network, which leverages contextual information from the conversation history. The framework takes a multimodal approach comprising audio, visual and textual features with gated recurrent units to model past utterances of each speaker into memories. Such memories are then merged using attention-based hops to capture inter-speaker dependencies. Experiments show an accuracy improvement of 3-4% over the state of the art.</p>","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","volume":"2018 ","pages":"2122-2132"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.18653/v1/n18-1193","citationCount":"282","resultStr":"{\"title\":\"Conversational Memory Network for Emotion Recognition in Dyadic Dialogue Videos.\",\"authors\":\"Devamanyu Hazarika, Soujanya Poria, Amir Zadeh, Erik Cambria, Louis-Philippe Morency, Roger Zimmermann\",\"doi\":\"10.18653/v1/n18-1193\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Emotion recognition in conversations is crucial for the development of empathetic machines. Present methods mostly ignore the role of inter-speaker dependency relations while classifying emotions in conversations. In this paper, we address recognizing utterance-level emotions in dyadic conversational videos. We propose a deep neural framework, termed conversational memory network, which leverages contextual information from the conversation history. The framework takes a multimodal approach comprising audio, visual and textual features with gated recurrent units to model past utterances of each speaker into memories. Such memories are then merged using attention-based hops to capture inter-speaker dependencies. Experiments show an accuracy improvement of 3-4% over the state of the art.</p>\",\"PeriodicalId\":74542,\"journal\":{\"name\":\"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting\",\"volume\":\"2018 \",\"pages\":\"2122-2132\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.18653/v1/n18-1193\",\"citationCount\":\"282\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. 
Meeting\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18653/v1/n18-1193\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/n18-1193","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Conversational Memory Network for Emotion Recognition in Dyadic Dialogue Videos.
Emotion recognition in conversations is crucial for the development of empathetic machines. Present methods mostly ignore the role of inter-speaker dependency relations while classifying emotions in conversations. In this paper, we address recognizing utterance-level emotions in dyadic conversational videos. We propose a deep neural framework, termed conversational memory network, which leverages contextual information from the conversation history. The framework takes a multimodal approach comprising audio, visual and textual features with gated recurrent units to model past utterances of each speaker into memories. Such memories are then merged using attention-based hops to capture inter-speaker dependencies. Experiments show an accuracy improvement of 3-4% over the state of the art.
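To make the described architecture concrete, below is a minimal PyTorch sketch of the idea in the abstract: one GRU per speaker encodes that speaker's past utterances into memory cells, the current utterance queries both memories over several attention hops to capture inter-speaker dependencies, and the refined representation is classified into an emotion. The class name, dimensions, hop update, and the assumption that the multimodal (audio, visual, textual) features are already fused into a single vector per utterance are illustrative choices, not the authors' reference implementation.

```python
# Hedged sketch of a conversational-memory-style model; not the official CMN code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConversationalMemorySketch(nn.Module):
    def __init__(self, feat_dim=100, mem_dim=100, num_emotions=6, hops=3):
        super().__init__()
        # One GRU per speaker turns that speaker's utterance history into memory cells.
        self.gru_a = nn.GRU(feat_dim, mem_dim, batch_first=True)
        self.gru_b = nn.GRU(feat_dim, mem_dim, batch_first=True)
        self.query_proj = nn.Linear(feat_dim, mem_dim)
        self.hops = hops
        self.classifier = nn.Linear(mem_dim, num_emotions)

    def attend(self, query, memory):
        # Soft attention of the current query over one speaker's memory cells.
        scores = torch.bmm(memory, query.unsqueeze(2)).squeeze(2)    # (B, T)
        weights = F.softmax(scores, dim=1)
        return torch.bmm(weights.unsqueeze(1), memory).squeeze(1)    # (B, mem_dim)

    def forward(self, history_a, history_b, current_utt):
        # history_a, history_b: (B, T, feat_dim) fused multimodal features of past utterances.
        # current_utt: (B, feat_dim) features of the utterance to classify.
        mem_a, _ = self.gru_a(history_a)
        mem_b, _ = self.gru_b(history_b)
        q = self.query_proj(current_utt)
        for _ in range(self.hops):
            # Each hop reads both speakers' memories and refines the query,
            # which is how inter-speaker dependencies enter the representation.
            q = q + self.attend(q, mem_a) + self.attend(q, mem_b)
        return self.classifier(q)

# Usage with random stand-in features (real features would come from audio/visual/text encoders):
model = ConversationalMemorySketch()
logits = model(torch.randn(4, 10, 100), torch.randn(4, 10, 100), torch.randn(4, 100))
```

The speaker-specific memories are what distinguish this setup from a single context encoder: attending over each speaker's history separately lets the model weight self-dependencies and inter-speaker dependencies differently when classifying the current utterance.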