MMTrans-MT: A Framework for Multimodal Emotion Recognition Using Multitask Learning

Jinrui Shen, Jiahao Zheng, Xiaoping Wang
DOI: 10.1109/ICACI52617.2021.9435906
Published in: 2021 13th International Conference on Advanced Computational Intelligence (ICACI)
Publication date: 2021-05-14
Citations: 5

Abstract

With the development of deep learning, emotion recognition tasks increasingly rely on multimodal data and richer supervision to improve accuracy. This work proposes MMTrans-MT (Multimodal Transformer-Multitask), a framework for multimodal emotion recognition using multitask learning. It has three modules: a modality representation module, a multimodal fusion module, and a multitask output module. Three modalities, i.e., words, audio, and video, are jointly exploited for emotion recognition through a simple but efficient Transformer-based fusion model. For multitask learning, the two tasks are defined as categorical emotion classification and dimensional emotion regression. Given a potential mapping between the two kinds of emotion model, multitask learning is adopted so that the two tasks reinforce each other and improve recognition accuracy. Experiments on the CMU-MOSEI and IEMOCAP datasets show that recognition using multimodal information is more accurate than recognition using unimodal information, and that adopting multitask learning further improves emotion recognition performance.
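The multitask output module described above pairs categorical emotion classification with dimensional emotion regression under one joint objective. The paper does not spell out the loss here, so the following is a minimal sketch of how such a combined objective is commonly formed: cross-entropy for the classification head plus mean-squared error for the regression head, with a hypothetical weighting factor `alpha` (the head shapes and weighting are assumptions, not the authors' exact formulation).

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multitask_loss(class_logits, class_labels, dim_preds, dim_targets, alpha=0.5):
    """Joint objective: cross-entropy for categorical emotion classification
    plus mean-squared error for dimensional emotion regression.
    `alpha` is a hypothetical task-balancing weight."""
    probs = softmax(class_logits)
    n = class_logits.shape[0]
    # pick each sample's probability for its true class
    ce = -np.log(probs[np.arange(n), class_labels] + 1e-12).mean()
    mse = ((dim_preds - dim_targets) ** 2).mean()
    return alpha * ce + (1.0 - alpha) * mse

# toy batch: 2 samples, 3 emotion classes, 2 dimensions (e.g. valence/arousal)
logits = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])
labels = np.array([0, 1])
preds = np.array([[0.4, 0.2], [-0.1, 0.6]])
targets = np.array([[0.5, 0.1], [0.0, 0.7]])
loss = multitask_loss(logits, labels, preds, targets)
```

Because both tasks share the fused Transformer representation, gradients from the regression head can regularize the classification head and vice versa, which is the mechanism by which the two tasks "promote each other" in the abstract.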