MMTrans-MT: A Framework for Multimodal Emotion Recognition Using Multitask Learning

Jinrui Shen, Jiahao Zheng, Xiaoping Wang
DOI: 10.1109/ICACI52617.2021.9435906
Published in: 2021 13th International Conference on Advanced Computational Intelligence (ICACI)
Publication date: 2021-05-14
Citations: 5

Abstract

With the development of deep learning, emotion recognition tasks increasingly rely on multimodal data and richer supervision to improve accuracy. This work proposes MMTrans-MT (Multimodal Transformer-Multitask), a framework for multimodal emotion recognition using multitask learning. It has three modules: a modality representation module, a multimodal fusion module, and a multitask output module. Three modalities, i.e., words, audio, and video, are jointly exploited for emotion recognition through a simple but efficient Transformer-based fusion model. For multitask learning, the two tasks are defined as categorical emotion classification and dimensional emotion regression. Given a potential mapping between the two kinds of emotion model, multitask learning is adopted so that the two tasks reinforce each other and improve recognition accuracy. Experiments on the CMU-MOSEI and IEMOCAP datasets show that recognition using multimodal information is more accurate than recognition using unimodal information, and that adopting multitask learning further improves emotion recognition performance.
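The multitask output module described above pairs categorical emotion classification with dimensional emotion regression under one joint objective. The paper does not spell out the loss here, so the following is a minimal sketch of how such a combined objective is commonly formed: cross-entropy for the classification head plus mean-squared error for the regression head, with a hypothetical weighting factor `alpha` (the head shapes and weighting are assumptions, not the authors' exact formulation).

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multitask_loss(class_logits, class_labels, dim_preds, dim_targets, alpha=0.5):
    """Joint objective: cross-entropy for categorical emotion classification
    plus mean-squared error for dimensional emotion regression.
    `alpha` is a hypothetical task-balancing weight."""
    probs = softmax(class_logits)
    n = class_logits.shape[0]
    # pick each sample's probability for its true class
    ce = -np.log(probs[np.arange(n), class_labels] + 1e-12).mean()
    mse = ((dim_preds - dim_targets) ** 2).mean()
    return alpha * ce + (1.0 - alpha) * mse

# toy batch: 2 samples, 3 emotion classes, 2 dimensions (e.g. valence/arousal)
logits = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])
labels = np.array([0, 1])
preds = np.array([[0.4, 0.2], [-0.1, 0.6]])
targets = np.array([[0.5, 0.1], [0.0, 0.7]])
loss = multitask_loss(logits, labels, preds, targets)
```

Because both tasks share the fused Transformer representation, gradients from the regression head can regularize the classification head and vice versa, which is the mechanism by which the two tasks "promote each other" in the abstract.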