Sec2Sec Co-attention for Video-Based Apparent Affective Prediction

Mingwei Sun, Kunpeng Zhang
{"title":"基于视频的显性情感预测的 Sec2Sec 协同关注","authors":"Mingwei Sun, Kunpeng Zhang","doi":"arxiv-2408.15209","DOIUrl":null,"url":null,"abstract":"Video-based apparent affect detection plays a crucial role in video\nunderstanding, as it encompasses various elements such as vision, audio,\naudio-visual interactions, and spatiotemporal information, which are essential\nfor accurate video predictions. However, existing approaches often focus on\nextracting only a subset of these elements, resulting in the limited predictive\ncapacity of their models. To address this limitation, we propose a novel\nLSTM-based network augmented with a Transformer co-attention mechanism for\npredicting apparent affect in videos. We demonstrate that our proposed Sec2Sec\nCo-attention Transformer surpasses multiple state-of-the-art methods in\npredicting apparent affect on two widely used datasets: LIRIS-ACCEDE and First\nImpressions. Notably, our model offers interpretability, allowing us to examine\nthe contributions of different time points to the overall prediction. The\nimplementation is available at: https://github.com/nestor-sun/sec2sec.","PeriodicalId":501480,"journal":{"name":"arXiv - CS - Multimedia","volume":"59 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Sec2Sec Co-attention for Video-Based Apparent Affective Prediction\",\"authors\":\"Mingwei Sun, Kunpeng Zhang\",\"doi\":\"arxiv-2408.15209\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Video-based apparent affect detection plays a crucial role in video\\nunderstanding, as it encompasses various elements such as vision, audio,\\naudio-visual interactions, and spatiotemporal information, which are essential\\nfor accurate video predictions. However, existing approaches often focus on\\nextracting only a subset of these elements, resulting in the limited predictive\\ncapacity of their models. To address this limitation, we propose a novel\\nLSTM-based network augmented with a Transformer co-attention mechanism for\\npredicting apparent affect in videos. We demonstrate that our proposed Sec2Sec\\nCo-attention Transformer surpasses multiple state-of-the-art methods in\\npredicting apparent affect on two widely used datasets: LIRIS-ACCEDE and First\\nImpressions. Notably, our model offers interpretability, allowing us to examine\\nthe contributions of different time points to the overall prediction. 
The\\nimplementation is available at: https://github.com/nestor-sun/sec2sec.\",\"PeriodicalId\":501480,\"journal\":{\"name\":\"arXiv - CS - Multimedia\",\"volume\":\"59 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Multimedia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.15209\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.15209","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Video-based apparent affect detection plays a crucial role in video understanding, as it encompasses various elements such as vision, audio, audio-visual interactions, and spatiotemporal information, which are essential for accurate video predictions. However, existing approaches often focus on extracting only a subset of these elements, resulting in the limited predictive capacity of their models. To address this limitation, we propose a novel LSTM-based network augmented with a Transformer co-attention mechanism for predicting apparent affect in videos. We demonstrate that our proposed Sec2Sec Co-attention Transformer surpasses multiple state-of-the-art methods in predicting apparent affect on two widely used datasets: LIRIS-ACCEDE and First Impressions. Notably, our model offers interpretability, allowing us to examine the contributions of different time points to the overall prediction. The implementation is available at: https://github.com/nestor-sun/sec2sec.
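Below is a minimal illustrative sketch, not the authors' released code (see the repository linked above), of the kind of architecture the abstract describes: per-second visual and audio features fused with a Transformer-style co-attention block and aggregated over time by an LSTM. The class name `Sec2SecSketch`, the feature dimensions, and the two-dimensional output (e.g., valence/arousal) are assumptions made purely for illustration.

```python
# Illustrative sketch only; hypothetical names and shapes, not the paper's implementation.
import torch
import torch.nn as nn


class Sec2SecSketch(nn.Module):
    """Toy LSTM network with a Transformer-style co-attention block.

    Each video is a sequence of per-second visual and audio feature vectors
    (assumed pre-extracted). Co-attention lets each modality attend to the
    other before an LSTM aggregates the fused sequence over time.
    """

    def __init__(self, vis_dim=512, aud_dim=128, d_model=256, n_heads=4, out_dim=2):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, d_model)
        self.aud_proj = nn.Linear(aud_dim, d_model)
        # Co-attention: visual queries attend to audio keys/values and vice versa.
        self.vis2aud = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.aud2vis = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.lstm = nn.LSTM(2 * d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, out_dim)  # e.g., valence and arousal scores

    def forward(self, vis, aud):
        # vis: (batch, seconds, vis_dim), aud: (batch, seconds, aud_dim)
        v = self.vis_proj(vis)
        a = self.aud_proj(aud)
        v_att, _ = self.vis2aud(v, a, a)        # visual attends to audio
        a_att, attn_w = self.aud2vis(a, v, v)   # audio attends to visual
        fused = torch.cat([v_att, a_att], dim=-1)
        _, (h_n, _) = self.lstm(fused)          # last hidden state summarizes the video
        return self.head(h_n[-1]), attn_w       # attention weights hint at per-second contributions


if __name__ == "__main__":
    model = Sec2SecSketch()
    vis = torch.randn(2, 30, 512)   # 2 videos, 30 one-second segments each
    aud = torch.randn(2, 30, 128)
    pred, attn = model(vis, aud)
    print(pred.shape, attn.shape)   # torch.Size([2, 2]) torch.Size([2, 30, 30])
```

The returned attention weights loosely mirror the interpretability claim in the abstract: inspecting them shows which time points each modality emphasizes, though the paper's actual mechanism may differ in detail.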