AVES: An Audio-Visual Emotion Stream Dataset for Temporal Emotion Detection

IF 9.8 · CAS Region 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
IEEE Transactions on Affective Computing · Pub Date: 2024-08-09 · DOI: 10.1109/TAFFC.2024.3440924
Yan Li;Wei Gan;Ke Lu;Dongmei Jiang;Ramesh Jain
Volume 16, Issue 1 · Pages 438-450 · Citations: 0
Full text: https://ieeexplore.ieee.org/document/10632787/

Abstract

Human emotions vary over time, which can be vividly described as a stream of emotions. Observing the emotion stream in daily life provides valuable insights into an individual's mental state. However, existing research in emotion understanding has mainly focused on classification tasks, assigning an emotion category to a well-trimmed segment or each frame within a continuous signal. In contrast, the task of temporal emotion detection, which involves locating the boundaries of emotion segments and recognizing their categories in untrimmed signals, has not been fully explored. To advance research in this area, this paper introduces an in-the-wild Audio-Visual Emotion Stream (AVES) dataset, which is reliably annotated with the time boundaries and emotion category for each emotion segment in the videos. Thus, AVES can serve as a solid benchmark for temporal emotion detection tasks. Moreover, considering the flexible boundaries and varying durations of emotion segments, we propose a Boundary Combination Network (BoCoNet) for temporal emotion detection, which leverages short-term temporal context information to first predict the boundaries of emotion segments and then locate the entire emotion segments. Extensive experiments conducted on various representative unimodal and multimodal representations demonstrate that BoCoNet achieves state-of-the-art results. The AVES dataset will be released to the research community. We expect that this paper can advance the research on emotion stream and temporal emotion detection.
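The abstract states only that BoCoNet first predicts the boundaries of emotion segments and then combines them to locate whole segments; no implementation details are given here. As a rough, hypothetical illustration of the general boundary-combination idea (in the spirit of boundary-based temporal detection methods), a toy sketch might look like the following — all names, thresholds, and the scoring rule are illustrative assumptions, not the paper's method:

```python
# Toy sketch of boundary combination for temporal detection.
# NOT the paper's BoCoNet: thresholds and the confidence rule are
# illustrative assumptions; the abstract only says boundaries are
# predicted first and then combined into full segments.

def combine_boundaries(start_scores, end_scores, thresh=0.5,
                       min_len=1, max_len=50):
    """Pair thresholded start/end frames into scored candidate segments."""
    starts = [i for i, s in enumerate(start_scores) if s >= thresh]
    ends = [j for j, e in enumerate(end_scores) if e >= thresh]
    segments = []
    for i in starts:
        for j in ends:
            length = j - i + 1
            if min_len <= length <= max_len:
                # Segment confidence = product of its two boundary scores.
                segments.append((i, j, start_scores[i] * end_scores[j]))
    # Rank candidates by confidence; duplicate suppression (e.g. NMS)
    # would typically follow in a real detector.
    return sorted(segments, key=lambda seg: seg[2], reverse=True)

# Per-frame boundary probabilities for a 6-frame toy clip.
start = [0.1, 0.9, 0.2, 0.1, 0.1, 0.1]
end = [0.1, 0.1, 0.1, 0.8, 0.1, 0.7]
print(combine_boundaries(start, end))
```

Here frame 1 is a likely start and frames 3 and 5 are likely ends, so two candidate segments are produced and ranked, with (1, 3) scoring highest.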
Source Journal

IEEE Transactions on Affective Computing (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, CYBERNETICS)
CiteScore: 15.00 · Self-citation rate: 6.20% · Articles per year: 174

About the Journal: The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. We also welcome surveys of existing work that provide new perspectives on the historical and future directions of this field.
Latest Articles from This Journal

Label Distribution Learning for Facial Expression Recognition based on Multi-Granularity Perception
DQ-Former: Layer-wise Querying Transformer with Dynamic Modality Priority for Conversational Multimodal Emotion Recognition
Angle-Optimized Partial Disentanglement for Multimodal Emotion Recognition in Conversation
Emotion Recognition from Physiological Responses: Accessibility and Usability of Publicly Available Datasets
Cognition-guided Complex-valued Graph Convolutional Network for Gait Emotion Recognition