Title: AVES: An Audio-Visual Emotion Stream Dataset for Temporal Emotion Detection
Authors: Yan Li; Wei Gan; Ke Lu; Dongmei Jiang; Ramesh Jain
Journal: IEEE Transactions on Affective Computing, vol. 16, no. 1, pp. 438-450 (Impact Factor: 9.8; JCR: Q1, Computer Science, Artificial Intelligence)
DOI: 10.1109/TAFFC.2024.3440924
Publication date: 2024-08-09
Publication type: Journal Article
URL: https://ieeexplore.ieee.org/document/10632787/
Citations: 0
Abstract
Human emotions vary over time, which can be vividly described as a stream of emotions. Observing the emotion stream in daily life provides valuable insights into an individual's mental state. However, existing research in emotion understanding has mainly focused on classification tasks, assigning an emotion category to a well-trimmed segment or to each frame within a continuous signal. In contrast, the task of temporal emotion detection, which involves locating the boundaries of emotion segments and recognizing their categories in untrimmed signals, has not been fully explored. To advance research in this area, this paper introduces an in-the-wild Audio-Visual Emotion Stream (AVES) dataset, which is reliably annotated with the time boundaries and emotion category of each emotion segment in the videos. Thus, AVES can serve as a solid benchmark for temporal emotion detection tasks. Moreover, considering the flexible boundaries and varying durations of emotion segments, we propose a Boundary Combination Network (BoCoNet) for temporal emotion detection, which leverages short-term temporal context information to first predict the boundaries of emotion segments and then locate the entire emotion segments. Extensive experiments conducted on various representative unimodal and multimodal representations demonstrate that BoCoNet achieves state-of-the-art results. The AVES dataset will be released to the research community. We expect this paper to advance research on emotion streams and temporal emotion detection.
About the journal
The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. We also welcome surveys of existing work that provide new perspectives on the historical and future directions of this field.