MultiMediate'24: Multi-Domain Engagement Estimation

Philipp Müller, Michal Balazia, Tobias Baur, Michael Dietz, Alexander Heimerl, Anna Penzkofer, Dominik Schiller, François Brémond, Jan Alexandersson, Elisabeth André, Andreas Bulling

arXiv - CS - Multimedia, arXiv:2408.16625, published 2024-08-29
Citations: 0
Abstract
Estimating the momentary level of participants' engagement is an important
prerequisite for assistive systems that support human interactions. Previous
work has addressed this task in within-domain evaluation scenarios, i.e.,
training and testing on the same dataset. This is in contrast to real-life
scenarios where domain shifts between training and testing data frequently
occur. With MultiMediate'24, we present the first challenge addressing
multi-domain engagement estimation. As training data, we utilise the NOXI
database of dyadic novice-expert interactions. In addition to within-domain
test data, we add two new test domains. First, we introduce recordings
following the NOXI protocol but covering languages that are not present in the
NOXI training data. Second, we collected novel engagement annotations on the
MPIIGroupInteraction dataset, which consists of group discussions among three
to four people. In this way, MultiMediate'24 evaluates the ability of
approaches to generalise across factors such as language and cultural
background, group size, task, and screen-mediated vs. face-to-face interaction.
This paper describes the MultiMediate'24 challenge and presents baseline
results. In addition, we discuss selected challenge solutions.
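The multi-domain evaluation described above scores a model separately on each test domain (within-domain NOXI, cross-lingual NOXI, and MPIIGroupInteraction) so that the generalisation gap becomes visible. As a minimal sketch, the per-domain scoring could look like the following, assuming frame-level continuous engagement labels and the concordance correlation coefficient (CCC) as the metric; the domain names and the choice of CCC here are illustrative assumptions, not the challenge's official protocol.

```python
# Hedged sketch of per-domain engagement evaluation.
# Assumptions (not from the paper): CCC as the metric, frame-level
# continuous labels, and the illustrative domain keys used below.

def ccc(y_true, y_pred):
    """Concordance correlation coefficient of two equal-length sequences."""
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    vt = sum((t - mt) ** 2 for t in y_true) / n
    vp = sum((p - mp) ** 2 for p in y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred)) / n
    return 2 * cov / (vt + vp + (mt - mp) ** 2)

def evaluate_per_domain(predictions, labels):
    """Score each test domain separately to expose the generalisation gap."""
    return {domain: ccc(labels[domain], predictions[domain])
            for domain in labels}

# Toy data: perfect predictions in-domain, noisier ones cross-domain.
labels = {"noxi_within": [0.1, 0.5, 0.9, 0.4],
          "mpii_group":  [0.2, 0.6, 0.8, 0.3]}
preds  = {"noxi_within": [0.1, 0.5, 0.9, 0.4],
          "mpii_group":  [0.3, 0.5, 0.9, 0.2]}
scores = evaluate_per_domain(preds, labels)
```

Reporting one score per domain, rather than a single pooled number, is what lets the challenge separate approaches that merely fit the NOXI training distribution from those that transfer across language, group size, and interaction setting.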