Multimodal Fusion Interactions: A Study of Human and Automatic Quantification

Companion Publication of the 2020 International Conference on Multimodal Interaction | Pub Date: 2023-10-09 | DOI: 10.1145/3577190.3614151

Paul Pu Liang, Yun Cheng, Ruslan Salakhutdinov, Louis-Philippe Morency

Abstract

To perform multimodal fusion of heterogeneous signals, we need to understand their interactions: how each modality individually provides information useful for a task, and how this information changes in the presence of other modalities. In this paper, we perform a comparative study of how humans annotate two categorizations of multimodal interactions: (1) partial labels, where different annotators annotate the label given the first modality, the second modality, and both modalities, and (2) counterfactual labels, where the same annotator annotates the label given the first modality and is then asked to reason explicitly about how their answer changes when given the second. We further propose an alternative taxonomy based on (3) information decomposition, where annotators annotate the degrees of redundancy (the extent to which modalities individually and together give the same predictions), uniqueness (the extent to which one modality enables a prediction that the other does not), and synergy (the extent to which both modalities together enable a prediction that one would not make using either modality alone). Through experiments and annotations, we highlight several opportunities and limitations of each approach and propose a method to automatically convert annotations of partial and counterfactual labels to information decomposition, yielding an accurate and efficient method for quantifying multimodal interactions.
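To make the three interaction types concrete, the following minimal sketch sorts samples into redundancy, uniqueness, and synergy from their partial labels. It is an illustrative heuristic only, not the conversion method proposed in the paper, whose details are not given in the abstract. The function names, the category names, and the agreement-based rules are all assumptions introduced for this example.

```python
from collections import Counter

# Hypothetical category names chosen for this illustration.
CATEGORIES = ("redundancy", "uniqueness_1", "uniqueness_2", "synergy")


def classify_interaction(y1, y2, y12):
    """Heuristically label one sample from its three partial labels.

    y1  -- label annotated from the first modality only
    y2  -- label annotated from the second modality only
    y12 -- label annotated from both modalities together
    Assumption: simple label agreement stands in for "same prediction".
    """
    if y1 == y2 == y12:
        # Each modality alone and both together agree.
        return "redundancy"
    if y1 == y12 and y2 != y12:
        # Only the first modality matches the fused label.
        return "uniqueness_1"
    if y2 == y12 and y1 != y12:
        # Only the second modality matches the fused label.
        return "uniqueness_2"
    # The fused label is one that neither modality gives on its own.
    return "synergy"


def interaction_profile(samples):
    """Return the fraction of samples that fall into each category."""
    if not samples:
        raise ValueError("samples must be non-empty")
    counts = Counter(classify_interaction(*s) for s in samples)
    return {name: counts.get(name, 0) / len(samples) for name in CATEGORIES}


if __name__ == "__main__":
    # Made-up toy annotations: (first only, second only, both).
    toy = [
        ("pos", "pos", "pos"),
        ("pos", "neg", "pos"),
        ("neg", "pos", "pos"),
        ("neg", "neg", "pos"),
    ]
    print(interaction_profile(toy))
```

A per-sample rule like this only gives a coarse picture. The abstract describes annotating degrees of each interaction, so a faithful implementation would estimate continuous quantities over the whole dataset rather than assign one hard category per sample.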
