
Companion Publication of the 2020 International Conference on Multimodal Interaction: Latest Publications

Identifying Interlocutors' Behaviors and its Timings Involved with Impression Formation from Head-Movement Features and Linguistic Features
Shumpei Otsuchi, Koya Ito, Yoko Ishii, Ryo Ishii, Shinichirou Eitoku, Kazuhiro Otsuka
A prediction-explanation framework is proposed to identify when and what behaviors are involved in forming interlocutors’ impressions in group discussions. We targeted the self-reported scores of 16 impressions, including enjoyment and concentration. To that end, we formulate the problem as discovering behavioral features that contributed to the impression prediction and determining the timings at which these behaviors frequently occurred. To solve this problem, this paper proposes a two-fold framework consisting of a prediction part followed by an explanation part. The prediction part employs random forest regressors using functional head-movement features and BERT-based linguistic features, which can capture various aspects of interactive conversational behaviors. The explanation part measures the level of each feature’s contribution to the prediction using a SHAP analysis and introduces a novel idea of temporally decomposing features’ contributions over time. The influential behaviors and their timings are identified from local maxima of the temporal distribution of features’ contributions. Targeting 17 four-female group discussions, the predictability and explainability of the proposed framework are confirmed through case studies and quantitative evaluations of the detected timings.
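As a rough, hypothetical sketch of the prediction-explanation pipeline described above (not the authors' code), the snippet below fits a random-forest regressor on placeholder per-window behavioral features, computes SHAP contributions, and reads the per-window absolute contributions as a temporal decomposition whose peaks indicate when a behavior mattered; every shape and value is invented for illustration.

```python
# Hypothetical sketch only: the feature matrix, impression scores, and window
# counts are placeholders standing in for the head-movement/linguistic features.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_windows, n_features = 200, 12                      # per-window behavioral features
X = rng.normal(size=(n_windows, n_features))
y = 0.8 * X[:, 0] + rng.normal(scale=0.1, size=n_windows)   # stand-in impression score

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)               # (n_windows, n_features) contributions

# "Temporal decomposition": per-window absolute contribution of each feature;
# its local maxima point to the timings at which that behavior was influential.
temporal_contrib = np.abs(shap_values)
peak_window = temporal_contrib.argmax(axis=0)
for feat, t in enumerate(peak_window):
    print(f"feature {feat}: largest contribution in window {t}")
```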
{"title":"Identifying Interlocutors' Behaviors and its Timings Involved with Impression Formation from Head-Movement Features and Linguistic Features","authors":"Shumpei Otsuchi, Koya Ito, Yoko Ishii, Ryo Ishii, Shinichirou Eitoku, Kazuhiro Otsuka","doi":"10.1145/3577190.3614124","DOIUrl":"https://doi.org/10.1145/3577190.3614124","url":null,"abstract":"A prediction-explanation framework is proposed to identify when and what behaviors are involved in forming interlocutors’ impressions in group discussions. We targeted the self-reported scores of 16 impressions, including enjoyment and concentration. To that end, we formulate the problem as discovering behavioral features that contributed to the impression prediction and determining the timings that the behaviors frequently occurred. To solve this problem, this paper proposes a two-fold framework consisting of the prediction part followed by the explanation part. The former prediction part employs random forest regressors using functional head-movement features and BERT-based linguistic features, which can capture various aspects of interactive conversational behaviors. The later part measures the levels of features’ contribution to the prediction using a SHAP analysis and introduces a novel idea of temporal decomposition of features’ contributions over time. The influential behaviors and their timings are identified from local maximums of the temporal distribution of features’ contributions. Targeting 17-group 4-female discussions, the predictability and explainability of the proposed framework are confirmed by some case studies and quantitative evaluations of the detected timings.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Toward Fair Facial Expression Recognition with Improved Distribution Alignment
Mojtaba Kolahdouzi, Ali Etemad
We present a novel approach to mitigate bias in facial expression recognition (FER) models. Our method aims to reduce sensitive attribute information such as gender, age, or race, in the embeddings produced by FER models. We employ a kernel mean shrinkage estimator to estimate the kernel mean of the distributions of the embeddings associated with different sensitive attribute groups, such as young and old, in the Hilbert space. Using this estimation, we calculate the maximum mean discrepancy (MMD) distance between the distributions and incorporate it in the classifier loss along with an adversarial loss, which is then minimized through the learning process to improve the distribution alignment. Our method makes sensitive attributes less recognizable for the model, which in turn promotes fairness. Additionally, for the first time, we analyze the notion of attractiveness as an important sensitive attribute in FER models and demonstrate that FER models can indeed exhibit biases towards more attractive faces. To prove the efficacy of our model in reducing bias regarding different sensitive attributes (including the newly proposed attractiveness attribute), we perform several experiments on two widely used datasets, CelebA and RAF-DB. The results in terms of both accuracy and fairness measures outperform the state-of-the-art in most cases, demonstrating the effectiveness of the proposed method.
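A minimal sketch of the kind of MMD penalty the abstract describes, assuming a plain RBF kernel with a fixed bandwidth and random placeholder embeddings; the paper's kernel mean shrinkage estimator and adversarial loss are omitted.

```python
# Sketch under assumptions: an RBF-kernel estimate of squared MMD between
# embeddings of two sensitive-attribute groups, added to the classifier loss.
import torch

def rbf_kernel(a: torch.Tensor, b: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))

def mmd2(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased estimate of the squared maximum mean discrepancy."""
    return (rbf_kernel(x, x, sigma).mean()
            + rbf_kernel(y, y, sigma).mean()
            - 2 * rbf_kernel(x, y, sigma).mean())

emb_group_a = torch.randn(64, 128)        # e.g. embeddings of "young" faces
emb_group_b = torch.randn(64, 128)        # e.g. embeddings of "old" faces
classification_loss = torch.tensor(0.7)   # placeholder cross-entropy value
total_loss = classification_loss + 0.5 * mmd2(emb_group_a, emb_group_b)
print(float(total_loss))
```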
{"title":"Toward Fair Facial Expression Recognition with Improved Distribution Alignment","authors":"Mojtaba Kolahdouzi, Ali Etemad","doi":"10.1145/3577190.3614141","DOIUrl":"https://doi.org/10.1145/3577190.3614141","url":null,"abstract":"We present a novel approach to mitigate bias in facial expression recognition (FER) models. Our method aims to reduce sensitive attribute information such as gender, age, or race, in the embeddings produced by FER models. We employ a kernel mean shrinkage estimator to estimate the kernel mean of the distributions of the embeddings associated with different sensitive attribute groups, such as young and old, in the Hilbert space. Using this estimation, we calculate the maximum mean discrepancy (MMD) distance between the distributions and incorporate it in the classifier loss along with an adversarial loss, which is then minimized through the learning process to improve the distribution alignment. Our method makes sensitive attributes less recognizable for the model, which in turn promotes fairness. Additionally, for the first time, we analyze the notion of attractiveness as an important sensitive attribute in FER models and demonstrate that FER models can indeed exhibit biases towards more attractive faces. To prove the efficacy of our model in reducing bias regarding different sensitive attributes (including the newly proposed attractiveness attribute), we perform several experiments on two widely used datasets, CelebA and RAF-DB. The results in terms of both accuracy and fairness measures outperform the state-of-the-art in most cases, demonstrating the effectiveness of the proposed method.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"274 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Evaluating Outside the Box: Lessons Learned on eXtended Reality Multi-modal Experiments Beyond the Laboratory
Bernardo Marques, Samuel Silva, Rafael Maio, João Alves, Carlos Ferreira, Paulo Dias, Beatriz Sousa Santos
Over time, numerous multimodal eXtended Reality (XR) user studies have been conducted in laboratory environments, with participants fulfilling tasks under the guidance of a researcher. Although generalizable results have contributed to increasing the maturity of the field, it is also paramount to address the ecological validity of evaluations outside the laboratory. Despite real-world scenarios being clearly challenging, successful in-situ and remote deployment has become realistic for addressing a broad variety of research questions, expanding the participant sample to more specific target users and accounting for multi-modal constraints not reflected in controlled laboratory settings, among other benefits. In this paper, a set of multimodal XR experiments conducted outside the laboratory is described (e.g., industrial field studies, remote collaborative tasks, longitudinal rehabilitation exercises). Then, a list of lessons learned is reported, illustrating challenges and opportunities, aiming to increase the level of awareness of the research community and facilitate further evaluations.
{"title":"Evaluating Outside the Box: Lessons Learned on eXtended Reality Multi-modal Experiments Beyond the Laboratory","authors":"Bernardo Marques, Samuel Silva, Rafael Maio, João Alves, Carlos Ferreira, Paulo Dias, Beatriz Sousa Santos","doi":"10.1145/3577190.3614134","DOIUrl":"https://doi.org/10.1145/3577190.3614134","url":null,"abstract":"Over time, numerous multimodal eXtended Reality (XR) user studies have been conducted in laboratory environments, with participants fulfilling tasks under the guidance of a researcher. Although generalizable results contributed to increase the maturity of the field, it is also paramount to address the ecological validity of evaluations outside the laboratory. Despite real-world scenarios being clearly challenging, successful in-situ and remote deployment has become realistic to address a broad variety of research questions, thus, expanding participants’ sample to more specific target users, considering multi-modal constraints not reflected in controlled laboratory settings and other benefits. In this paper, a set of multimodal XR experiments conducted outside the laboratory are described (e.g., industrial field studies, remote collaborative tasks, longitudinal rehabilitation exercises). Then, a list of lessons learned is reported, illustrating challenges, and opportunities, aiming to increase the level of awareness of the research community and facilitate performing further evaluations.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multimodal Turn Analysis and Prediction for Multi-party Conversations
Meng-Chen Lee, Mai Trinh, Zhigang Deng
This paper presents a computational study to analyze and predict turns (i.e., turn-taking and turn-keeping) in multiparty conversations. Specifically, we use a high-fidelity hybrid data acquisition system to capture a large-scale set of multi-modal natural conversational behaviors of interlocutors in three-party conversations, including gazes, head movements, body movements, speech, etc. Based on the inter-pausal units (IPUs) extracted from the in-house acquired dataset, we propose a transformer-based computational model to predict the turns based on the interlocutor states (speaking/back-channeling/silence) and the gaze targets. Our model can robustly achieve more than 80% accuracy, and the generalizability of our model was extensively validated through cross-group experiments. Also, we introduce a novel computational metric called "relative engagement level" (REL) of IPUs, and further validate its statistical significance between turn-keeping IPUs and turn-taking IPUs, and between different conversational groups. Our experimental results also found that the patterns of the interlocutor states can be used as a more effective cue than their gaze behaviors for predicting turns in multiparty conversations.
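For illustration only, the sketch below builds a small transformer encoder of the general kind described above that classifies one IPU as turn-keeping or turn-taking from a sequence of per-frame feature vectors (e.g. encoded interlocutor states plus gaze-target codes); the feature encoding, dimensions, and mean pooling are assumptions rather than the paper's architecture.

```python
# Illustrative sketch: a tiny transformer encoder over per-frame IPU features.
import torch
import torch.nn as nn

class TurnPredictor(nn.Module):
    def __init__(self, feat_dim: int = 16, d_model: int = 64, n_layers: int = 2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 2)          # turn-keeping vs. turn-taking

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, frames, feat_dim)
        h = self.encoder(self.proj(x))
        return self.head(h.mean(dim=1))            # pool over the IPU's frames

model = TurnPredictor()
ipu_batch = torch.randn(8, 50, 16)                 # 8 IPUs, 50 frames, 16 features each
print(model(ipu_batch).shape)                      # torch.Size([8, 2])
```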
{"title":"Multimodal Turn Analysis and Prediction for Multi-party Conversations","authors":"Meng-Chen Lee, Mai Trinh, Zhigang Deng","doi":"10.1145/3577190.3614139","DOIUrl":"https://doi.org/10.1145/3577190.3614139","url":null,"abstract":"This paper presents a computational study to analyze and predict turns (i.e., turn-taking and turn-keeping) in multiparty conversations. Specifically, we use a high-fidelity hybrid data acquisition system to capture a large-scale set of multi-modal natural conversational behaviors of interlocutors in three-party conversations, including gazes, head movements, body movements, speech, etc. Based on the inter-pausal units (IPUs) extracted from the in-house acquired dataset, we propose a transformer-based computational model to predict the turns based on the interlocutor states (speaking/back-channeling/silence) and the gaze targets. Our model can robustly achieve more than 80% accuracy, and the generalizability of our model was extensively validated through cross-group experiments. Also, we introduce a novel computational metric called “relative engagement level\" (REL) of IPUs, and further validate its statistical significance between turn-keeping IPUs and turn-taking IPUs, and between different conversational groups. Our experimental results also found that the patterns of the interlocutor states can be used as a more effective cue than their gaze behaviors for predicting turns in multiparty conversations.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A Robot Just for You: Multimodal Personalized Human-Robot Interaction and the Future of Work and Care
Maja Mataric
As AI becomes ubiquitous, its physical embodiment, robots, will also gradually enter our lives. As they do, we will demand that they understand us, predict our needs and wants, and adapt to us as we change our moods and minds, learn, grow, and age. The nexus created by recent major advances in machine learning for machine perception, navigation, and natural language processing has enabled human-robot interaction in real-world contexts, just as the need for human services continues to grow, from elder care to nursing to education and training. This talk will discuss our research in socially assistive robotics (SAR), which uses embodied social interaction to support user goals in health, wellness, training, and education. SAR brings together machine learning for user modeling, multimodal behavioral signal processing, and affective computing to enable robots to understand, interact, and adapt to users’ specific and ever-changing needs. The talk will cover methods and challenges of using multi-modal interaction data and expressive robot behavior to monitor, coach, motivate, and support a wide variety of user populations and use cases. We will cover insights from work with users across the age span (infants, children, adults, elderly), ability span (typically developing, autism, stroke, Alzheimer’s), contexts (schools, therapy centers, homes), and deployment durations (up to 6 months), as well as commercial implications.
{"title":"A Robot Just for You: Multimodal Personalized Human-Robot Interaction and the Future of Work and Care","authors":"Maja Mataric","doi":"10.1145/3577190.3616524","DOIUrl":"https://doi.org/10.1145/3577190.3616524","url":null,"abstract":"As AI becomes ubiquitous, its physical embodiment—robots–will also gradually enter our lives. As they do, we will demand that they understand us, predict our needs and wants, and adapt to us as we change our moods and minds, learn, grow, and age. The nexus created by recent major advances in machine learning for machine perception, navigation, and natural language processing has enabled human-robot interaction in real-world contexts, just as the need for human services continues to grow, from elder care to nursing to education and training. This talk will discuss our research in socially assistive robotics (SAR), which uses embodied social interaction to support user goals in health, wellness, training, and education. SAR brings together machine learning for user modeling, multimodal behavioral signal processing, and affective computing to enable robots to understand, interact, and adapt to users’ specific and ever-changing needs. The talk will cover methods and challenges of using multi-modal interaction data and expressive robot behavior to monitor, coach, motivate, and support a wide variety of user populations and use cases. We will cover insights from work with users across the age span (infants, children, adults, elderly), ability span (typically developing, autism, stroke, Alzheimer’s), contexts (schools, therapy centers, homes), and deployment durations (up to 6 months), as well as commercial implications.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Crowd Behaviour Prediction using Visual and Location Data in Super-Crowded Scenarios
Antonius Bima Murti Wijaya
Predicting the future trajectory of a crowd is important for safety, to prevent disasters such as stampedes or collisions. Extensive research has explored trajectory prediction in typical crowd scenarios, where the majority of individuals can be easily identified. However, this study focuses on a more challenging scenario known as the super-crowd scene, wherein individuals within the crowd can only be annotated based on their heads. In this scenario, person re-identification during tracking performs poorly due to a lack of clear image data. Our research proposes a clustering strategy to overcome these re-identification problems and predict cluster-level crowd trajectories. Two-dimensional (2D) maps and multiple cameras will be used to capture full pictures of crowds at a location and extract the venue’s spatial data (see figure 1). The research methodology encompasses several key steps, including evaluating data extraction with state-of-the-art methods, estimating crowd clusters, integrating 2D maps and multi-view fusion, and evaluating the proposed method on a dataset of multi-view videos collected in a real-world super-crowded scenario.
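A hedged sketch of the clustering strategy in its simplest imaginable form: group head detections on the 2D map with DBSCAN and extrapolate cluster centroids with a constant-velocity model as a stand-in for cluster-level trajectory prediction; the coordinates, parameters, and the assumption that cluster labels correspond across frames are all simplifications.

```python
# Rough stand-in only: real cluster association across frames would need matching.
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_centroids(heads: np.ndarray, eps: float = 0.8) -> dict:
    labels = DBSCAN(eps=eps, min_samples=3).fit_predict(heads)
    return {k: heads[labels == k].mean(axis=0) for k in set(labels) if k != -1}

heads_t0 = np.random.rand(100, 2) * 10             # head positions on the 2D map at time t
heads_t1 = heads_t0 + np.array([0.3, 0.1])         # the same crowd one frame later
c0, c1 = cluster_centroids(heads_t0), cluster_centroids(heads_t1)

for k in c0.keys() & c1.keys():                    # naively assume labels match across frames
    velocity = c1[k] - c0[k]
    print(f"cluster {k}: predicted next centroid {c1[k] + velocity}")
```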
{"title":"Crowd Behaviour Prediction using Visual and Location Data in Super-Crowded Scenarios","authors":"Antonius Bima Murti Wijaya","doi":"10.1145/3577190.3614230","DOIUrl":"https://doi.org/10.1145/3577190.3614230","url":null,"abstract":"Predicting the future trajectory of a crowd is important for safety to prevent disasters such as stampedes or collisions. Extensive research has been conducted to explore trajectory prediction in typical crowd scenarios, where the majority of individuals can be easily identified. However, this study focuses on a more challenging scenario known as the super-crowd scene, wherein individuals within the crowd can only be annotated based on their heads. In this particular scenario, people’s re-identification process in tracking does not perform well due to a lack of clear image data. Our research proposes a clustering strategy to overcome people re-identification problems and predict the cluster crowd trajectory. Two-dimensional(2D) maps and multi-cameras will be used to capture full pictures of crowds in a location and extract the venue’s spatial data (see figure 1). The research methodology encompasses several key steps, including evaluating data extraction of the state-of-the-art methods, estimating crowd clusters, integrating 2D maps and multi-view fusion, and evaluating the proposed method on a dataset of multi-view videos collected in a real-world super-crowded scenario.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
EEG-based Cognitive Load Classification using Feature Masked Autoencoding and Emotion Transfer Learning
Dustin Pulver, Prithila Angkan, Paul Hungler, Ali Etemad
Cognitive load, the amount of mental effort required for task completion, plays an important role in performance and decision-making outcomes, making its classification and analysis essential in various sensitive domains. In this paper, we present a new solution for the classification of cognitive load using electroencephalogram (EEG). Our model uses a transformer architecture employing transfer learning between emotions and cognitive load. We pre-train our model using self-supervised masked autoencoding on emotion-related EEG datasets and use transfer learning with both frozen weights and fine-tuning to perform downstream cognitive load classification. To evaluate our method, we carry out a series of experiments utilizing two publicly available EEG-based emotion datasets, namely SEED and SEED-IV, for pre-training, while we use the CL-Drive dataset for downstream cognitive load classification. The results of our experiments show that our proposed approach achieves strong results and outperforms conventional single-stage fully supervised learning. Moreover, we perform detailed ablation and sensitivity studies to evaluate the impact of different aspects of our proposed solution. This research contributes to the growing body of literature in affective computing with a focus on cognitive load, and opens up new avenues for future research in the field of cross-domain transfer learning using self-supervised pre-training.
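The sketch below illustrates the masked-autoencoding idea on placeholder per-window EEG feature vectors, followed by reuse of the trained encoder for a downstream cognitive-load head; it is a deliberately simplified stand-in, not the paper's transformer architecture or training setup.

```python
# Simplified sketch: mask features, reconstruct them, then reuse the encoder.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 64))
optim = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for step in range(100):                            # self-supervised pre-training loop
    x = torch.randn(32, 64)                        # stand-in batch of EEG feature vectors
    mask = (torch.rand_like(x) > 0.5).float()      # randomly hide about half the features
    recon = decoder(encoder(x * mask))
    loss = ((recon - x) ** 2 * (1 - mask)).mean()  # reconstruct only the masked entries
    optim.zero_grad()
    loss.backward()
    optim.step()

classifier_head = nn.Linear(32, 3)                 # downstream cognitive-load classes
logits = classifier_head(encoder(torch.randn(8, 64)))   # encoder frozen or fine-tuned here
print(logits.shape)
```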
{"title":"EEG-based Cognitive Load Classification using Feature Masked Autoencoding and Emotion Transfer Learning","authors":"Dustin Pulver, Prithila Angkan, Paul Hungler, Ali Etemad","doi":"10.1145/3577190.3614113","DOIUrl":"https://doi.org/10.1145/3577190.3614113","url":null,"abstract":"Cognitive load, the amount of mental effort required for task completion, plays an important role in performance and decision-making outcomes, making its classification and analysis essential in various sensitive domains. In this paper, we present a new solution for the classification of cognitive load using electroencephalogram (EEG). Our model uses a transformer architecture employing transfer learning between emotions and cognitive load. We pre-train our model using self-supervised masked autoencoding on emotion-related EEG datasets and use transfer learning with both frozen weights and fine-tuning to perform downstream cognitive load classification. To evaluate our method, we carry out a series of experiments utilizing two publicly available EEG-based emotion datasets, namely SEED and SEED-IV, for pre-training, while we use the CL-Drive dataset for downstream cognitive load classification. The results of our experiments show that our proposed approach achieves strong results and outperforms conventional single-stage fully supervised learning. Moreover, we perform detailed ablation and sensitivity studies to evaluate the impact of different aspects of our proposed solution. This research contributes to the growing body of literature in affective computing with a focus on cognitive load, and opens up new avenues for future research in the field of cross-domain transfer learning using self-supervised pre-training.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multimodal Bias: Assessing Gender Bias in Computer Vision Models with NLP Techniques
Abhishek Mandal, Suzanne Little, Susan Leavy
Large multimodal deep learning models such as Contrastive Language Image Pretraining (CLIP) have become increasingly powerful with applications across several domains in recent years. CLIP works on visual and language modalities and forms a part of several popular models, such as DALL-E and Stable Diffusion. It is trained on a large dataset of millions of image-text pairs crawled from the internet. Such large datasets are often used for training purposes without filtering, leading to models inheriting social biases from internet data. Given that models such as CLIP are being applied in such a wide variety of applications ranging from social media to education, it is vital that harmful biases are detected. However, due to the unbounded nature of the possible inputs and outputs, traditional bias metrics such as accuracy cannot detect the range and complexity of biases present in the model. In this paper, we present an audit of CLIP using an established technique from natural language processing called Word Embeddings Association Test (WEAT) to detect and quantify gender bias in CLIP and demonstrate that it can provide a quantifiable measure of such stereotypical associations. We detected, measured, and visualised various types of stereotypical gender associations with respect to character descriptions and occupations and found that CLIP shows evidence of stereotypical gender bias.
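As a hedged illustration of applying WEAT to CLIP, the sketch below embeds toy target and attribute word sets with the Hugging Face CLIP text encoder and computes the WEAT effect size; the checkpoint name and word sets are assumptions for illustration and need not match the paper's test sets.

```python
# Illustrative WEAT on CLIP text embeddings; word sets are toy examples.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

def embed(words):
    inputs = tokenizer(words, padding=True, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1)

def s(w, A, B):                                    # association of one word with A vs. B
    return (w @ A.T).mean() - (w @ B.T).mean()

X, Y = embed(["engineer", "scientist"]), embed(["nurse", "teacher"])   # target concepts
A, B = embed(["man", "male"]), embed(["woman", "female"])              # attribute sets

assoc = torch.stack([s(w, A, B) for w in torch.cat([X, Y])])
effect_size = (assoc[: len(X)].mean() - assoc[len(X):].mean()) / assoc.std()
print(f"WEAT effect size: {effect_size.item():.3f}")   # larger magnitude = stronger association
```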
{"title":"Multimodal Bias: Assessing Gender Bias in Computer Vision Models with NLP Techniques","authors":"Abhishek Mandal, Suzanne Little, Susan Leavy","doi":"10.1145/3577190.3614156","DOIUrl":"https://doi.org/10.1145/3577190.3614156","url":null,"abstract":"Large multimodal deep learning models such as Contrastive Language Image Pretraining (CLIP) have become increasingly powerful with applications across several domains in recent years. CLIP works on visual and language modalities and forms a part of several popular models, such as DALL-E and Stable Diffusion. It is trained on a large dataset of millions of image-text pairs crawled from the internet. Such large datasets are often used for training purposes without filtering, leading to models inheriting social biases from internet data. Given that models such as CLIP are being applied in such a wide variety of applications ranging from social media to education, it is vital that harmful biases are detected. However, due to the unbounded nature of the possible inputs and outputs, traditional bias metrics such as accuracy cannot detect the range and complexity of biases present in the model. In this paper, we present an audit of CLIP using an established technique from natural language processing called Word Embeddings Association Test (WEAT) to detect and quantify gender bias in CLIP and demonstrate that it can provide a quantifiable measure of such stereotypical associations. We detected, measured, and visualised various types of stereotypical gender associations with respect to character descriptions and occupations and found that CLIP shows evidence of stereotypical gender bias.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Augmented Immersive Viewing and Listening Experience Based on Arbitrarily Angled Interactive Audiovisual Representation
Toshiharu Horiuchi, Shota Okubo, Tatsuya Kobayashi
We propose an arbitrarily angled interactive audiovisual representation technique that combines a unique sound field synthesis with visual representation in order to augment the possibilities of interactive immersive viewing experiences on mobile devices. From multi-channel surround sound, the technique synthesizes two-channel stereo with a constant stereo width spanning an arbitrary angle range, from 30 to 360 degrees, centered on an arbitrary direction. The visual representation can be chosen as either equirectangular or stereographic projection. The developed video player app allows users to enjoy arbitrarily angled 360-degree videos by manipulating the touchscreen, with the stereo sound and the visual representation remaining spatially synchronized with the chosen view. The app was released as a demonstration, and its acceptability and worth were investigated through interviews and subjective assessment tests. The app has been well received, and to date, more than 30 pieces of content have been produced across multiple genres, with a total of more than 200,000 views.
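A very rough sketch, under stated assumptions, of one way an angle-windowed stereo downmix could work (this is not the authors' sound field synthesis): keep only the surround channels whose azimuths fall within the chosen angular window and constant-power-pan them into two channels; the channel layout, azimuths, and panning law are all invented for illustration.

```python
# Simplified stand-in for the arbitrary-angle stereo synthesis described above.
import numpy as np

def downmix(channels: np.ndarray, azimuths_deg: np.ndarray,
            center_deg: float, width_deg: float) -> np.ndarray:
    """channels: (n_channels, n_samples); returns a (2, n_samples) stereo mix."""
    offsets = (azimuths_deg - center_deg + 180) % 360 - 180   # signed offset from view direction
    stereo = np.zeros((2, channels.shape[1]))
    for ch, offset in zip(channels, offsets):
        if abs(offset) > width_deg / 2:
            continue                                          # channel lies outside the window
        pan = (offset / (width_deg / 2) + 1) / 2              # 0 = full left, 1 = full right
        stereo[0] += np.cos(pan * np.pi / 2) * ch             # constant-power panning
        stereo[1] += np.sin(pan * np.pi / 2) * ch
    return stereo

surround = np.random.randn(5, 48000)                          # e.g. 5.0 surround, 1 s at 48 kHz
azimuths = np.array([-30.0, 30.0, 0.0, -110.0, 110.0])        # assumed loudspeaker angles
print(downmix(surround, azimuths, center_deg=0.0, width_deg=90.0).shape)
```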
{"title":"Augmented Immersive Viewing and Listening Experience Based on Arbitrarily Angled Interactive Audiovisual Representation","authors":"Toshiharu Horiuchi, Shota Okubo, Tatsuya Kobayashi","doi":"10.1145/3577190.3614138","DOIUrl":"https://doi.org/10.1145/3577190.3614138","url":null,"abstract":"We propose an arbitrarily angled interactive audiovisual representation technique that combines a unique sound field synthesis with visual representation in order to augment the possibility of interactive immersive viewing experiences on mobile devices. This technique can synthesize two-channel stereo sound with constant stereo width having an arbitrary angle range from minimum 30 to maximum 360 degrees centering on an arbitrary direction from multi-channel surround sound. The visual representation can be chosen either equirectangular projection or stereographic projection. The developed video player app allows users to enjoy arbitrarily angled 360-degree videos by manipulating the touchscreen, and the stereo sound and the visual representation changes in terms of its spatial synchronization depending on the view. The app was released as a demonstration, and its acceptability and worth were investigated through interviews and subjective assessment tests. The app has been well received, and to date, more than 30 pieces of content have been produced in multiple genres, with a total of more than 200,000 views.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis
Hendric Voß, Stefan Kopp
The generation of realistic and contextually relevant co-speech gestures is a challenging yet increasingly important task in the creation of multimodal artificial agents. Prior methods focused on learning a direct correspondence between co-speech gesture representations and produced motions, which created seemingly natural but often unconvincing gestures during human assessment. We present an approach to pre-train partial gesture sequences using a generative adversarial network with a quantization pipeline. The resulting codebook vectors serve as both input and output in our framework, forming the basis for the generation and reconstruction of gestures. By learning the mapping of a latent space representation as opposed to directly mapping it to a vector representation, this framework facilitates the generation of highly realistic and expressive gestures that closely replicate human movement and behavior, while simultaneously avoiding artifacts in the generation process. We evaluate our approach by comparing it with established methods for generating co-speech gestures as well as with existing datasets of human behavior. We also perform an ablation study to assess our findings. The results show that our approach outperforms the current state of the art by a clear margin and is partially indistinguishable from human gesturing. We make our data pipeline and the generation framework publicly available.
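To isolate the quantization component of the pipeline, the sketch below implements a generic vector-quantization codebook lookup with a commitment loss and a straight-through gradient; the sizes are arbitrary, and the adversarial pre-training and GRU-transformer generator of the paper are not shown.

```python
# Generic VQ layer sketch; not the paper's full AQ-GT pipeline.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, n_codes: int = 512, dim: int = 64):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, dim)

    def forward(self, z: torch.Tensor):
        # z: (batch, dim) latent gesture features
        dists = torch.cdist(z, self.codebook.weight)        # distance to every codebook vector
        idx = dists.argmin(dim=-1)
        z_q = self.codebook(idx)
        commit_loss = ((z_q.detach() - z) ** 2).mean() + ((z_q - z.detach()) ** 2).mean()
        z_q = z + (z_q - z).detach()                        # straight-through gradient
        return z_q, idx, commit_loss

vq = VectorQuantizer()
z = torch.randn(8, 64)                                      # placeholder partial-gesture latents
z_q, codes, loss = vq(z)
print(z_q.shape, codes.shape, float(loss))
```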
{"title":"AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis","authors":"Hendric Voß, Stefan Kopp","doi":"10.1145/3577190.3614135","DOIUrl":"https://doi.org/10.1145/3577190.3614135","url":null,"abstract":"The generation of realistic and contextually relevant co-speech gestures is a challenging yet increasingly important task in the creation of multimodal artificial agents. Prior methods focused on learning a direct correspondence between co-speech gesture representations and produced motions, which created seemingly natural but often unconvincing gestures during human assessment. We present an approach to pre-train partial gesture sequences using a generative adversarial network with a quantization pipeline. The resulting codebook vectors serve as both input and output in our framework, forming the basis for the generation and reconstruction of gestures. By learning the mapping of a latent space representation as opposed to directly mapping it to a vector representation, this framework facilitates the generation of highly realistic and expressive gestures that closely replicate human movement and behavior, while simultaneously avoiding artifacts in the generation process. We evaluate our approach by comparing it with established methods for generating co-speech gestures as well as with existing datasets of human behavior. We also perform an ablation study to assess our findings. The results show that our approach outperforms the current state of the art by a clear margin and is partially indistinguishable from human gesturing. We make our data pipeline and the generation framework publicly available.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1