
Proceedings of the 2020 International Conference on Multimodal Interaction: Latest Publications

Extract the Gaze Multi-dimensional Information Analysis Driver Behavior
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3417972
Kui Lyu, Minghao Wang, Liyu Meng
Recent studies have shown that most traffic accidents are related to the driver's engagement in the driving process. Driver gaze is considered an important cue for monitoring driver distraction. While driver gaze region estimation systems have improved markedly, many challenges remain, such as cross-subject testing, perspectives, and sensor configuration. In this paper, we propose a Convolutional Neural Network (CNN) based multi-model fusion gaze zone estimation system. Our method consists of two main blocks, which implement the extraction of gaze features from RGB images and the estimation of gaze from head pose features. Starting from the original input image, a general face processing model is first used to detect the face and localize 3D landmarks, from which the most relevant facial information is then extracted. We implement three face alignment methods to normalize the face information. For these image-based features, a multi-input CNN classifier achieves reliable classification accuracy. In addition, we design a 2D CNN based PointNet to predict the head pose representation from the 3D landmarks. Finally, we evaluate our best-performing model on the Eighth EmotiW Driver Gaze Prediction sub-challenge test dataset. Our model achieves a competitive overall accuracy of 81.5144% for gaze zone estimation on the cross-subject test dataset.
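A minimal sketch of the multi-input idea described in this abstract is given below: one branch processes the aligned RGB face crop with a small CNN, another maps the 3D landmarks to a head-pose feature, and a classifier fuses both. Input sizes, layer widths, and the number of gaze zones are assumptions for illustration; this is not the authors' implementation.

```python
import torch
import torch.nn as nn

class GazeZoneFusionNet(nn.Module):
    """Minimal two-branch sketch: RGB face crop + 3D landmarks -> gaze zone."""
    def __init__(self, num_zones: int = 9):  # number of zones is an assumption
        super().__init__()
        # Image branch: a small CNN over the aligned face crop (3 x 224 x 224 assumed).
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),          # -> (B, 32)
        )
        # Head-pose branch: an MLP over 68 flattened 3D landmarks (68 * 3 = 204 assumed).
        self.pose_branch = nn.Sequential(
            nn.Linear(68 * 3, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        # Fusion classifier over the concatenated features.
        self.classifier = nn.Linear(32 + 32, num_zones)

    def forward(self, face: torch.Tensor, landmarks: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.image_branch(face),
                           self.pose_branch(landmarks.flatten(1))], dim=1)
        return self.classifier(fused)

# Example forward pass with random tensors standing in for real data.
model = GazeZoneFusionNet()
logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 68, 3))
print(logits.shape)  # torch.Size([4, 9])
```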
Citations: 7
You Have a Point There: Object Selection Inside an Automobile Using Gaze, Head Pose and Finger Pointing
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418836
Abdul Rafey Aftab, M. V. D. Beeck, M. Feld
Sophisticated user interaction in the automotive industry is a fast-emerging topic. Mid-air gestures and speech already have numerous applications for driver-car interaction. Additionally, multimodal approaches are being developed to leverage multiple sensors for added advantages. In this paper, we propose a fast and practical multimodal fusion method based on machine learning for the selection of various control modules in an automotive vehicle. The modalities taken into account are gaze, head pose and finger pointing gesture. Speech is used only as a trigger for fusion. A single modality has previously been used numerous times for recognizing the user's pointing direction. We, however, demonstrate how multiple inputs can be fused together to enhance recognition performance. Furthermore, we compare different deep neural network architectures against conventional machine learning methods, namely Support Vector Regression and Random Forests, and show the improvement in pointing-direction accuracy achieved with deep learning. The results suggest great potential for multimodal inputs, which can be applied to more use cases in the vehicle.
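To illustrate the kind of baseline comparison the abstract mentions, the sketch below fits Support Vector Regression and a Random Forest regressor on concatenated gaze, head-pose, and finger-pointing features to predict a pointing target. The feature layout, target encoding, and synthetic data are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in data: per sample, concatenated gaze (3), head-pose (3) and
# fingertip-direction (3) features; target is a 2D pointing location on a panel.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 9))
y = X[:, :2] + 0.1 * rng.normal(size=(500, 2))  # toy mapping, for illustration only

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "SVR": MultiOutputRegressor(SVR(kernel="rbf", C=1.0)),
    "RandomForest": RandomForestRegressor(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    err = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: MAE = {err:.3f}")
```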
Citations: 15
Towards Multimodal Human-Like Characteristics and Expressive Visual Prosody in Virtual Agents
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3421155
Mireille Fares
One of the key challenges in designing Embodied Conversational Agents (ECAs) is to produce human-like gestural and visual prosody expressivity. Another major challenge is to maintain the interlocutor's attention by adapting the agent's behavior to the interlocutor's multimodal behavior. This paper outlines my PhD research plan, which aims to develop convincing expressive and natural behavior in ECAs and to explore and model the mechanisms that govern human-agent multimodal interaction. Additionally, I describe in this paper my first PhD milestone, which focuses on developing an end-to-end LSTM Neural Network model for upper-face gesture generation. The main task consists of building a model that can produce expressive and coherent upper-face gestures while considering multiple modalities: speech audio, text, and action units.
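As a rough illustration of the planned milestone, the sketch below shows an LSTM that maps per-frame fused speech/text features to upper-face action-unit intensities. The feature dimension, the choice of action units, and the overall shape of the model are assumptions; the author's end-to-end model may differ substantially.

```python
import torch
import torch.nn as nn

class UpperFaceGestureLSTM(nn.Module):
    """Minimal sketch: per-frame speech/text features -> upper-face action units."""
    def __init__(self, in_dim: int = 128, hidden: int = 256, num_aus: int = 5):
        super().__init__()
        # in_dim and num_aus (e.g. AU1, AU2, AU4, AU5, AU7) are assumptions.
        self.encoder = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, num_aus)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, in_dim) -> (batch, time, num_aus)
        hidden_states, _ = self.encoder(frames)
        return self.head(hidden_states)

model = UpperFaceGestureLSTM()
aus = model(torch.randn(2, 100, 128))   # 100 frames of fused speech + text features
print(aus.shape)                        # torch.Size([2, 100, 5])
```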
Citations: 12
Automating Facilitation and Documentation of Collaborative Ideation Processes
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3421158
Matthias Merk
My research is in the field of computer-supported and computer-enabled innovation processes, in particular focusing on the first phases of ideation in a co-located environment. I'm developing a concept for documenting, tracking and enhancing creative ideation processes. The basis of this concept is a set of key figures derived from various systems within the ideation sessions. The system designed in my doctoral thesis enables interdisciplinary teams to kick-start creativity by automating facilitation, moderation, creativity support and documentation of the process. Using the example of brainstorming, a standing table is equipped with camera- and microphone-based sensing as well as multiple ways of interaction and visualization through projection and LED lights. The user interaction with the table is implicit and based on real-time metadata generated by the users of the system. System actions are calculated based on what is happening on the table, using object recognition. Everything on the table influences the system, making it a multimodal input and output device with implicit interaction. While the technical aspects of my research are close to being done, the more problematic part, evaluation, will benefit from feedback from the specialists for multimodal interaction at ICMI20.
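For intuition about how implicit interaction could be driven by object recognition, here is a toy policy that maps detections on the table to facilitation actions. The labels, thresholds, and actions are purely illustrative assumptions and do not describe the thesis system.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    label: str   # e.g. "sticky_note", "pen", "hand" -- hypothetical labels
    count: int

def choose_action(detections: List[Detection], idle_seconds: float) -> str:
    """Toy policy mapping table-top object recognition results to facilitation actions.
    Labels, thresholds, and actions are illustrative assumptions only."""
    notes = next((d.count for d in detections if d.label == "sticky_note"), 0)
    if idle_seconds > 60:
        return "project_prompt"      # nudge the idle team with a new stimulus
    if notes > 20:
        return "suggest_clustering"  # many ideas on the table: move to grouping
    return "log_only"                # keep documenting, no intervention needed

print(choose_action([Detection("sticky_note", 25)], idle_seconds=10.0))
```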
Citations: 0
Bridging Social Sciences and AI for Understanding Child Behaviour
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3419745
Heysem Kaya, R. Hessels, M. Najafian, S. Hanekamp, Saeid Safavi
Child behaviour is a topic of wide scientific interest among many different disciplines, including social and behavioural sciences and artificial intelligence (AI). In this workshop, we aimed to connect researchers from these fields to address topics such as the use of AI to better understand and model child behavioural and developmental processes, the challenges and opportunities for AI in large-scale child behaviour analysis, and the implementation of explainable ML/AI on sensitive child data. The workshop served as a successful first step towards this goal and attracted contributions from different research disciplines on the analysis of child behaviour. This paper provides a summary of the activities of the workshop and the accepted papers and abstracts.
Citations: 2
Multi-rate Attention Based GRU Model for Engagement Prediction
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3417965
Bin Zhu, Xinjie Lan, Xin Guo, K. Barner, C. Boncelet
Engagement detection is essential in many areas such as driver attention tracking, employee engagement monitoring, and student engagement evaluation. In this paper, we propose a novel approach using attention-based hybrid deep models for the engagement prediction in the wild category of the 8th Emotion Recognition in the Wild (EmotiW 2020) Grand Challenge. The task aims to predict the engagement intensity of subjects in videos; the subjects are students watching educational videos from Massive Open Online Courses (MOOCs). To complete the task, we propose a hybrid deep model based on multi-rate and multi-instance attention. The novelty of the proposed model can be summarized in three aspects: (a) an attention-based Gated Recurrent Unit (GRU) deep network, (b) heuristic multi-rate processing of video-based data, and (c) a rigorous and accurate ensemble model. Experimental results on the validation and test sets show that our method makes promising improvements, achieving a competitively low MSE of 0.0541 on the test set and improving on the baseline results by 64%. The proposed model won first place in the engagement prediction in the wild challenge.
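A minimal sketch of the attention-based GRU component is shown below: a GRU encodes per-segment video features, an attention layer pools the hidden states, and a regression head outputs an engagement intensity in [0, 1]. Feature dimensions and the single-rate input are simplifying assumptions; the paper additionally uses multi-rate processing and an ensemble.

```python
import torch
import torch.nn as nn

class AttentionGRURegressor(nn.Module):
    """Minimal sketch: per-segment video features -> engagement intensity in [0, 1]."""
    def __init__(self, in_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)          # scores one weight per time step
        self.head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, segments: torch.Tensor) -> torch.Tensor:
        # segments: (batch, time, in_dim), e.g. gaze/head-pose features per clip segment
        states, _ = self.gru(segments)
        weights = torch.softmax(self.attn(states), dim=1)    # (batch, time, 1)
        pooled = (weights * states).sum(dim=1)               # attention-weighted summary
        return self.head(pooled).squeeze(-1)

model = AttentionGRURegressor()
pred = model(torch.randn(8, 30, 64))    # 8 clips, 30 segments each
loss = nn.MSELoss()(pred, torch.rand(8))
print(pred.shape, loss.item())
```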
Citations: 20
ROSMI: A Multimodal Corpus for Map-based Instruction-Giving
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418861
Miltiadis Marios Katsakioris, Ioannis Konstas, P. Mignotte, Helen F. Hastie
We present the publicly available Robot Open Street Map Instructions (ROSMI) corpus: a rich multimodal dataset of map and natural language instruction pairs that was collected via crowdsourcing. The goal of this corpus is to aid in the advancement of state-of-the-art visual-dialogue tasks, including reference resolution and robot-instruction understanding. The domain described here concerns robots and autonomous systems being used for inspection and emergency response. The ROSMI corpus is unique in that it captures interaction grounded in map-based visual stimuli that is both human-readable and contains the rich metadata needed to plan and deploy robots and autonomous systems, thus facilitating human-robot teaming.
Citations: 1
MORSE: MultimOdal sentiment analysis for Real-life SEttings
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418821
Yiqun Yao, Verónica Pérez-Rosas, M. Abouelenien, Mihai Burzo
Multimodal sentiment analysis aims to detect and classify sentiment expressed in multimodal data. Research to date has focused on datasets with a large number of training samples, manual transcriptions, and nearly-balanced sentiment labels. However, data collection in real settings often leads to small datasets with noisy transcriptions and imbalanced label distributions, which are therefore significantly more challenging than in controlled settings. In this work, we introduce MORSE, a domain-specific dataset for MultimOdal sentiment analysis in Real-life SEttings. The dataset consists of 2,787 video clips extracted from 49 interviews with panelists in a product usage study, with each clip annotated for positive, negative, or neutral sentiment. The characteristics of MORSE include noisy transcriptions from raw videos, naturally imbalanced label distribution, and scarcity of minority labels. To address the challenging real-life settings in MORSE, we propose a novel two-step fine-tuning method for multimodal sentiment classification using transfer learning and the Transformer model architecture; our method starts with a pre-trained language model and one step of fine-tuning on the language modality, followed by a second step of joint fine-tuning that incorporates the visual and audio modalities. Experimental results show that while MORSE is challenging for various baseline models such as SVM and Transformer, our two-step fine-tuning method is able to capture the dataset characteristics and effectively address the challenges. Our method outperforms related work that uses both single and multiple modalities in the same transfer learning settings.
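The sketch below illustrates the two-step fine-tuning idea in plain PyTorch: step one trains a text encoder with a text-only head, and step two attaches audio and visual projections and jointly fine-tunes the whole model. The encoder here is a stand-in for a pre-trained language model, and all dimensions and optimizer settings are assumptions rather than the authors' configuration.

```python
import torch
import torch.nn as nn

class MultimodalSentimentModel(nn.Module):
    """Sketch of a fusion classifier: text encoder + audio/visual features -> 3 classes."""
    def __init__(self, text_dim=128, audio_dim=40, visual_dim=35, hidden=64):
        super().__init__()
        # Stand-in for a pre-trained language model (the paper uses a Transformer LM).
        layer = nn.TransformerEncoderLayer(d_model=text_dim, nhead=4, batch_first=True)
        self.text_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.audio_proj = nn.Linear(audio_dim, hidden)
        self.visual_proj = nn.Linear(visual_dim, hidden)
        self.text_only_head = nn.Linear(text_dim, 3)               # used in step 1
        self.fusion_head = nn.Linear(text_dim + 2 * hidden, 3)     # used in step 2

    def forward(self, text, audio=None, visual=None):
        t = self.text_encoder(text).mean(dim=1)                    # pooled text features
        if audio is None or visual is None:
            return self.text_only_head(t)                          # step 1: language only
        fused = torch.cat([t, self.audio_proj(audio), self.visual_proj(visual)], dim=1)
        return self.fusion_head(fused)                             # step 2: joint fine-tuning

model = MultimodalSentimentModel()
# Step 1: fine-tune on the language modality alone.
opt1 = torch.optim.AdamW(list(model.text_encoder.parameters())
                         + list(model.text_only_head.parameters()), lr=2e-5)
# Step 2: joint fine-tuning with all modalities (typically at a smaller learning rate).
opt2 = torch.optim.AdamW(model.parameters(), lr=1e-5)
logits = model(torch.randn(4, 20, 128), torch.randn(4, 40), torch.randn(4, 35))
print(logits.shape)  # torch.Size([4, 3])
```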
Citations: 6
Speech, Voice, Text, and Meaning: A Multidisciplinary Approach to Interview Data through the use of digital tools
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3420054
A. V. Hessen, S. Calamai, H. V. D. Heuvel, S. Scagliola, N. Karrouche, J. Beeken, Louise Corti, C. Draxler
Interview data is multimodal data: it consists of speech sound, facial expressions and gestures, captured in a particular situation, and contains textual information and emotion. This workshop shows how a multidisciplinary approach may exploit the full potential of interview data. The workshop first gives a systematic overview of the research fields working with interview data. It then presents the speech technology currently available to support transcribing and annotating interview data, such as automatic speech recognition, speaker diarization, and emotion detection. Finally, scholars who work with interview data and tools may present their work and discover how to make use of existing technology.
Citations: 2
Robot Assisted Diagnosis of Autism in Children
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3421162
B. Ashwini
The diagnosis of autism spectrum disorder is cumbersome even for expert clinicians, owing to the diversity of symptoms exhibited by children, which depends on the severity of the disorder. Furthermore, the diagnosis is based on behavioural observations and the developmental history of the child, which depend substantially on the perspectives and interpretations of the specialists. In this paper, we present a robot-assisted diagnostic system for the assessment of behavioural symptoms in children, aimed at providing a reliable diagnosis. The robotic assistant is intended to support the specialist in administering the diagnostic task, perceiving and evaluating the task outcomes as well as the behavioural cues for the assessment of symptoms, and diagnosing the state of the child. Although robots are widely used in education and intervention for children with autism (CWA), the application of robot assistance in diagnosis is less explored. Further, there have been limited studies addressing the acceptance and effectiveness of robot-assisted interventions for CWA in the Global South. We aim to develop a robot-assisted diagnostic framework for CWA to support the experts and to study the viability of such a system in the Indian context.
Citations: 2