
Proceedings of the 2020 International Conference on Multimodal Interaction: Latest Publications

Finally on Par?! Multimodal and Unimodal Interaction for Open Creative Design Tasks in Virtual Reality
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418850
C. Zimmerer, Erik Wolf, Sara Wolf, Martin Fischbach, Jean-Luc Lugrin, Marc Erich Latoschik
Multimodal Interfaces (MMIs) have been considered to provide promising interaction paradigms for Virtual Reality (VR) for some time. However, they are still far less common than unimodal interfaces (UMIs). This paper presents a summative user study comparing an MMI to a typical UMI for a design task in VR. We developed an application targeting creative 3D object manipulations, i.e., creating 3D objects and modifying typical object properties such as color or size. The associated open user task is based on the Torrance Tests of Creative Thinking. We compared a synergistic multimodal interface using speech-accompanied pointing/grabbing gestures with a more typical unimodal interface using a hierarchical radial menu to trigger actions on selected objects. Independent judges rated the creativity of the resulting products using the Consensual Assessment Technique. Additionally, we measured the creativity-promoting factors flow, usability, and presence. Our results show that the MMI performs on par with the UMI in all measurements despite its limited flexibility and reliability. These promising results demonstrate the technological maturity of MMIs and their potential to extend traditional interaction techniques in VR efficiently.
Citations: 7
Influence of Electric Taste, Smell, Color, and Thermal Sensory Modalities on the Liking and Mediated Emotions of Virtual Flavor Perception
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418862
Nimesha Ranasinghe, Meetha Nesam James, Michael Gecawicz, Jonathan Bland, David Smith
Little is known about the influence of sensory modalities such as taste, smell, color, and thermal sensation on the perception of simulated flavors, let alone their influence on people's emotions and liking. Although flavor sensations are essential in our daily experiences and closely associated with our memories and emotions, the concept of flavor and the emotions elicited by different sensory modalities are not thoroughly integrated into Virtual and Augmented Reality technologies. Hence, this paper presents 1) an interactive technology to simulate different flavor sensations by overlaying taste (via electrical stimulation of the tongue), smell (via micro air pumps), color (via RGB lights), and thermal (via Peltier elements) sensations on plain water, and 2) a set of experiments investigating a) the influence of different sensory modalities on the perception and liking of virtual flavors and b) the varying emotions mediated through virtual flavor sensations. Our findings reveal that participants perceived and liked various stimuli configurations and mostly associated them with positive emotions, while highlighting important avenues for future research.
Citations: 9
Fusical: Multimodal Fusion for Video Sentiment
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3417966
Bo Jin, L. Abdelrahman, C. Chen, Amil Khanzada
Determining the emotional sentiment of a video remains a challenging task that requires multimodal, contextual understanding of a situation. In this paper, we describe our entry into the EmotiW 2020 Audio-Video Group Emotion Recognition Challenge to classify group videos containing large variations in language, people, and environment into one of three sentiment classes. Our end-to-end approach consists of independently training models for different modalities, including full-frame video scenes, human body keypoints, embeddings extracted from audio clips, and image-caption word embeddings. We further develop novel combinations of modalities, such as laughter and image captioning, together with transfer learning. We use fully-connected (FC) fusion ensembling to aggregate the modalities, achieving a best test accuracy of 63.9%, which is 16 percentage points higher than that of the baseline ensemble.
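The fully-connected (FC) fusion ensembling described above can be pictured as concatenating per-modality embeddings and passing them through a small classification head. The sketch below is only a minimal illustration of that idea, not the authors' implementation; the embedding dimensions, hidden size, and dropout are assumptions.

```python
# Minimal sketch of fully-connected (FC) late fusion over per-modality embeddings.
# Dimensions and layer sizes are illustrative assumptions, not the authors' configuration.
import torch
import torch.nn as nn

class FCFusion(nn.Module):
    def __init__(self, modality_dims, num_classes=3, hidden=256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(sum(modality_dims), hidden),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(hidden, num_classes),  # positive / neutral / negative
        )

    def forward(self, embeddings):
        # embeddings: list of tensors, one per modality, each of shape (batch, dim_i)
        fused = torch.cat(embeddings, dim=-1)
        return self.head(fused)

# Example: scene, pose, audio, and caption embeddings for a batch of 4 videos.
fusion = FCFusion(modality_dims=[512, 128, 256, 300])
logits = fusion([torch.randn(4, d) for d in (512, 128, 256, 300)])
print(logits.shape)  # torch.Size([4, 3])
```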
Citations: 3
Supporting Instructors to Provide Emotional and Instructional Scaffolding for English Language Learners through Biosensor-based Feedback
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3421159
Heera Lee
Delivering a presentation has been reported as one of the most anxiety-provoking tasks faced by English Language Learners. Researchers suggest that instructors should be more aware of the learners' emotional states to provide appropriate emotional and instructional scaffolding to improve their performance when presenting. Despite the critical role of instructors in perceiving the emotional states of English language learners, it can be challenging to do this solely by observing the learners' facial expressions, behaviors, and their limited verbal expressions, due to language and cultural barriers. To address the ambiguity and inconsistency in interpreting students' emotional states, this research focuses on identifying the potential of biosensor-based feedback from learners to support instructors. A novel approach has been adopted to classify the intensity and characteristics of public speaking anxiety and foreign language anxiety among English language learners and to provide tailored feedback to instructors while supporting teaching and learning. As part of this work, two further studies were proposed. The first study was designed to identify educators' needs for solutions providing emotional and instructional support. The second study aims to evaluate the resulting prototype from the instructors' perspective as a means to offer tailored emotional and instructional scaffolding to students. The contribution of these studies includes the development of guidance on using biosensor-based feedback that will assist English language instructors in teaching and in identifying students' anxiety levels and types while they deliver a presentation.
Citations: 1
Touch Recognition with Attentive End-to-End Model
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418834
Wail El Bani, M. Chetouani
Touch is the earliest sense to develop and the first means of contact with the external world. Touch also plays a key role in our socio-emotional communication: we use it to communicate our feelings, elicit strong emotions in others, and modulate behavior (e.g., compliance). Despite its relevance, touch is an understudied modality in Human-Machine Interaction compared to audition and vision. Most social touch recognition systems require a feature engineering step, making them difficult to compare and to generalize to other databases. In this paper, we propose an end-to-end approach. We present an attention-based end-to-end model for touch gesture recognition, evaluated on two public datasets (CoST and HAART) in the context of the ICMI 15 Social Touch Challenge. Our model achieves a comparable level of accuracy (61% on CoST and 68% on HAART) and uses self-attention as an alternative to feature engineering and Recurrent Neural Networks.
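As a rough sketch of how self-attention can replace hand-crafted features and recurrent networks for touch gesture recognition, the model below embeds each pressure frame, runs a small Transformer encoder over the sequence, and mean-pools the result for classification. The 8x8 sensor grid, sequence length, class count, and model width are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch: self-attention over a sequence of flattened pressure frames, then a linear classifier.
# Grid size (8x8), sequence length, and model width are illustrative assumptions.
import torch
import torch.nn as nn

class TouchGestureClassifier(nn.Module):
    def __init__(self, grid_cells=64, d_model=128, num_classes=14):
        super().__init__()
        self.embed = nn.Linear(grid_cells, d_model)          # per-frame embedding
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classify = nn.Linear(d_model, num_classes)

    def forward(self, frames):
        # frames: (batch, time, grid_cells) raw pressure values
        h = self.encoder(self.embed(frames))
        return self.classify(h.mean(dim=1))                  # pool over time

model = TouchGestureClassifier()
logits = model(torch.rand(2, 50, 64))  # 2 sequences of 50 frames from an 8x8 sensor grid
print(logits.shape)  # torch.Size([2, 14])
```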
Citations: 0
LDNN: Linguistic Knowledge Injectable Deep Neural Network for Group Cohesiveness Understanding
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418830
Yanan Wang, Jianming Wu, Jinfa Huang, Gen Hattori, Y. Takishima, Shinya Wada, Rui Kimura, Jie Chen, Satoshi Kurihara
Group cohesiveness reflects the level of intimacy that people feel with each other, and the development of a dialogue robot that can understand group cohesiveness will lead to the promotion of human communication. However, group cohesiveness is a complex concept that is difficult to predict based only on image pixels. Inspired by the fact that humans intuitively associate linguistic knowledge accumulated in the brain with the visual images they see, we propose a linguistic knowledge injectable deep neural network (LDNN) that builds a visual model (visual LDNN) for predicting group cohesiveness that can automatically associate the linguistic knowledge hidden behind images. LDNN consists of a visual encoder and a language encoder, and applies domain adaptation and linguistic knowledge transition mechanisms to transform linguistic knowledge from a language model to the visual LDNN. We train LDNN by adding descriptions to the training and validation sets of the Group AFfect Dataset 3.0 (GAF 3.0), and test the visual LDNN without any description. Comparing visual LDNN with various fine-tuned DNN models and three state-of-the-art models in the test set, the results demonstrate that the visual LDNN not only improves the performance of the fine-tuned DNN model leading to an MSE very similar to the state-of-the-art model, but is also a practical and efficient method that requires relatively little preprocessing. Furthermore, ablation studies confirm that LDNN is an effective method to inject linguistic knowledge into visual models.
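To make the visual/language pairing concrete, the schematic below shows a visual encoder and a language encoder sharing a regression head for a cohesiveness score, with the language branch used only during training so that inference needs images alone. It is only a rough sketch of that idea: LDNN's domain adaptation and linguistic knowledge transition mechanisms are not reproduced, and all feature dimensions are assumptions.

```python
# Schematic of a visual branch and a language branch feeding a shared regressor for a
# cohesiveness score; at test time only the visual branch is used. Dimensions are assumptions.
import torch
import torch.nn as nn

class DualEncoderCohesion(nn.Module):
    def __init__(self, img_dim=2048, text_dim=768, hidden=256):
        super().__init__()
        self.visual_enc = nn.Sequential(nn.Linear(img_dim, hidden), nn.ReLU())
        self.text_enc = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.regressor = nn.Linear(hidden, 1)  # cohesiveness score

    def forward(self, img_feat, text_feat=None):
        v = self.visual_enc(img_feat)
        if text_feat is None:                  # test time: image features only
            return self.regressor(v)
        t = self.text_enc(text_feat)
        # Training could additionally align v with t (e.g., an MSE term) to transfer
        # linguistic knowledge into the visual branch (omitted in this sketch).
        return self.regressor(v), self.regressor(t)

model = DualEncoderCohesion()
score = model(torch.randn(8, 2048))            # image-only inference
print(score.shape)                             # torch.Size([8, 1])
```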
Citations: 1
Multi-modal Fusion Using Spatio-temporal and Static Features for Group Emotion Recognition
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3417971
Mo Sun, Jian Li, Hui Feng, Wei Gou, Haifeng Shen, Jian-Bo Tang, Yi Yang, Jieping Ye
This paper presents our approach to the Audio-video Group Emotion Recognition sub-challenge of EmotiW 2020. The task is to classify a video into one of three group emotions: positive, neutral, or negative. Our approach exploits two feature levels for this task: a spatio-temporal level and a static level. At the spatio-temporal level, we feed multiple input modalities (RGB, RGB difference, optical flow, warped optical flow) into multiple video classification networks to train the spatio-temporal model. At the static level, we crop all faces and bodies in an image with a state-of-the-art human pose estimation method and train several kinds of CNNs with the image-level group emotion labels. Finally, we fuse the results of all 14 models and achieve third place in this sub-challenge, with classification accuracies of 71.93% and 70.77% on the validation set and test set, respectively.
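The final fusion step can be thought of as a weighted average of the per-class probabilities produced by the individual models. The snippet below is a minimal sketch under the assumption of uniform weights and softmax outputs; it is not the authors' actual fusion scheme.

```python
# Minimal late-fusion sketch: average per-class probabilities from several models.
# Uniform weights are an assumption; learned or validated weights could be substituted.
import numpy as np

def fuse_predictions(prob_list, weights=None):
    """prob_list: list of arrays, each (num_samples, num_classes) of softmax outputs."""
    probs = np.stack(prob_list)                      # (num_models, num_samples, num_classes)
    if weights is None:
        weights = np.ones(len(prob_list)) / len(prob_list)
    fused = np.tensordot(weights, probs, axes=1)     # weighted average over models
    return fused.argmax(axis=-1)                     # predicted class per sample

# Example with 14 dummy models, 5 samples, 3 classes (positive / neutral / negative).
rng = np.random.default_rng(0)
dummy = [rng.dirichlet(np.ones(3), size=5) for _ in range(14)]
print(fuse_predictions(dummy))
```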
Citations: 9
The AI-Medic: A Multimodal Artificial Intelligent Mentor for Trauma Surgery
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3421167
Edgar Rojas-Muñoz, K. Couperus, J. Wachs
Telementoring generalist surgeons as they treat patients can be essential when in situ expertise is not readily available. However, adverse cyber-attacks, unreliable network conditions, and remote mentors' predisposition can significantly jeopardize the remote intervention. To provide medical practitioners with guidance when mentors are unavailable, we present the AI-Medic, the initial steps towards the development of a multimodal intelligent artificial system for autonomous medical mentoring. The system uses a tablet device to acquire the view of an operating field. This imagery is provided to an encoder-decoder neural network trained to predict medical instructions from the current view of a surgery. The network was trained using DAISI, a dataset including images and instructions providing step-by-step demonstrations of surgical procedures. The predicted medical instructions are conveyed to the user via visual and auditory modalities.
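At a high level, predicting an instruction from the current view follows the familiar image-captioning pattern: an image encoder conditions a sequence decoder that emits instruction tokens. The sketch below illustrates only that pattern; the actual AI-Medic architecture, its vocabulary, and the DAISI training setup are not reproduced, and every dimension here is an assumption.

```python
# Image-captioning-style sketch: CNN encoder -> GRU decoder emitting instruction tokens.
# Vocabulary size, image size, and hidden sizes are illustrative assumptions.
import torch
import torch.nn as nn

class InstructionGenerator(nn.Module):
    def __init__(self, vocab_size=1000, embed=128, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(                  # tiny CNN stand-in for a real backbone
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, hidden),
        )
        self.embedding = nn.Embedding(vocab_size, embed)
        self.decoder = nn.GRU(embed, hidden, batch_first=True)
        self.vocab_out = nn.Linear(hidden, vocab_size)

    def forward(self, image, tokens):
        h0 = self.encoder(image).unsqueeze(0)          # (1, batch, hidden) initial state
        out, _ = self.decoder(self.embedding(tokens), h0)
        return self.vocab_out(out)                     # (batch, seq_len, vocab_size)

model = InstructionGenerator()
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 1000])
```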
Citations: 0
Hand-eye Coordination for Textual Difficulty Detection in Text Summarization
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418831
Jun Wang, G. Ngai, H. Leong
Summarizing a document is a complex task that requires a person to multitask between reading and writing processes. Since a person's cognitive load during reading or writing is known to depend on the level of comprehension or difficulty of the article, it should be possible to analyze the cognitive process of the user when carrying out the task, as evidenced through their eye gaze and typing features, to obtain insight into the different difficulty levels. In this paper, we categorize the summary writing process into different phases and extract different gaze and typing features from each phase according to the characteristics of eye-gaze behaviors and typing dynamics. Combining these multimodal features, we build a classifier that achieves an accuracy of 91.0% for difficulty level detection, which is around 55% performance improvement over the baseline and at least 15% improvement over models built on a single modality. We also investigate possible reasons for the superior performance of our multimodal features.
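Concretely, the pipeline amounts to computing summary statistics of gaze and keystroke behavior for each writing phase, concatenating them into one feature vector per session, and training an off-the-shelf classifier. A minimal sketch follows, where the particular statistics (fixation duration, inter-key interval) and the random-forest choice are assumptions for illustration, not the authors' exact feature set or model.

```python
# Sketch: concatenate simple gaze and typing statistics per writing phase, then classify
# difficulty level. The chosen statistics and the random-forest model are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def phase_features(fixation_durations, inter_key_intervals):
    """Summary statistics for one phase of the summary-writing session."""
    return [
        np.mean(fixation_durations), np.std(fixation_durations),
        np.mean(inter_key_intervals), np.std(inter_key_intervals),
    ]

def session_features(phases):
    """phases: list of (fixation_durations, inter_key_intervals) tuples, one per phase."""
    return np.concatenate([phase_features(f, k) for f, k in phases])

# Toy data: 20 sessions, 3 phases each, binary difficulty labels.
rng = np.random.default_rng(0)
X = np.array([session_features([(rng.random(30), rng.random(40)) for _ in range(3)])
              for _ in range(20)])
y = rng.integers(0, 2, size=20)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.score(X, y))  # training accuracy on the toy data
```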
Citations: 0
Multimodal Interaction in Psychopathology
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3419751
Itir Onal Ertugrul, J. Cohn, Hamdi Dibeklioğlu
This paper presents an introduction to the Multimodal Interaction in Psychopathology workshop, which is held virtually in conjunction with the 22nd ACM International Conference on Multimodal Interaction on October 25th, 2020. This workshop has attracted submissions in the context of investigating multimodal interaction to reveal mechanisms and assess, monitor, and treat psychopathology. Keynote speakers from diverse disciplines present an overview of the field from different vantages and comment on future directions. Here we summarize the goals and the content of the workshop.
Citations: 0