
Companion Publication of the 2020 International Conference on Multimodal Interaction: Latest Publications

Deciphering Entrepreneurial Pitches: A Multimodal Deep Learning Approach to Predict Probability of Investment
Pepijn Van Aken, Merel M. Jung, Werner Liebregts, Itir Onal Ertugrul
Acquiring early-stage investments for the purpose of developing a business is a fundamental aspect of the entrepreneurial process, which regularly entails pitching the business proposal to potential investors. Previous research suggests that business viability data and the perception of the entrepreneur play an important role in the investment decision-making process. This perception of the entrepreneur is shaped by verbal and non-verbal behavioral cues produced in investor-entrepreneur interactions. This study explores the impact of such cues on decisions that involve investing in a startup on the basis of a pitch. A multimodal approach is developed in which acoustic and linguistic features are extracted from recordings of entrepreneurial pitches to predict the likelihood of investment. The acoustic and linguistic modalities are represented using both hand-crafted and deep features. The capabilities of deep learning models are exploited to capture the temporal dynamics of the inputs. The findings show promising results for the prediction of the likelihood of investment using a multimodal architecture consisting of acoustic and linguistic features. Models based on deep features generally outperform hand-crafted representations. Experiments with an explainable model provide insights about the important features. The most predictive model is found to be a multimodal one that combines deep acoustic and linguistic features using an early fusion strategy and achieves an MAE of 13.91.
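A minimal sketch of the kind of early-fusion architecture described above, assuming deep acoustic and linguistic feature sequences have already been extracted by pretrained encoders; the GRU temporal model, layer sizes, and the 0-100 target scale are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class EarlyFusionRegressor(nn.Module):
    """Fuses per-frame acoustic and linguistic features before a temporal model,
    then regresses a single investment-likelihood score (illustrative sketch)."""
    def __init__(self, acoustic_dim=128, linguistic_dim=768, hidden_dim=64):
        super().__init__()
        # Early fusion: concatenate the modalities at the input level.
        self.temporal = nn.GRU(acoustic_dim + linguistic_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, acoustic, linguistic):
        # acoustic: (batch, time, acoustic_dim), linguistic: (batch, time, linguistic_dim)
        fused = torch.cat([acoustic, linguistic], dim=-1)
        _, h_n = self.temporal(fused)          # h_n: (1, batch, hidden_dim)
        return self.head(h_n.squeeze(0))       # (batch, 1) predicted likelihood

model = EarlyFusionRegressor()
pred = model(torch.randn(4, 200, 128), torch.randn(4, 200, 768))
loss = nn.L1Loss()(pred, torch.rand(4, 1) * 100)   # MAE objective, as reported in the abstract
```

Early fusion here simply concatenates the two modalities per frame before the temporal model, which lets cross-modal interactions be learned jointly rather than merged only at the decision level.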
{"title":"Deciphering Entrepreneurial Pitches: A Multimodal Deep Learning Approach to Predict Probability of Investment","authors":"Pepijn Van Aken, Merel M. Jung, Werner Liebregts, Itir Onal Ertugrul","doi":"10.1145/3577190.3614146","DOIUrl":"https://doi.org/10.1145/3577190.3614146","url":null,"abstract":"Acquiring early-stage investments for the purpose of developing a business is a fundamental aspect of the entrepreneurial process, which regularly entails pitching the business proposal to potential investors. Previous research suggests that business viability data and the perception of the entrepreneur play an important role in the investment decision-making process. This perception of the entrepreneur is shaped by verbal and non-verbal behavioral cues produced in investor-entrepreneur interactions. This study explores the impact of such cues on decisions that involve investing in a startup on the basis of a pitch. A multimodal approach is developed in which acoustic and linguistic features are extracted from recordings of entrepreneurial pitches to predict the likelihood of investment. The acoustic and linguistic modalities are represented using both hand-crafted and deep features. The capabilities of deep learning models are exploited to capture the temporal dynamics of the inputs. The findings show promising results for the prediction of the likelihood of investment using a multimodal architecture consisting of acoustic and linguistic features. Models based on deep features generally outperform hand-crafted representations. Experiments with an explainable model provide insights about the important features. The most predictive model is found to be a multimodal one that combines deep acoustic and linguistic features using an early fusion strategy and achieves an MAE of 13.91.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135043299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
ViFi-Loc: Multi-modal Pedestrian Localization using GAN with Camera-Phone Correspondences
Hansi Liu, Hongsheng Lu, Kristin Dana, Marco Gruteser
In Smart City and Vehicle-to-Everything (V2X) systems, acquiring pedestrians’ accurate locations is crucial to traffic and pedestrian safety. Current systems adopt cameras and wireless sensors to estimate people’s locations via sensor fusion. Standard fusion algorithms, however, become inapplicable when multi-modal data is not associated. For example, pedestrians are out of the camera field of view, or data from the camera modality is missing. To address this challenge and produce more accurate location estimations for pedestrians, we propose a localization solution based on a Generative Adversarial Network (GAN) architecture. During training, it learns the underlying linkage between pedestrians’ camera-phone data correspondences. During inference, it generates refined position estimations based only on pedestrians’ phone data that consists of GPS, IMU, and FTM. Results show that our GAN produces 3D coordinates at 1 to 2 meters localization error across 5 different outdoor scenes. We further show that the proposed model supports self-learning. The generated coordinates can be associated with pedestrians’ bounding box coordinates to obtain additional camera-phone data correspondences. This allows automatic data collection during inference. Results show that after fine-tuning the GAN model on the expanded dataset, localization accuracy is further improved by up to 26%.
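A hedged sketch of a conditional GAN in the spirit of the approach described: the generator maps phone-derived features (GPS, IMU, FTM summaries) plus noise to a refined 3D position, while the discriminator scores (phone features, position) pairs; all dimensions and layer choices are assumptions for illustration.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps phone-modality features (GPS + IMU + FTM summary) to a refined 3D position."""
    def __init__(self, phone_dim=16, noise_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(phone_dim + noise_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 3),                     # (x, y, z) estimate
        )

    def forward(self, phone_feat, noise):
        return self.net(torch.cat([phone_feat, noise], dim=-1))

class Discriminator(nn.Module):
    """Scores whether a (phone features, 3D position) pair looks like a real correspondence."""
    def __init__(self, phone_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(phone_dim + 3, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, phone_feat, position):
        return self.net(torch.cat([phone_feat, position], dim=-1))

G, D = Generator(), Discriminator()
phone = torch.randn(32, 16)
fake_pos = G(phone, torch.randn(32, 8))
d_score = D(phone, fake_pos)                      # fed to a standard adversarial loss
```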
{"title":"ViFi-Loc: Multi-modal Pedestrian Localization using GAN with Camera-Phone Correspondences","authors":"Hansi Liu, Hongsheng Lu, Kristin Data, Marco Gruteser","doi":"10.1145/3577190.3614119","DOIUrl":"https://doi.org/10.1145/3577190.3614119","url":null,"abstract":"In Smart City and Vehicle-to-Everything (V2X) systems, acquiring pedestrians’ accurate locations is crucial to traffic and pedestrian safety. Current systems adopt cameras and wireless sensors to estimate people’s locations via sensor fusion. Standard fusion algorithms, however, become inapplicable when multi-modal data is not associated. For example, pedestrians are out of the camera field of view, or data from the camera modality is missing. To address this challenge and produce more accurate location estimations for pedestrians, we propose a localization solution based on a Generative Adversarial Network (GAN) architecture. During training, it learns the underlying linkage between pedestrians’ camera-phone data correspondences. During inference, it generates refined position estimations based only on pedestrians’ phone data that consists of GPS, IMU, and FTM. Results show that our GAN produces 3D coordinates at 1 to 2 meters localization error across 5 different outdoor scenes. We further show that the proposed model supports self-learning. The generated coordinates can be associated with pedestrians’ bounding box coordinates to obtain additional camera-phone data correspondences. This allows automatic data collection during inference. Results show that after fine-tuning the GAN model on the expanded dataset, localization accuracy is further improved by up to 26%.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Cross-Device Shortcuts: Seamless Attention-guided Content Transfer via Opportunistic Deep Links between Apps and Devices
Marilou Beyeler, Yi Fei Cheng, Christian Holz
Although users increasingly spread their activities across multiple devices—even to accomplish a single task—information transfer between apps on separate devices still incurs non-negligible effort and time overhead. These interaction flows would considerably benefit from more seamless cross-device interaction that directly connects the information flow between the involved apps across devices. In this paper, we propose cross-device shortcuts, an interaction technique that enables direct and discoverable content exchange between apps on different devices. When users switch their attention between multiple engaged devices as part of a workflow, our system establishes a cross-device shortcut—a deep link between apps on separate devices that presents itself through feed-forward previews, inviting and facilitating quick content transfer. We explore the use of this technique in four scenarios spanning multiple devices and applications, and highlight the potential, limitations, and challenges of its design with a preliminary evaluation.
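Purely as a hypothetical illustration of the idea (the field names and the transport below are invented, not drawn from the paper), a cross-device shortcut can be thought of as a small deep-link payload that names the source and target apps, the content to transfer, and a feed-forward preview:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class CrossDeviceShortcut:
    """Hypothetical deep-link payload connecting an app on one device to an app on another."""
    source_device: str
    source_app: str
    target_device: str
    target_app: str
    content_uri: str       # reference to the content offered for transfer
    preview: str           # short feed-forward preview shown on the target device

def offer_shortcut(shortcut: CrossDeviceShortcut, send) -> None:
    """Serialize the shortcut and hand it to a transport callback (e.g., a local network channel)."""
    send(json.dumps(asdict(shortcut)))

# Example: attention shifts from a laptop browser to a tablet notes app.
offer_shortcut(
    CrossDeviceShortcut("laptop", "browser", "tablet", "notes",
                        content_uri="clipboard://selection",
                        preview="Copy selected text to Notes"),
    send=print,
)
```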
{"title":"Cross-Device Shortcuts: Seamless Attention-guided Content Transfer via Opportunistic Deep Links between Apps and Devices","authors":"Marilou Beyeler, Yi Fei Cheng, Christian Holz","doi":"10.1145/3577190.3614145","DOIUrl":"https://doi.org/10.1145/3577190.3614145","url":null,"abstract":"Although users increasingly spread their activities across multiple devices—even to accomplish a single task—information transfer between apps on separate devices still incurs non-negligible effort and time overhead. These interaction flows would considerably benefit from more seamless cross-device interaction that directly connects the information flow between the involved apps across devices. In this paper, we propose cross-device shortcuts, an interaction technique that enables direct and discoverable content exchange between apps on different devices. When users switch their attention between multiple engaged devices as part of a workflow, our system establishes a cross-device shortcut—a deep link between apps on separate devices that presents itself through feed-forward previews, inviting and facilitating quick content transfer. We explore the use of this technique in four scenarios spanning multiple devices and applications, and highlight the potential, limitations, and challenges of its design with a preliminary evaluation.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Increasing Heart Rate and Anxiety Level with Vibrotactile and Audio Presentation of Fast Heartbeat
Ruoqi Wang, Haifeng Zhang, Shaun Alexander Macdonald, Patrizia Di Campli San Vito
Heartbeat is not only one of our physical health indicators, but also plays an important role in our emotional changes. Previous investigations have repeatedly examined the soothing effects of low-frequency vibrotactile cues that evoke a slow heartbeat in stressful situations. The impact of stimuli that evoke faster heartbeats on users’ anxiety or heart rate is, however, poorly understood. We conducted two studies to evaluate the influence of the presentation of a fast heartbeat via vibration and/or sound, in both calm and stressed states. Results showed that the presentation of fast heartbeat stimuli can induce increased anxiety levels and heart rate. We use these results to inform how future designers could carefully present fast heartbeat stimuli in multimedia applications to enhance feelings of immersion, effort and engagement.
{"title":"Increasing Heart Rate and Anxiety Level with Vibrotactile and Audio Presentation of Fast Heartbeat","authors":"Ruoqi Wang, Haifeng Zhang, Shaun Alexander Macdonald, Patrizia Di Campli San Vito","doi":"10.1145/3577190.3614161","DOIUrl":"https://doi.org/10.1145/3577190.3614161","url":null,"abstract":"Heartbeat is not only one of our physical health indicators, but also plays an important role in our emotional changes. Previous investigations have been repeatedly investigated to the soothing effects of low frequency vibrotactile cues which evoke a slow heartbeat in stressful situations. The impact of stimuli which evoke faster heartbeats on users’ anxiety or heart rate is, however, poorly understood. We conducted two studies to evaluate the influence of the presentation of a fast heartbeat via vibration and/or sound, both in calm and stressed states. Results showed that the presentation of fast heartbeat stimuli can induce increased anxiety levels and heart rate. We use these results to inform how future designers could carefully present fast heartbeat stimuli in multimedia application to enhance feelings of immersion, effort and engagement.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
GCFormer: A Graph Convolutional Transformer for Speech Emotion Recognition
Yingxue Gao, Huan Zhao, Yufeng Xiao, Zixing Zhang
Graph convolutional networks (GCNs) have achieved excellent results in image classification and natural language processing. However, the application of GCNs to speech emotion recognition (SER) has not yet been widely studied. Meanwhile, recent studies have shown that GCNs may not be able to adaptively capture long-range contextual emotional information across an entire audio recording. To alleviate this problem, this paper proposes a Graph Convolutional Transformer (GCFormer) model that enables the extraction of both local and global emotional information. Specifically, we construct a cyclic graph and perform concise graph convolution operations to obtain spatially local features. Then, a consecutive transformer network further learns higher-level representations and their global temporal correlations. Finally, the learned serialized representations from the transformer are mapped into a vector through a gated recurrent unit (GRU) pooling layer for emotion classification. The experimental results obtained on two public emotion datasets demonstrate that the proposed GCFormer performs significantly better than other GCN-based models in terms of prediction accuracy, and surpasses other state-of-the-art deep learning models in terms of both prediction accuracy and model efficiency.
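A minimal sketch of the pipeline outlined above: graph convolution over a cyclic frame graph for local features, a transformer encoder for global temporal context, and GRU pooling before classification; the adjacency construction, layer sizes, and number of classes are assumptions rather than the published configuration.

```python
import torch
import torch.nn as nn

def cyclic_adjacency(num_frames: int) -> torch.Tensor:
    """Adjacency of a cyclic graph: each frame connects to its neighbours, the last wraps to the first."""
    a = torch.eye(num_frames)
    idx = torch.arange(num_frames)
    a[idx, (idx + 1) % num_frames] = 1.0
    a[(idx + 1) % num_frames, idx] = 1.0
    # Row-normalize so graph convolution averages neighbouring frames.
    return a / a.sum(dim=1, keepdim=True)

class GCFormerSketch(nn.Module):
    def __init__(self, feat_dim=40, hidden=64, num_classes=4):
        super().__init__()
        self.gc = nn.Linear(feat_dim, hidden)                       # graph convolution weight
        encoder_layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.gru_pool = nn.GRU(hidden, hidden, batch_first=True)    # GRU pooling over time
        self.cls = nn.Linear(hidden, num_classes)

    def forward(self, x):                                           # x: (batch, frames, feat_dim)
        adj = cyclic_adjacency(x.size(1)).to(x.device)
        h = torch.relu(self.gc(adj @ x))                            # local (graph) features
        h = self.transformer(h)                                     # global temporal context
        _, h_n = self.gru_pool(h)
        return self.cls(h_n.squeeze(0))                             # emotion logits

logits = GCFormerSketch()(torch.randn(8, 120, 40))
```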
{"title":"GCFormer: A Graph Convolutional Transformer for Speech Emotion Recognition","authors":"Yingxue Gao, Huan Zhao, Yufeng Xiao, Zixing Zhang","doi":"10.1145/3577190.3614177","DOIUrl":"https://doi.org/10.1145/3577190.3614177","url":null,"abstract":"Graph convolutional networks (GCNs) have achieved excellent results in image classification and natural language processing. However, at present, the application of GCNs in speech emotion recognition (SER) is not widely studied. Meanwhile, recent studies have shown that GCNs may not be able to adaptively capture the long-range context emotional information over the whole audio. To alleviate this problem, this paper proposes a Graph Convolutional Transformer (GCFormer) model which empowers the model to extract local and global emotional information. Specifically, we construct a cyclic graph and perform concise graph convolution operations to obtain spatial local features. Then, a consecutive transformer network further strives to learn more high-level representations and their global temporal correlation. Finally and sequentially, the learned serialized representations from the transformer are mapped into a vector through a gated recurrent unit (GRU) pooling layer for emotion classification. The experiment results obtained on two public emotional datasets demonstrate that the proposed GCFormer performs significantly better than other GCN-based models in terms of prediction accuracy, and surpasses the other state-of-the-art deep learning models in terms of prediction accuracy and model efficiency.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Acoustic and Visual Knowledge Distillation for Contrastive Audio-Visual Localization
Ehsan Yaghoubi, Andre Peter Kelm, Timo Gerkmann, Simone Frintrop
This paper introduces an unsupervised model for audio-visual localization, which aims to identify regions in the visual data that produce sounds. Our key technical contribution is to demonstrate that using distilled prior knowledge of both sounds and objects in an unsupervised learning phase can improve performance significantly. We propose an Audio-Visual Correspondence (AVC) model consisting of an audio and a vision student, which are respectively supervised by an audio teacher (audio recognition model) and a vision teacher (object detection model). Leveraging a contrastive learning approach, the AVC student model extracts features from sounds and images and computes a localization map, discovering the regions of the visual data that correspond to the sound signal. Simultaneously, the teacher models provide feature-based hints from their last layers to supervise the AVC model in the training phase. In the test phase, the teachers are removed. Our extensive experiments show that the proposed model outperforms the state-of-the-art audio-visual localization models on 10k and 144k subsets of the Flickr and VGGS datasets, including cross-dataset validation.
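A hedged sketch of the localization-map step common to contrastive audio-visual models like the one described: cosine similarity between an audio embedding and each spatial location of a visual feature map, trained so that matched pairs score higher than mismatched ones. The encoders and the teacher-distillation hints are omitted, and the margin loss is an illustrative stand-in, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def localization_map(visual_feat: torch.Tensor, audio_emb: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between an audio embedding and every spatial location of a
    visual feature map -> a (H, W) map of likely sound-source regions.
    visual_feat: (batch, C, H, W); audio_emb: (batch, C)."""
    v = F.normalize(visual_feat, dim=1)
    a = F.normalize(audio_emb, dim=1)[:, :, None, None]
    return (v * a).sum(dim=1)                    # (batch, H, W)

# Contrastive objective (sketch): matched audio-image pairs should score higher
# than mismatched pairs obtained by shuffling the batch.
vis, aud = torch.randn(4, 512, 14, 14), torch.randn(4, 512)
pos = localization_map(vis, aud).flatten(1).max(dim=1).values
neg = localization_map(vis, aud.roll(1, dims=0)).flatten(1).max(dim=1).values
loss = F.relu(1.0 + neg - pos).mean()            # simple margin-based contrastive loss
```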
{"title":"Acoustic and Visual Knowledge Distillation for Contrastive Audio-Visual Localization","authors":"Ehsan Yaghoubi, Andre Peter Kelm, Timo Gerkmann, Simone Frintrop","doi":"10.1145/3577190.3614144","DOIUrl":"https://doi.org/10.1145/3577190.3614144","url":null,"abstract":"This paper introduces an unsupervised model for audio-visual localization, which aims to identify regions in the visual data that produce sounds. Our key technical contribution is to demonstrate that using distilled prior knowledge of both sounds and objects in an unsupervised learning phase can improve performance significantly. We propose an Audio-Visual Correspondence (AVC) model consisting of an audio and a vision student, which are respectively supervised by an audio teacher (audio recognition model) and a vision teacher (object detection model). Leveraging a contrastive learning approach, the AVC student model extracts features from sounds and images and computes a localization map, discovering the regions of the visual data that correspond to the sound signal. Simultaneously, the teacher models provide feature-based hints from their last layers to supervise the AVC model in the training phase. In the test phase, the teachers are removed. Our extensive experiments show that the proposed model outperforms the state-of-the-art audio-visual localization models on 10k and 144k subsets of the Flickr and VGGS datasets, including cross-dataset validation.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Explainable Depression Detection using Multimodal Behavioural Cues
Monika Gahalawat
Depression is a severe mental illness that not only affects the patient but also has major social and economic implications. Recent studies have employed artificial intelligence using multimodal behavioural cues to objectively investigate depression and alleviate the subjectivity involved in the current depression diagnostic process. However, head motion has received fairly limited attention as a behavioural marker for detecting depression, and the lack of explainability of "black box" approaches has restricted their widespread adoption. Consequently, the objective of this research is to examine the utility of fundamental head-motion units termed kinemes and explore the explainability of multimodal behavioural cues for depression detection. To this end, the research to date has evaluated depression classification performance on the BlackDog and AVEC2013 datasets using multiple machine learning methods. Our findings indicate that: (a) head motion patterns are effective cues for depression assessment, and (b) explanatory kineme patterns can be observed for the two classes, consistent with prior research.
{"title":"Explainable Depression Detection using Multimodal Behavioural Cues","authors":"Monika Gahalawat","doi":"10.1145/3577190.3614227","DOIUrl":"https://doi.org/10.1145/3577190.3614227","url":null,"abstract":"Depression is a severe mental illness that not only affects the patient but also has major social and economical implications. Recent studies have employed artificial intelligence using multimodal behavioural cues to objectively investigate depression and alleviate the subjectivity involved in current depression diagnostic process. However, head motion has received a fairly limited attention as a behavioural marker for detecting depression and the lack of explainability of the \"black box\" approaches have restricted their widespread adoption. Consequently, the objective of this research is to examine the utility of fundamental head-motion units termed kinemes and explore the explainability of multimodal behavioural cues for depression detection. To this end, the research to date evaluated depression classification performance on the BlackDog and AVEC2013 datasets using multiple machine learning methods. Our findings indicate that: (a) head motion patterns are effective cues for depression assessment, and (b) explanatory kineme patterns can be observed for the two classes, consistent with prior research.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Explainable Depression Detection via Head Motion Patterns
Monika Gahalawat, Raul Fernandez Rojas, Tanaya Guha, Ramanathan Subramanian, Roland Goecke
While depression has been studied via multimodal non-verbal behavioural cues, head motion behaviour has not received much attention as a biomarker. This study demonstrates the utility of fundamental head-motion units, termed kinemes, for depression detection by adopting two distinct approaches, and employing distinctive features: (a) discovering kinemes from head motion data corresponding to both depressed patients and healthy controls, and (b) learning kineme patterns only from healthy controls, and computing statistics derived from reconstruction errors for both the patient and control classes. Employing machine learning methods, we evaluate depression classification performance on the BlackDog and AVEC2013 datasets. Our findings indicate that: (1) head motion patterns are effective biomarkers for detecting depressive symptoms, and (2) explanatory kineme patterns consistent with prior findings can be observed for the two classes. Overall, we achieve peak F1 scores of 0.79 and 0.82, respectively, over BlackDog and AVEC2013 for binary classification over episodic thin-slices, and a peak F1 of 0.72 over videos for AVEC2013.
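The kineme representation can be approximated by clustering short windows of head-pose time series and summarizing each recording as a kineme histogram; the sketch below uses k-means for this purpose as an assumption about one plausible procedure, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

def kineme_histogram(head_pose, window=30, n_kinemes=16, kmeans=None):
    """Segment a (frames, 3) pitch/yaw/roll series into fixed windows, assign each window
    to a kineme cluster, and return the per-recording kineme histogram (illustrative sketch)."""
    n = (len(head_pose) // window) * window
    segments = head_pose[:n].reshape(-1, window * head_pose.shape[1])
    if kmeans is None:                                   # fit kinemes on this data (toy case)
        kmeans = KMeans(n_clusters=n_kinemes, n_init=10, random_state=0).fit(segments)
    labels = kmeans.predict(segments)
    hist = np.bincount(labels, minlength=n_kinemes) / len(labels)
    return hist, kmeans

# Toy usage: one histogram feature vector per recording, later fed to a classifier.
pose = np.cumsum(np.random.randn(900, 3) * 0.01, axis=0)   # 30 s of synthetic head pose at 30 fps
features, model = kineme_histogram(pose)
```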
{"title":"Explainable Depression Detection via Head Motion Patterns","authors":"Monika Gahalawat, Raul Fernandez Rojas, Tanaya Guha, Ramanathan Subramanian, Roland Goecke","doi":"10.1145/3577190.3614130","DOIUrl":"https://doi.org/10.1145/3577190.3614130","url":null,"abstract":"While depression has been studied via multimodal non-verbal behavioural cues, head motion behaviour has not received much attention as a biomarker. This study demonstrates the utility of fundamental head-motion units, termed kinemes, for depression detection by adopting two distinct approaches, and employing distinctive features: (a) discovering kinemes from head motion data corresponding to both depressed patients and healthy controls, and (b) learning kineme patterns only from healthy controls, and computing statistics derived from reconstruction errors for both the patient and control classes. Employing machine learning methods, we evaluate depression classification performance on the BlackDog and AVEC2013 datasets. Our findings indicate that: (1) head motion patterns are effective biomarkers for detecting depressive symptoms, and (2) explanatory kineme patterns consistent with prior findings can be observed for the two classes. Overall, we achieve peak F1 scores of 0.79 and 0.82, respectively, over BlackDog and AVEC2013 for binary classification over episodic thin-slices, and a peak F1 of 0.72 over videos for AVEC2013.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Multimodal, Interactive Interfaces for Education
Daniel C. Tozadore, Lise Aubin, Soizic Gauthier, Barbara Bruno, Salvatore M. Anzalone
In the rapidly evolving landscape of education, the integration of technology and innovative pedagogical approaches has become imperative to engage learners effectively. Our workshop aimed to delve into the intersection of technology, cognitive psychology, and educational theory to explore the potential of multimodal interfaces in transforming the learning experience in both regular and special education. Its interdisciplinary nature brought together experts from the fields of human-computer interaction, education, cognitive science, and computer science. To give further depth to participants’ discussions, 3 keynotes from experts in the field, 6 presentations of accepted short papers from participants, and 6 on-site demos of relevant projects were held. The high-level content covered is intended to help shape future work in this area.
{"title":"Multimodal, Interactive Interfaces for Education","authors":"Daniel C. Tozadore, Lise Aubin, Soizic Gauthier, Barbara Bruno, Salvatore M. Anzalone","doi":"10.1145/3577190.3616881","DOIUrl":"https://doi.org/10.1145/3577190.3616881","url":null,"abstract":"In the rapidly evolving landscape of education, the integration of technology and innovative pedagogical approaches has become imperative to engage learners effectively. Our workshop aimed to delve into the intersection of technology, cognitive psychology, and educational theory to explore the potential of multimodal interfaces in transforming the learning experience for both regular and special education. Its interdisciplinary brought together experts from fields of human-computer interaction, education, cognitive science, and computer science. To give further insights to participants discussions, 3 keynotes from experts in the field, 6 presentations of accepted short-papers from participants, and 6 in-loco demos of relevant projects were performed. The high-level content approached tend to tailor works future developed towards this area.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Crowd Behaviour Prediction using Visual and Location Data in Super-Crowded Scenarios
Antonius Bima Murti Wijaya
Predicting the future trajectory of a crowd is important for safety, to prevent disasters such as stampedes or collisions. Extensive research has explored trajectory prediction in typical crowd scenarios, where the majority of individuals can be easily identified. However, this study focuses on a more challenging scenario known as the super-crowded scene, wherein individuals within the crowd can only be annotated by their heads. In this scenario, person re-identification for tracking performs poorly due to the lack of clear image data. Our research proposes a clustering strategy to overcome the re-identification problem and predict cluster-level crowd trajectories. Two-dimensional (2D) maps and multiple cameras will be used to capture full pictures of crowds at a location and extract the venue’s spatial data (see figure 1). The research methodology encompasses several key steps, including evaluating data extraction with state-of-the-art methods, estimating crowd clusters, integrating 2D maps and multi-view fusion, and evaluating the proposed method on a dataset of multi-view videos collected in a real-world super-crowded scenario.
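A hedged sketch of the cluster-level idea: head detections projected to 2D map coordinates are grouped with DBSCAN, and each cluster centroid is extrapolated with a constant-velocity step; the projection, the clustering parameters, and the prediction model are all assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def predict_cluster_centroids(points_t0, points_t1, horizon=1.0):
    """Cluster 2D map positions at two consecutive time steps and predict each cluster's
    next centroid with a constant-velocity model (illustrative sketch)."""
    labels0 = DBSCAN(eps=1.5, min_samples=3).fit_predict(points_t0)
    labels1 = DBSCAN(eps=1.5, min_samples=3).fit_predict(points_t1)
    preds = {}
    for k in set(labels1) - {-1}:                       # ignore the DBSCAN noise label (-1)
        c1 = points_t1[labels1 == k].mean(axis=0)
        # Match to the nearest cluster at t0 to estimate velocity.
        c0s = [points_t0[labels0 == j].mean(axis=0) for j in set(labels0) - {-1}]
        if not c0s:
            continue
        c0 = min(c0s, key=lambda c: np.linalg.norm(c - c1))
        preds[k] = c1 + (c1 - c0) * horizon             # extrapolated centroid
    return preds

rng = np.random.default_rng(0)
crowd_t0 = rng.normal([0.0, 0.0], 0.5, size=(40, 2))
crowd_t1 = crowd_t0 + [0.3, 0.1] + rng.normal(0.0, 0.05, size=(40, 2))
print(predict_cluster_centroids(crowd_t0, crowd_t1))
```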
{"title":"Crowd Behaviour Prediction using Visual and Location Data in Super-Crowded Scenarios","authors":"Antonius Bima Murti Wijaya","doi":"10.1145/3577190.3614230","DOIUrl":"https://doi.org/10.1145/3577190.3614230","url":null,"abstract":"Predicting the future trajectory of a crowd is important for safety to prevent disasters such as stampedes or collisions. Extensive research has been conducted to explore trajectory prediction in typical crowd scenarios, where the majority of individuals can be easily identified. However, this study focuses on a more challenging scenario known as the super-crowd scene, wherein individuals within the crowd can only be annotated based on their heads. In this particular scenario, people’s re-identification process in tracking does not perform well due to a lack of clear image data. Our research proposes a clustering strategy to overcome people re-identification problems and predict the cluster crowd trajectory. Two-dimensional(2D) maps and multi-cameras will be used to capture full pictures of crowds in a location and extract the venue’s spatial data (see figure 1). The research methodology encompasses several key steps, including evaluating data extraction of the state-of-the-art methods, estimating crowd clusters, integrating 2D maps and multi-view fusion, and evaluating the proposed method on a dataset of multi-view videos collected in a real-world super-crowded scenario.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0