
Latest publications: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction

Connections: 2015 ICMI Sustained Accomplishment Award Lecture
E. Horvitz
Our community has long pursued principles and methods for enabling fluid and effortless collaborations between people and computing systems. Forging deep connections between people and machines has come into focus over the last 25 years as a grand challenge at the intersection of artificial intelligence, human-computer interaction, and cognitive psychology. I will review experiences and directions with leveraging advances in perception, learning, and reasoning in pursuit of our shared dreams.
{"title":"Connections: 2015 ICMI Sustained Accomplishment Award Lecture","authors":"E. Horvitz","doi":"10.1145/2818346.2835500","DOIUrl":"https://doi.org/10.1145/2818346.2835500","url":null,"abstract":"Our community has long pursued principles and methods for enabling fluid and effortless collaborations between people and computing systems. Forging deep connections between people and machines has come into focus over the last 25 years as a grand challenge at the intersection of artificial intelligence, human-computer interaction, and cognitive psychology. I will review experiences and directions with leveraging advances in perception, learning, and reasoning in pursuit of our shared dreams.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"40 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89114812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Personality Trait Classification via Co-Occurrent Multiparty Multimodal Event Discovery
S. Okada, O. Aran, D. Gática-Pérez
This paper proposes a novel feature extraction framework for multi-party multimodal conversation, aimed at the inference of personality traits and emergent leadership. The proposed framework represents multimodal features as the combination of each participant's nonverbal activity and group activity. This feature representation makes it possible to compare the nonverbal patterns extracted from the participants of different groups in a metric space. It captures how the target member produces nonverbal behavior observed in a group (e.g. the member speaks while all members move their bodies), and can be applied to any kind of multiparty conversation task. Frequent co-occurrent events are discovered from multimodal sequences using graph clustering. The proposed framework is applied to the ELEA corpus, an audio-visual dataset collected from group meetings. We evaluate the framework on the binary classification of 10 personality traits. Experimental results show that the model trained with co-occurrence features obtains higher accuracy than previous related work on 8 out of 10 traits. In addition, the co-occurrence features improve accuracy by 2% to 17%.
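
To make the co-occurrent event discovery step concrete, the following is a minimal sketch (not the authors' implementation): per-frame nonverbal streams are binarized, a graph whose nodes are (participant, event) pairs is weighted by frame-level co-occurrence counts, and graph clustering groups events that tend to fire together; the activation rate of each discovered cluster then serves as a feature. The event names, the networkx greedy-modularity clustering, and the toy data are assumptions.

```python
# Minimal sketch of co-occurrence-based feature extraction for multiparty
# nonverbal streams (illustrative only; the paper discovers frequent
# co-occurrent events via graph clustering over multimodal sequences).
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def cooccurrence_features(events):
    """events: dict mapping (participant, event_name) -> binary array of length T."""
    keys = list(events)
    T = len(next(iter(events.values())))
    # Build a graph whose edge weights count frame-level co-occurrences.
    G = nx.Graph()
    G.add_nodes_from(keys)
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            w = int(np.sum(events[a] & events[b]))
            if w > 0:
                G.add_edge(a, b, weight=w)
    # Cluster events that tend to fire together.
    clusters = list(greedy_modularity_communities(G, weight="weight"))
    # Feature per cluster: how often all of its events are jointly active.
    feats = []
    for c in clusters:
        joint = np.ones(T, dtype=bool)
        for k in c:
            joint &= events[k].astype(bool)
        feats.append(joint.mean())
    return np.array(feats), clusters

# Toy usage: three participants, speaking and body-movement events.
rng = np.random.default_rng(0)
toy = {(p, e): rng.integers(0, 2, size=200, dtype=np.int64)
       for p in ("A", "B", "C") for e in ("speaks", "moves")}
f, clusters = cooccurrence_features(toy)
print(f.shape, [sorted(c) for c in clusters])
```
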
{"title":"Personality Trait Classification via Co-Occurrent Multiparty Multimodal Event Discovery","authors":"S. Okada, O. Aran, D. Gática-Pérez","doi":"10.1145/2818346.2820757","DOIUrl":"https://doi.org/10.1145/2818346.2820757","url":null,"abstract":"This paper proposes a novel feature extraction framework from mutli-party multimodal conversation for inference of personality traits and emergent leadership. The proposed framework represents multi modal features as the combination of each participant's nonverbal activity and group activity. This feature representation enables to compare the nonverbal patterns extracted from the participants of different groups in a metric space. It captures how the target member outputs nonverbal behavior observed in a group (e.g. the member speaks while all members move their body), and can be available for any kind of multiparty conversation task. Frequent co-occurrent events are discovered using graph clustering from multimodal sequences. The proposed framework is applied for the ELEA corpus which is an audio visual dataset collected from group meetings. We evaluate the framework for binary classification task of 10 personality traits. Experimental results show that the model trained with co-occurrence features obtained higher accuracy than previously related work in 8 out of 10 traits. In addition, the co-occurrence features improve the accuracy from 2 % up to 17 %.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87982317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 49
ECA Control using a Single Affective User Dimension
Fred Charles, Florian Pecune, Gabor Aranyi, C. Pelachaud, M. Cavazza
User interaction with Embodied Conversational Agents (ECA) should involve a significant affective component to achieve realism in communication. This aspect has been studied through different frameworks describing the relationship between user and ECA, for instance alignment, rapport and empathy. We conducted an experiment to explore how an ECA's non-verbal expression can be controlled to respond to a single affective dimension generated by users as input. Our system is based on the mapping of a high-level affective dimension, approach/avoidance, onto a new ECA control mechanism in which Action Units (AU) are activated through a neural network. Since 'approach' has been associated with prefrontal cortex activation, we use a measure of prefrontal cortex left-asymmetry through fNIRS as a single input signal representing the user's attitude towards the ECA. We carried out the experiment with 10 subjects, who were instructed to express a positive mental attitude towards the ECA. In return, the ECA facial expression would reflect the perceived attitude under a neurofeedback paradigm. Our results suggest that users are able to successfully interact with the ECA and perceive its response as consistent and realistic, both in terms of ECA responsiveness and in terms of relevance of facial expressions. From a system perspective, the empirical calibration of the network supports a progressive recruitment of various AUs, which provides a principled description of the ECA response and its intensity. Our findings suggest that complex ECA facial expressions can be successfully aligned with one high-level affective dimension. Furthermore, this use of a single dimension as input could support experiments in the fine-tuning of AU activation or its personalization to user-preferred modalities.
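
As an illustration of how a single approach/avoidance value could drive AU activation through a neural network, here is a minimal sketch. The chosen AU subset, the synthetic calibration pairs, and the small MLP are assumptions; the paper's actual network and its empirical calibration are not reproduced here.

```python
# Minimal sketch of mapping one approach/avoidance score to Action Unit (AU)
# intensities with a small neural network. AU set, calibration data, and
# network size are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

AUS = ["AU1", "AU2", "AU6", "AU12"]  # hypothetical subset of FACS Action Units

# Hypothetical calibration pairs: approach score in [-1, 1] -> AU intensities in [0, 1].
X = np.linspace(-1.0, 1.0, 41).reshape(-1, 1)
Y = np.clip(np.column_stack([
    0.2 + 0.1 * X[:, 0],          # AU1
    0.2 + 0.1 * X[:, 0],          # AU2
    0.5 * (X[:, 0] + 1.0) / 2.0,  # AU6: cheek raiser grows with approach
    0.8 * (X[:, 0] + 1.0) / 2.0,  # AU12: lip corner puller grows with approach
]), 0.0, 1.0)

net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
net.fit(X, Y)

def aus_for(approach_score: float) -> dict:
    """Map one fNIRS-derived approach score to AU activation levels."""
    pred = np.clip(net.predict([[approach_score]])[0], 0.0, 1.0)
    return dict(zip(AUS, pred.round(2)))

print(aus_for(0.7))   # stronger smile-related AUs
print(aus_for(-0.5))  # weaker activation
```
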
{"title":"ECA Control using a Single Affective User Dimension","authors":"Fred Charles, Florian Pecune, Gabor Aranyi, C. Pelachaud, M. Cavazza","doi":"10.1145/2818346.2820730","DOIUrl":"https://doi.org/10.1145/2818346.2820730","url":null,"abstract":"User interaction with Embodied Conversational Agents (ECA) should involve a significant affective component to achieve realism in communication. This aspect has been studied through different frameworks describing the relationship between user and ECA, for instance alignment, rapport and empathy. We conducted an experiment to explore how an ECA's non-verbal expression can be controlled to respond to a single affective dimension generated by users as input. Our system is based on the mapping of a high-level affective dimension, approach/avoidance, onto a new ECA control mechanism in which Action Units (AU) are activated through a neural network. Since 'approach' has been associated to prefrontal cortex activation, we use a measure of prefrontal cortex left-asymmetry through fNIRS as a single input signal representing the user's attitude towards the ECA. We carried out the experiment with 10 subjects, who have been instructed to express a positive mental attitude towards the ECA. In return, the ECA facial expression would reflect the perceived attitude under a neurofeedback paradigm. Our results suggest that users are able to successfully interact with the ECA and perceive its response as consistent and realistic, both in terms of ECA responsiveness and in terms of relevance of facial expressions. From a system perspective, the empirical calibration of the network supports a progressive recruitment of various AUs, which provides a principled description of the ECA response and its intensity. Our findings suggest that complex ECA facial expressions can be successfully aligned with one high-level affective dimension. Furthermore, this use of a single dimension as input could support experiments in the fine-tuning of AU activation or their personalization to user preferred modalities.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87401564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Image based Static Facial Expression Recognition with Multiple Deep Network Learning
Zhiding Yu, Cha Zhang
We report our image-based static facial expression recognition method for the Emotion Recognition in the Wild Challenge (EmotiW) 2015. We focus on the sub-challenge of the SFEW 2.0 dataset, where one seeks to automatically classify a set of static images into 7 basic emotions. The proposed method contains a face detection module based on the ensemble of three state-of-the-art face detectors, followed by a classification module with an ensemble of multiple deep convolutional neural networks (CNN). Each CNN model is initialized randomly and pre-trained on a larger dataset provided by the Facial Expression Recognition (FER) Challenge 2013. The pre-trained models are then fine-tuned on the training set of SFEW 2.0. To combine multiple CNN models, we present two schemes for learning the ensemble weights of the network responses: by minimizing the log-likelihood loss, and by minimizing the hinge loss. Our proposed method generates state-of-the-art results on the FER dataset. It also achieves 55.96% and 61.29% on the validation and test sets of SFEW 2.0 respectively, surpassing the challenge baselines of 35.96% and 39.13% by significant margins.
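
The log-likelihood scheme for learning ensemble weights can be sketched as follows: given each CNN's class-probability outputs on held-out data, find non-negative weights summing to one that minimize the negative log-likelihood of the true labels under the weighted average of the probabilities. The softmax parameterization and the derivative-free optimizer are assumptions of this sketch, not details taken from the paper.

```python
# Minimal sketch of learning ensemble weights for several CNNs by minimizing
# the negative log-likelihood of the weighted average of their class
# probabilities (the hinge-loss variant is analogous).
import numpy as np
from scipy.optimize import minimize

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def learn_ensemble_weights(probs, labels):
    """probs: array [n_models, n_samples, n_classes] of per-model softmax outputs.
    labels: int array [n_samples]. Returns non-negative weights summing to 1."""
    n_models, n_samples, _ = probs.shape
    idx = np.arange(n_samples)

    def nll(z):
        w = softmax(z)                        # keep weights on the simplex
        mix = np.tensordot(w, probs, axes=1)  # [n_samples, n_classes]
        return -np.mean(np.log(mix[idx, labels] + 1e-12))

    res = minimize(nll, np.zeros(n_models), method="Nelder-Mead")
    return softmax(res.x)

# Toy usage: 3 models, 100 validation samples, 7 emotion classes.
rng = np.random.default_rng(0)
labels = rng.integers(0, 7, size=100)
probs = rng.dirichlet(np.ones(7), size=(3, 100))
print(learn_ensemble_weights(probs, labels))
```
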
{"title":"Image based Static Facial Expression Recognition with Multiple Deep Network Learning","authors":"Zhiding Yu, Cha Zhang","doi":"10.1145/2818346.2830595","DOIUrl":"https://doi.org/10.1145/2818346.2830595","url":null,"abstract":"We report our image based static facial expression recognition method for the Emotion Recognition in the Wild Challenge (EmotiW) 2015. We focus on the sub-challenge of the SFEW 2.0 dataset, where one seeks to automatically classify a set of static images into 7 basic emotions. The proposed method contains a face detection module based on the ensemble of three state-of-the-art face detectors, followed by a classification module with the ensemble of multiple deep convolutional neural networks (CNN). Each CNN model is initialized randomly and pre-trained on a larger dataset provided by the Facial Expression Recognition (FER) Challenge 2013. The pre-trained models are then fine-tuned on the training set of SFEW 2.0. To combine multiple CNN models, we present two schemes for learning the ensemble weights of the network responses: by minimizing the log likelihood loss, and by minimizing the hinge loss. Our proposed method generates state-of-the-art result on the FER dataset. It also achieves 55.96% and 61.29% respectively on the validation and test set of SFEW 2.0, surpassing the challenge baseline of 35.96% and 39.13% with significant gains.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"59 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82912237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 537
Multimodal Interaction with a Bifocal View on Mobile Devices
S. Pelurson, L. Nigay
On a mobile device, the intuitive Focus+Context layout of a detailed view (focus) and perspective/distorted panels on either side (context) is particularly suitable for maximizing the utilization of the limited available display area. Interacting with such a bifocal view requires both fast access to data in the context view and high-precision interaction with data in the detailed focus view. We introduce combined modalities that solve this problem by combining the well-known flick-drag gesture-based precise modality with modalities for fast access to data in the context view. The modalities for fast access to data in the context view include direct touch in the context view as well as navigation based on drag gestures, on tilting the device, on side-pressure inputs, or by spatially moving the device (dynamic peephole). Results of a comparison experiment of the combined modalities show that the performance can be analyzed according to a 3-phase model of the task: a focus-targeting phase, a transition phase (modality switch) and a cursor-pointing phase. Moreover, modalities of the focus-targeting phase based on a discrete mode of navigation control (direct access, pressure sensors as a discrete navigation controller) require a long transition phase: this is mainly due to disorientation induced by the loss of control over movements. This effect is significantly more pronounced than the articulatory time needed to change the position of the fingers between the two modalities ("homing" time).
{"title":"Multimodal Interaction with a Bifocal View on Mobile Devices","authors":"S. Pelurson, L. Nigay","doi":"10.1145/2818346.2820731","DOIUrl":"https://doi.org/10.1145/2818346.2820731","url":null,"abstract":"On a mobile device, the intuitive Focus+Context layout of a detailed view (focus) and perspective/distorted panels on either side (context) is particularly suitable for maximizing the utilization of the limited available display area. Interacting with such a bifocal view requires both fast access to data in the context view and high precision interaction with data in the detailed focus view. We introduce combined modalities that solve this problem by combining the well-known flick-drag gesture-based precise modality with modalities for fast access to data in the context view. The modalities for fast access to data in the context view include direct touch in the context view as well as navigation based on drag gestures, on tilting the device, on side-pressure inputs or by spatially moving the device (dynamic peephole). Results of a comparison experiment of the combined modalities show that the performance can be analyzed according to a 3-phase model of the task: a focus-targeting phase, a transition phase (modality switch) and a cursor-pointing phase. Moreover modalities of the focus-targeting phase based on a discrete mode of navigation control (direct access, pressure sensors as discrete navigation controller) require a long transition phase: this is mainly due to disorientation induced by the loss of control in movements. This effect is significantly more pronounced than the articulatory time for changing the position of the fingers between the two modalities (\"homing\" time).","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"55 4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83334731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Deep Learning for Emotion Recognition on Small Datasets using Transfer Learning
Hongwei Ng, Viet Dung Nguyen, Vassilios Vonikakis, Stefan Winkler
This paper presents the techniques employed in our team's submissions to the 2015 Emotion Recognition in the Wild contest, for the sub-challenge of Static Facial Expression Recognition in the Wild. The objective of this sub-challenge is to classify the emotions expressed by the primary human subject in static images extracted from movies. We follow a transfer learning approach for deep Convolutional Neural Network (CNN) architectures. Starting from a network pre-trained on the generic ImageNet dataset, we perform supervised fine-tuning on the network in a two-stage process, first on datasets relevant to facial expressions, followed by the contest's dataset. Experimental results show that this cascading fine-tuning approach achieves better results, compared to a single stage fine-tuning with the combined datasets. Our best submission exhibited an overall accuracy of 48.5% in the validation set and 55.6% in the test set, which compares favorably to the respective 35.96% and 39.13% of the challenge baseline.
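
A minimal sketch of the two-stage fine-tuning idea follows, assuming a torchvision ResNet-18 backbone and hypothetical `fer_loader`/`sfew_loader` data loaders in place of the paper's original architecture and training pipeline.

```python
# Minimal sketch of two-stage fine-tuning: start from ImageNet weights,
# fine-tune on a larger expression dataset (FER-2013 here), then fine-tune
# again on the target set (SFEW). Backbone, optimizer settings, and the
# data loaders are assumptions, not the paper's exact setup.
import torch
import torch.nn as nn
import torchvision

NUM_EMOTIONS = 7

def build_model():
    model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, NUM_EMOTIONS)  # replace the ImageNet head
    return model

def finetune(model, loader, epochs, lr):
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
    return model

# Stage 1: adapt ImageNet features to facial expressions (larger dataset).
# Stage 2: adapt to the target corpus with a smaller learning rate.
# model = build_model()
# model = finetune(model, fer_loader, epochs=10, lr=1e-3)
# model = finetune(model, sfew_loader, epochs=5, lr=1e-4)
```
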
{"title":"Deep Learning for Emotion Recognition on Small Datasets using Transfer Learning","authors":"Hongwei Ng, Viet Dung Nguyen, Vassilios Vonikakis, Stefan Winkler","doi":"10.1145/2818346.2830593","DOIUrl":"https://doi.org/10.1145/2818346.2830593","url":null,"abstract":"This paper presents the techniques employed in our team's submissions to the 2015 Emotion Recognition in the Wild contest, for the sub-challenge of Static Facial Expression Recognition in the Wild. The objective of this sub-challenge is to classify the emotions expressed by the primary human subject in static images extracted from movies. We follow a transfer learning approach for deep Convolutional Neural Network (CNN) architectures. Starting from a network pre-trained on the generic ImageNet dataset, we perform supervised fine-tuning on the network in a two-stage process, first on datasets relevant to facial expressions, followed by the contest's dataset. Experimental results show that this cascading fine-tuning approach achieves better results, compared to a single stage fine-tuning with the combined datasets. Our best submission exhibited an overall accuracy of 48.5% in the validation set and 55.6% in the test set, which compares favorably to the respective 35.96% and 39.13% of the challenge baseline.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88925689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 568
Interactive Web-based Image Sonification for the Blind
T. Wörtwein, Boris Schauerte, Karin Müller, R. Stiefelhagen
In this demonstration, we show a web-based sonification platform that allows blind users to interactively experience various kinds of information using two widespread technologies: modern web browsers that implement high-level JavaScript APIs, and touch-sensitive displays. This way, blind users can easily access information such as maps or graphs. Our current prototype provides various sonifications that can be switched depending on the image type and user preference. The prototype runs in Chrome and Firefox on PCs, smartphones, and tablets.
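
The demo itself runs in the browser on top of high-level JavaScript APIs; purely as an offline illustration of the idea of mapping an image to sound, here is a small Python sketch that scans an image column by column, mapping row position to pitch and brightness to loudness. The mapping and all parameters are assumptions, not the prototype's actual sonifications.

```python
# Offline illustration of a simple image-to-sound mapping. Columns are
# scanned left to right; each bright pixel contributes a sine tone whose
# frequency depends on its row and whose amplitude depends on its brightness.
import wave
import numpy as np

def sonify(image, sr=22050, col_dur=0.05, fmin=200.0, fmax=2000.0):
    """image: 2D float array in [0, 1], row 0 at the top."""
    rows, cols = image.shape
    freqs = np.geomspace(fmax, fmin, rows)   # higher rows map to higher pitch
    n = int(sr * col_dur)
    t = np.arange(n) / sr
    out = []
    for c in range(cols):
        col = image[:, c]
        tones = np.sin(2 * np.pi * freqs[:, None] * t[None, :])
        out.append((col[:, None] * tones).sum(axis=0))
    signal = np.concatenate(out)
    signal /= max(np.abs(signal).max(), 1e-9)  # normalize to [-1, 1]
    return (signal * 32767).astype(np.int16)

# Toy usage: a diagonal line sweeps from low to high pitch.
img = np.eye(32)[::-1]
pcm = sonify(img)
with wave.open("sonification.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(22050)
    w.writeframes(pcm.tobytes())
```
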
{"title":"Interactive Web-based Image Sonification for the Blind","authors":"T. Wörtwein, Boris Schauerte, Karin Müller, R. Stiefelhagen","doi":"10.1145/2818346.2823298","DOIUrl":"https://doi.org/10.1145/2818346.2823298","url":null,"abstract":"In this demonstration, we show a web-based sonification platform that allows blind users to interactively experience various information using two nowadays widespread technologies: modern web browsers that implement high-level JavaScript APIs and touch-sensitive displays. This way, blind users can easily access information such as, for example, maps or graphs. Our current prototype provides various sonifications that can be switched depending on the image type and user preference. The prototype runs in Chrome and Firefox on PCs, smart phones, and tablets.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82066242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Adjacent Vehicle Collision Warning System using Image Sensor and Inertial Measurement Unit
Asif Iqbal, C. Busso, N. Gans
Advanced driver assistance systems are the newest addition to vehicular technology. Such systems use a wide array of sensors to provide a superior driving experience. Vehicle safety and driver alerting are important parts of these systems. This paper proposes a driver alert system to prevent and mitigate adjacent-vehicle collisions by providing warning information about on-road vehicles and possible collisions. A dynamic Bayesian network (DBN) is utilized to fuse multiple sensors to provide driver awareness. The system detects oncoming adjacent vehicles and gathers ego-vehicle motion characteristics using an on-board camera and an inertial measurement unit (IMU). A histogram-of-oriented-gradients (HOG) feature-based classifier is used to detect adjacent vehicles. Vehicle front, rear, and side faces were considered in training the classifier. The ego vehicle's heading, speed, and acceleration are captured from the IMU and fed into the DBN. The network parameters were learned from data via the expectation-maximization (EM) algorithm. The DBN is designed to provide two types of warnings to the driver: a cautionary warning and a brake alert for a possible collision with other vehicles. Experiments were completed on multiple public databases, demonstrating successful warnings and brake alerts in most situations.
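
A minimal sketch of the HOG-feature-based vehicle classifier described above, assuming a linear SVM, a 64x64 detection window, and standard HOG parameters (none of these specifics are stated in the abstract).

```python
# Minimal sketch of a HOG-feature-based vehicle/background classifier
# (the detection component described in the abstract). The linear SVM,
# window size, and HOG parameters are illustrative assumptions.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(patch):
    """patch: 64x64 grayscale image patch."""
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

def train_detector(vehicle_patches, background_patches):
    X = np.array([hog_features(p) for p in vehicle_patches + background_patches])
    y = np.array([1] * len(vehicle_patches) + [0] * len(background_patches))
    clf = LinearSVC(C=0.01)
    clf.fit(X, y)
    return clf

def is_vehicle(clf, patch):
    return bool(clf.predict([hog_features(patch)])[0])

# Toy usage with random patches (real training would use labeled
# front/rear and side views of vehicles plus road background).
rng = np.random.default_rng(0)
vehicles = [rng.random((64, 64)) for _ in range(20)]
background = [rng.random((64, 64)) for _ in range(20)]
clf = train_detector(vehicles, background)
print(is_vehicle(clf, rng.random((64, 64))))
```
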
{"title":"Adjacent Vehicle Collision Warning System using Image Sensor and Inertial Measurement Unit","authors":"Asif Iqbal, C. Busso, N. Gans","doi":"10.1145/2818346.2820741","DOIUrl":"https://doi.org/10.1145/2818346.2820741","url":null,"abstract":"Advanced driver assistance systems are the newest addition to vehicular technology. Such systems use a wide array of sensors to provide a superior driving experience. Vehicle safety and driver alert are important parts of these system. This paper proposes a driver alert system to prevent and mitigate adjacent vehicle collisions by proving warning information of on-road vehicles and possible collisions. A dynamic Bayesian network (DBN) is utilized to fuse multiple sensors to provide driver awareness. It detects oncoming adjacent vehicles and gathers ego vehicle motion characteristics using an on-board camera and inertial measurement unit (IMU). A histogram of oriented gradient feature based classifier is used to detect any adjacent vehicles. Vehicles front-rear end and side faces were considered in training the classifier. Ego vehicles heading, speed and acceleration are captured from the IMU and feed into the DBN. The network parameters were learned from data via expectation maximization(EM) algorithm. The DBN is designed to provide two type of warning to the driver, a cautionary warning and a brake alert for possible collision with other vehicles. Experiments were completed on multiple public databases, demonstrating successful warnings and brake alerts in most situations.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91189600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Exploring Behavior Representation for Learning Analytics
M. Worsley, Stefan Scherer, Louis-Philippe Morency, Paulo Blikstein
Multimodal analysis has long been an integral part of studying learning. Historically, multimodal analyses of learning have been extremely laborious and time intensive. However, researchers have recently been exploring ways to use multimodal computational analysis in the service of studying how people learn in complex learning environments. In an effort to advance this research agenda, we present a comparative analysis of four different data segmentation techniques. In particular, we propose affect- and pose-based data segmentation as alternatives to human-based segmentation and fixed-window segmentation. In a study of ten dyads working on an open-ended engineering design task, we find that affect- and pose-based segmentation are more effective than traditional approaches for drawing correlations between learning-relevant constructs and multimodal behaviors. We also find that pose-based segmentation outperforms the two more traditional segmentation strategies for predicting student success on the hands-on task. In this paper we discuss the algorithms used, our results, and the implications that this work may have in non-education-related contexts.
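
To illustrate the contrast between fixed-window and pose-based segmentation, here is a minimal sketch in which pose-based boundaries are placed where frame-to-frame pose movement spikes. The movement measure, threshold, and minimum segment length are assumptions, not the exact procedure used in the study.

```python
# Minimal sketch contrasting fixed-window segmentation with a simple
# pose-based alternative that places boundaries at unusually large
# frame-to-frame pose movements.
import numpy as np

def fixed_window_segments(n_frames, window):
    """Boundaries every `window` frames."""
    return list(range(0, n_frames, window)) + [n_frames]

def pose_based_segments(pose, z_thresh=2.0, min_len=15):
    """pose: array [n_frames, n_joints * dims]. Boundary where movement spikes."""
    movement = np.linalg.norm(np.diff(pose, axis=0), axis=1)
    z = (movement - movement.mean()) / (movement.std() + 1e-9)
    bounds = [0]
    for t in np.where(z > z_thresh)[0]:
        if t - bounds[-1] >= min_len:   # avoid over-segmenting
            bounds.append(int(t))
    bounds.append(len(pose))
    return bounds

# Toy usage: 600 frames of slowly varying "pose" with two abrupt shifts.
rng = np.random.default_rng(0)
pose = np.cumsum(rng.normal(scale=0.01, size=(600, 6)), axis=0)
pose[200:] += 1.0
pose[400:] -= 1.5
print(fixed_window_segments(len(pose), 150))
print(pose_based_segments(pose))
```
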
{"title":"Exploring Behavior Representation for Learning Analytics","authors":"M. Worsley, Stefan Scherer, Louis-Philippe Morency, Paulo Blikstein","doi":"10.1145/2818346.2820737","DOIUrl":"https://doi.org/10.1145/2818346.2820737","url":null,"abstract":"Multimodal analysis has long been an integral part of studying learning. Historically multimodal analyses of learning have been extremely laborious and time intensive. However, researchers have recently been exploring ways to use multimodal computational analysis in the service of studying how people learn in complex learning environments. In an effort to advance this research agenda, we present a comparative analysis of four different data segmentation techniques. In particular, we propose affect- and pose-based data segmentation, as alternatives to human-based segmentation, and fixed-window segmentation. In a study of ten dyads working on an open-ended engineering design task, we find that affect- and pose-based segmentation are more effective, than traditional approaches, for drawing correlations between learning-relevant constructs, and multimodal behaviors. We also find that pose-based segmentation outperforms the two more traditional segmentation strategies for predicting student success on the hands-on task. In this paper we discuss the algorithms used, our results, and the implications that this work may have in non-education-related contexts.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88539425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Presentation Trainer, your Public Speaking Multimodal Coach
J. Schneider, D. Börner, P. V. Rosmalen, M. Specht
The Presentation Trainer is a multimodal tool designed to support the practice of public speaking skills by giving the user real-time feedback about different aspects of her nonverbal communication. It tracks the user's voice and body to interpret her current performance. Based on this performance, the Presentation Trainer selects the type of intervention that will be presented as feedback to the user. This feedback mechanism has been designed taking into consideration the results of previous studies, which show how difficult it is for learners to perceive and correctly interpret real-time feedback while practicing their speeches. In this paper we present the user experience evaluation of participants who used the Presentation Trainer to practice an elevator pitch, showing that the feedback provided by the Presentation Trainer has a significant influence on learning.
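
Purely as a hypothetical illustration of selecting one intervention at a time from tracked voice and body features (in line with the design rationale that learners struggle with simultaneous real-time feedback), consider the following sketch; the rules, feature names, and thresholds are invented for illustration and are not the Presentation Trainer's actual logic.

```python
# Hypothetical sketch of selecting a single real-time intervention from
# tracked voice/body features. All rules and thresholds are assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Observation:
    volume_db: float            # speech loudness
    seconds_since_pause: float  # time since the last pause in speech
    hands_below_hips: bool      # closed, low-energy posture

def select_intervention(obs: Observation) -> Optional[str]:
    """Return at most one feedback message, checking higher-priority rules first."""
    rules = [
        (obs.volume_db < 40.0, "Speak louder."),
        (obs.seconds_since_pause > 30.0, "Take a short pause."),
        (obs.hands_below_hips, "Use your hands to gesture."),
    ]
    for triggered, message in rules:
        if triggered:
            return message
    return None  # no intervention: let the speaker continue undisturbed

print(select_intervention(Observation(volume_db=35.0,
                                      seconds_since_pause=10.0,
                                      hands_below_hips=True)))
```
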
{"title":"Presentation Trainer, your Public Speaking Multimodal Coach","authors":"J. Schneider, D. Börner, P. V. Rosmalen, M. Specht","doi":"10.1145/2818346.2830603","DOIUrl":"https://doi.org/10.1145/2818346.2830603","url":null,"abstract":"The Presentation Trainer is a multimodal tool designed to support the practice of public speaking skills, by giving the user real-time feedback about different aspects of her nonverbal communication. It tracks the user's voice and body to interpret her current performance. Based on this performance the Presentation Trainer selects the type of intervention that will be presented as feedback to the user. This feedback mechanism has been designed taking in consideration the results from previous studies that show how difficult it is for learners to perceive and correctly interpret real-time feedback while practicing their speeches. In this paper we present the user experience evaluation of participants who used the Presentation Trainer to practice for an elevator pitch, showing that the feedback provided by the Presentation Trainer has a significant influence on learning.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82342851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 73